[tmql-wg] Comments against TMQL draft 2006-02-22

Wed Mar 7 05:19:58 EST 2007

On Tue, Mar 06, 2007 at 06:19:42PM +0100, Lars Heuer wrote:
> - TMQL is IMO case-sensitive, but it is never stated somewhere
>   It should be stated somewhere if the language is case-sensitive 
>   or not.

The only place which touches it is the grammar, right. Repeating it in
the prose certainly does not hurt.

> - The TMDM namespace seems to be the default namespace. Otherwise
>         jack / name 
>   cannot work (since "name" is not a keyword). 
>   But it is never stated that the TMDM namespace is the default one
>   and does not need the "tm" prefix.

TMQL does not use a 'default namespace', otherwise it would have said
so, but it does not. The reason why

   jack / name

should work is that 'name' is a _valid_ item reference of a topic in
every map(!) for the TMDM concept

   http://psi.topicmaps.org/iso13250/glossary/topic-name

That is reasonable given that I also have the expectation that
relationships with occurrences and names are honored when
deserializing CTM streams:

   lheuer isa person
   ! irc-nick: lheuer
   ! fullname: Lars Heuer

Here I would argue that implicitly

   irc-nick iko name
   fullname iko name

is added by a CTM processor to the map. Otherwise this whole name
typing thing does not make sense, IMHO. Same with occurrences:

   lheuer isa person
   homepage: http://.....

==> implicitly

   homepage iko occurrence

Now this implies that name and occurrence must be in the map. Hence I
can use an item reference for them.
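To make the implicit typing concrete, here is a rough Python sketch of
what I would expect a CTM processor to add. It is only an illustration:
the function name, the line format it accepts ('!' for names, plain
'key: value' for occurrences) and the example.org URL are assumptions
for this sketch, not anything defined by CTM or TMQL.

```python
# Hypothetical sketch: a toy loader that collects the implicit
# "iko" (is-kind-of) statements discussed above. Not a real CTM parser.

def implied_iko(lines):
    """Return the set of (subtype, supertype) pairs implied by the input.

    Assumption: lines starting with '!' carry names, any other
    'key: value' line carries an occurrence.
    """
    implied = set()
    for raw in lines:
        line = raw.strip()
        if ":" not in line:
            continue  # e.g. "lheuer isa person" implies nothing here
        is_name = line.startswith("!")
        key = line.lstrip("! ").split(":", 1)[0].strip()
        implied.add((key, "name" if is_name else "occurrence"))
    return implied


implied = implied_iko([
    "lheuer isa person",
    "! irc-nick: lheuer",
    "! fullname: Lars Heuer",
    "homepage: http://example.org/",
])
```

Since every such pair points at 'name' or 'occurrence', those two
topics have to exist in the map, which is the whole argument.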

It is a bit cowboy-like, I admit, but I flagged this to CTM. [ Yipppeee. ]

> - The draft is (intentionally) silent about the data which is
>   returned if a query generates topics, not their characteristics.
>   Do the editors consider to add a section which clarifies this?

At the moment the standard is _always_ silent when it comes to
returning topic map items. It only says something about atomification
(conversion between characteristics and the value in it). So
implementations can choose to return TMRMish structures, or TMDMish or
even application-specific business objects.

The overall approach is "what I do not mention, I do not constrain".

> 3.2 Ontological Commitments
> ---------------------------
> 
> - Prefix "tm"
>   The prefix "tm" cannot be bound to 
>   "http://psi.topicmaps.org/iso13250/glossary/"
>   but must be bound to "http://psi.topicmaps.org/iso13250/model/"
>   
>   Reason:
>   The "/glossary" namespace just contains glossary terms but not 
>   TM-datamodel terms

I think you are right, but I cannot find a place where

   http://psi.topicmaps.org/iso13250/model/

is defined as namespace. Could be just me being tired.

>   IMO the prefix should be renamed to "dm" since the important part 
>   is "Data Model", not "Topic Maps".

Hmmm, no strong feelings here, except that dm here means the logo of
Drogeriemarkt (Drug Market) in Austria. :-)

> - Prefix "xsd"
>   IMO the prefix "xsd" should be "xs" since nearly all XML standards
>   use "xs"
> 
> - Prefix "tmql"
>   Why is it not just "ql"?

No particular reason, except maybe that TMQL is already a mnemonic. ql
is not.

> 4.2 Atoms
> ---------
> - "This International Standard recognizes natively a small set of 
>   primitive data types..."
>   I.e. "qname" is neither a native nor a primitive datatype in 
>   "XML Schema Part 2: Datatypes Second Edition"-sense.

Probably I do not understand something properly, because QNames are an
XS type:

    http://www.w3.org/TR/xmlschema-2/datatypes.html#QName

I'm a bit flaky on this one, so help/education is welcome.

> - [6] date ::= xsd:dateTime
>   "date" should be "xsd:date", "xsd:dateTime" is a different 
>   datatype.

Yes, we will have to revisit this when the data types are settled.

> - [7] iri
>   IRIs must be encapsulated by '"'; why?
>   In Leipzig (10/2006) the committee decided that IRIs are detectable
>   and IRIs do not have to be quoted for CTM. Why are IRIs not 
>   detectable by a TMQL processor?

There is another issue here: In

   http://en.wikipedia.org/wiki/John_Winston_Lennon

and

  "http://en.wikipedia.org/wiki/John_Winston_Lennon"

the first one is an IRI used as subject identifier. This goes down
rule [8] and the second one is an atomic value of type anyURI (or
whatever that type is). That is rule [7].

In both cases the IRI detection is done, so that

  "http://en.wikipedia.org/wiki/John_Winston_Lennon"

is NOT a string, but an IRI.

> - "For all types, a IRI can..."
>   "For all types, a_n_ IRI can"

Fixed.

> - EXAMPLE
>   "http://example.org/something"  (xsd:anyURI)
>   It is unclear why the provided string is interpreted as
>   xsd:anyURI.
>   IMO the provided string is ambiguous because it could also be
>   interpreted as literal with datatype xsd:string

It is not a string because the syntactic rules for IRIs are tighter
than those for strings. For integers/decimal you have the same
situation: also here

    3

is not interpreted as decimal, but integer because that has a more
limited syntax and is preferred over decimal.
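A minimal sketch of that "tightest syntax wins" detection, in Python.
The regular expressions and the type names are simplified assumptions
of mine, not the grammar rules from the draft. The point is only that
the candidates are tried from the most restrictive syntax to the least
restrictive one.

```python
import re

# Tried in order from tightest to loosest syntax; the first match wins.
# Note that the decimal pattern also accepts "3" and the string pattern
# also accepts a quoted IRI, so the order is what decides the type.
# All patterns are simplified assumptions, not the draft's grammar.
RULES = [
    ("integer", re.compile(r'[+-]?\d+')),
    ("decimal", re.compile(r'[+-]?\d+(\.\d+)?')),
    ("iri",     re.compile(r'"[A-Za-z][A-Za-z0-9+.-]*:[^\s"]+"')),
    ("string",  re.compile(r'"[^"]*"')),
]


def classify(token):
    """Return the name of the first rule whose pattern matches the whole token."""
    for name, pattern in RULES:
        if pattern.fullmatch(token):
            return name
    return None
```

With these rules classify('3') gives 'integer' although the decimal
pattern would accept it as well, and a quoted http://... value gives
'iri' although the string pattern would accept it as well.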

>   2005-10-16T10:29  (xsd:date)
>   The provided value is neither a xsd:date nor a xsd:dateTime
>   
>   Valid date: 2005-10-16
>   Valid dateTime: 2005-10-16T10:29:00

Yeah, that is flaky and will have to be fixed when we have the data
types in place. I made another note.

> - "If a prefix is used to form a QName, then no blanks between the 
>   prefix and the following identifiers are allowed."
>   
>   IMO this rule should be moved to a "QName" rule

You mean that the rules

    qname -> prefix identifier
    ....

should be separated from the atoms a bit and the explanatory text be
put underneath?

> 4.4 Navigation
> --------------
> - [15] axis
>   Several axis use a terminology (i.e. "classes") which is not 
>   aligned to TMDM.
>   Consider to align the axes names with the TMDM-terminology

Definitely. In fact, I would love to see this whole axes business move
into TMDM at some point. It has nothing to do with TMQL, does it?

So this means that 'class' should be replaced with 'type'. And
superclass with supertype...

> - [h] indicators
>   I wonder if we need two syntaxes for subject identifiers. The user
>   may use either a plain URI or an URI with the "~" suffix. Isn't a
>   plain URI enough? Why do we need the "~"?

I have left it in there just for completeness and symmetry, so that
there is a canonical solution for both and the navigation is made
explicit. I regard the use

    http://something/

as shortcut. I like to have canonical ways, especially if people start
to create TMQL statements dynamically. People do strange things.

> - [i] reifier
>   Isn't one "~" enough? So, we'll get "~>" And if we eliminate the "~"
>   from [h] we can write reifiers even more compact "~". But I don't 
>   mind if the "~>" looks better :)

The ~~> is really for the looks. No, seriously. :-) As long as the
'zooming' is made apparent we should be fine.

> 4.7 Composite Content
> ---------------------
> - [19] content
>   Reasons why the operators "++" and "--" are not "+" and "-"?
>   
>   IMO it is a bit confusing that the equality operator "==" is 
>   mentioned in this section together with the concatenation and 
>   subtraction operators.

But == is NOT just 'comparison'. For instance,

   $p / shoesize == $p / headsize

will compare all pairings and will return 'true' if there EXISTS one
combination of shoesize and headsize being equal. So we are NOT
comparing values, but an overlap of tuples. Hence the use of == to
make that clear. Similar for ++ and --, they are also working on tuple
sequences.
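In Python terms the idea looks roughly like the sketch below. This is
an analogy only: plain lists stand in for tuple sequences, the function
names are made up, and the exact treatment of duplicates and ordering
for ++ and -- is an assumption here, not what the draft prescribes.

```python
def exists_equal(left, right):
    """'==' in the sense above: True if ANY pairing of values is equal."""
    return any(l == r for l in left for r in right)


def concat(left, right):
    """'++' (assumed): all tuples of left followed by all tuples of right."""
    return list(left) + list(right)


def subtract(left, right):
    """'--' (assumed): tuples of left that do not occur in right."""
    return [value for value in left if value not in right]


# A person with two shoesize values and two headsize values:
shoesizes = [42, 43]
headsizes = [41, 42]
overlap = exists_equal(shoesizes, headsizes)
```

Here overlap is true because one pairing (42, 42) matches, even though
the two sequences as a whole are different.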

They are terminals, though, so as long as it does not conflict with
something else, they might be changed. But for == there might be the
problem to make it = as there is already a = for the subject locator
prefix

   "http://whatever" =

And I really, really like that one. :-)

> 4.10 Topic Map Content
> ----------------------
> Do we need this section at all? How is CTM different from any other
> textual content?

In that this is NOT text. If I write

for $p in // person
return
    """
       $p isa terrorist
    """

TMQL does NOT return text strings which are then parsed by the
application into a map. TMQL already creates these TM fragments and
merges them together as it goes.

This is MUCH, MUCH faster, of course.

[ Also XML content is NOT returned as text, but as combined XML
fragments. ]

> 4.13 Boolean Expressions
> ------------------------
> - [34] boolean-expression
>   Operators "&" and "|": In a previous context the "OR" operator was
>   "||". Which one is correct?
>   
>   IMO && and || look better and are more common in other languages, 
>   but I don't mind. :)

In almost all languages there is a huge difference between & and &&
and | and ||. So also in TMQL:

  & symmetric, non-short circuit AND
  && does not exist in TMQL
  | symmetric, non-short circuit OR
  || asymmetric, short-circuit OR (can be defined via if-then-else)

This is the same in Java, which got it from C++ which got it from C.....

I would not advise switching & to short-circuit evaluation because
this can dramatically undermine automatic optimization.
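The difference is easy to observe in any language that has both kinds
of operator. A small Python sketch, where Python's | and 'or' merely
stand in for TMQL's | and ||, and the check() helper is made up for
the demonstration:

```python
calls = []


def check(name, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(name)
    return value


# Non-short-circuit OR: both operands are always evaluated, so an
# optimizer is free to evaluate them in either order.
calls.clear()
result_eager = check("a", True) | check("b", False)
eager_calls = list(calls)

# Short-circuit OR: the right operand is skipped once the left one is
# true, so the order of the operands is part of the meaning.
calls.clear()
result_lazy = check("a", True) or check("b", False)
lazy_calls = list(calls)
```

Both results are true, but the eager form evaluates 'a' and 'b' while
the short-circuit form evaluates only 'a'.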

> 5 Query Contexts
> ----------------
> "%__" (current environment map) and "%_" (current context map) are not
> very readable resp. distinguishable.

True, true, true. Whatever we end up with, I would like to retain a

    %_whatever_name

so that the leading %_ indicates to the user that this is a special
variable.

--

Super feedback, thx.

\rho