[tmql-wg] TOMA: comparison with TMQL

Tue Feb 20 18:05:49 EST 2007

On Mon, Feb 19, 2007 at 02:44:04PM +0100, Rani Pinchuk wrote:
> and instead there are two resources about Toma available:
>   - The Toma user manual: http://topiwriter.com/toma/TW_UM_TOMA_100.pdf

I have updated 

   http://topicmaps.bond.edu.au/mda/internet/semantic-web/topicmaps/Toma

> .................................................................. In 
> order to assess a language, one should understand it, and this usually 
> implies not only reading the language spec/description. For most 
> suggested languages, there are not yet tutorials, exercises, and the 
> sort, so it is not at all simple to understand the language.

True, but some languages are very 'flat', in that they have a syntax
construct for every use case. And some are 'deep', where there are
quite abstract principles at the core and it is up to the wit of the
developer to get something out of it. Think Haskell.

Here we are on the flatter side of things.

> Below I comment over different excerpts from the email of Prof. Barta:
                                                            ^^^^^^^^^^^

I just love the sound of it :-))

> >In this sense a
> >
> >provides (provider: laptop, 
> >          provided: adapter,
> >          receiver: electricity220) -->> tm://laptops/
> >
> >is easier on the eyes than the TOMA equivalent
> >
> >  insert 'adapter'
> >     into (provides)->provider, 'laptop'
> >     into (provides)->receiver, 'electricity220'
> >     into (provides)->provided
> >
> >And we can re(use) CTM syntax.
> 
> However, the above Toma INSERT could (and probably should) be indented 
> differently to be shown as:
> 
>  insert 'adapter'        into (provides)->provider,
>         'laptop'         into (provides)->receiver,
>         'electricity220' into (provides)->provided;
> 
> Which is much more readable.

Rightio, so the only argument which remains is the obvious overlap
with CTM.

> 
> >-- long chains
> >
> >In the TOMA Spec 2.5.2 there is a longish example how to chain several path
> >expressions while 'climbing' over associations.
> >
> >select $topic
> >   where $a(connect_to)->connected = 'little_finger'
> >   and $a(connect_to)->connected = $p1
> >   and $p1 not in ('little_finger')
> >   and $b(connect_to)->connected = $p2
> >   and $b(connect_to)->connected = $p1
> >   and $p2.bn != $p1.bn
> >   and $c(connect_to)->connected = $p
> >   and $c(connect_to)->connected = $p2
> >   and $p.bn != $p2.bn
> >   and $a != $b
> >   and $b != $c;
> >
> 
> :-) This example has been provided in order to explain why the following 
> syntax has been developed:
> 
> 		.role1<-association_id(association_type)->role2
> 
> So the above example can be written in Toma as:
> 
>   select $topic
>     where id('little_finger').$$<-(connect_to)->$$
>                              .$$<-(connect_to)->$$
>                              .$$<-(connect_to)->$$ = $topic;
> 
> or
> 
>   select $topic
>     where id('little_finger')
>       .connected<-(connect_to)->connected
>       .connected<-(connect_to)->connected
>       .connected<-(connect_to)->connected = $topic;
> 
> Which is at least as elegant as:
> >In TMQL I would do
> >
> >where
> >  $f = little_finger
> >& $f   <- connected [ ^ connect_to ] -> connected == $f'
> >& $f'  <- connected [ ^ connect_to ] -> connected == $f''
> >& $f'' <- connected [ ^ connect_to ] -> connected == $f'''

Elegance is on par, I agree, but at the price of an _additional_
syntax construct.

> >What I found slightly difficult to grasp is that filtering is done quite
> >differently, depending on what to filter. To get one particular name from 
> >the
> >list, one would do this via
> 
> The different syntax points to different kinds of filtering.

[...]

> >In TMQL filtering is always done with []. Fullstop.
> 
> So how do you tell the engine that you want a topic of type t1 and scope 
> s1 and later for a topic of type s1 and scope t1?

In TMQL to get all is-employed-by assocs you do (fully written out)

  %_ [ . >> classes == is-employed-by ]
  ^    ^ ^          ^
  take the map      |
       | |          |
       take each item
         |          |
         find all classes
                    |
                    and check whether there is an overlap with is-employed-by

To find then all in a particular scope, same thing, but different navigation
axis:

  %_ [ . >> classes == is-employed-by ]
     [ . >> scope   == my-database ]

Since these simple filters occur often, there is a shortcut doing the
same thing

  %_ [ ^ is-employed-by ] [ @ my-database ]

And this is also shorter to write

  // is-employed-by [ @ my-database ]

So the axes make all the difference. And they can be used anywhere,
not only inside a filter.
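
To make the "one filter construct, different axes" idea concrete, here
is a minimal Python sketch. It is not TMQL and not any processor's API:
the Item class, the axis functions classes() and scope(), and the
where() helper are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a map item (topic, assoc, name, ...)."""
    ident: str
    classes: frozenset = frozenset()
    scope: frozenset = frozenset()

# Navigation axes: each takes an item and returns a set of identifiers.
def classes(item):
    return item.classes

def scope(item):
    return item.scope

def where(items, axis, value):
    """One [] filter: keep the items whose axis result contains value."""
    return [i for i in items if value in axis(i)]

# Toy map content (assumed data, not from any real map).
tm = [
    Item("a1", frozenset({"is-employed-by"}), frozenset({"my-database"})),
    Item("a2", frozenset({"is-employed-by"}), frozenset({"other-source"})),
    Item("a3", frozenset({"part-whole"}), frozenset({"my-database"})),
]

# Same filter mechanism twice, only the axis differs.
employed = where(tm, classes, "is-employed-by")
in_db = where(employed, scope, "my-database")
print([i.ident for i in in_db])
```

The point of the sketch is only that the filtering mechanism stays the
same and the axis is a parameter.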

> >In TOMA, binding of variables happens in several places. 

[...]

> >All of this is not necessary in TMQL:
> >
> >TOMA:
> >
> >   select $topic.bn @ $scope, $scope.id
> >   where  $topic.bn = 'lung'
> >
> >TMQL:
> >   select $topic / name ( . , . @ )
> >   where  $topic / name == "lung"
> >
> >or
> >
> >   "lung" \ name / name ( . , . @ )
> >
> Could you give an example of the following in TMQL:
> 
>      select $topic.bn@$scope, $scope.bn@$scope2, $scope2.id
>       where $topic.bn = 'lung';

Same scheme:

  "lung" \ name / name ( . , . @ / name ( . , . @ ) )

Not sure what it does, though :-)

> >-- ad Identification
> >
> >In TOMA, topic identifiers (local identifiers) can be written as either
> >
> >   'foo'
> >
> >or
> >
> >   id ('foo')
> >
> >or
> >
> >   foo
> >
> Not exactly. It depends on the syntax. 'foo' is a string. If it is used 
> in the topic literal id(), it is taken as the id of a topic.
> "Naked literal" - foo - can be used inside typing brackets, following 
> the scope operator @ and as the roles in the association expression.

Uhm. I see.

> >Identification of topics via subject indicators seems to exist, but
> >what about locators? What exactly is
> >
> >  $topic.si ('http://....')
> 
> This is not valid Toma.
> si('http://...') is a subjectIdentity literal. It can start a path 
> expression chaining: it has output which is the topic which has the 
> subjectIdentity 'http://.../'. But it takes no input. So it cannot be 
> chained after $topic.
> Note that you have the path expression .si which takes as input a topic 
> and output its subjectIdentity.

What would I have to do to address a topic via its subject indicator
(identifier) and what for a topic with a subject locator?

In TMQL I write

   "http://www.topiwriter.com/toma/Toma.html" ~   # SI, long version
   http://www.topiwriter.com/toma/Toma.html       # SI, 'short' version

   "http://www.topiwriter.com/toma/Toma.html" =   # SL

> I am not sure how types and classes are dealt with in TMQL. In Toma,
> I chose not to mix them as it was not mixed by XTM: classes are
> defined by associations, and types in the topic itself.

TMQL uses the definition of types (= classes there) from TMDM, I hope :-)

So if you have XTMish

  <topic id="rani">
     <instanceOf>
        <topicRef ....person"/>
     </instanceOf>
  </topic>

you create an association of type type-instance between rani and person

  http://www.isotopicmaps.org/sam/sam-model/#sect-types

and XTM 2.0 says so, btw:

  http://www.isotopicmaps.org/sam/sam-xtm/#sect-proc-instanceOf

> The select you quote above, tries to take any descendant of the
> topic device, that is: any instance of device, any instance of a
> subclass of device, any instance of a subclass of a subclass of
> device etc.

Yes, this is implicit in TMQL _EVERYWHERE_. I would claim that most
applications will need this behaviour.
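
As a rough illustration of what "any instance of a subclass of a
subclass ..." amounts to, here is a small Python sketch. The two
dictionaries and the function names are assumptions made up for this
example; they are not taken from Toma, TMQL or TMDM.

```python
# Hypothetical toy data: direct instance-of and subclass-of links.
INSTANCE_OF = {
    "my-laptop": {"laptop"},
    "my-router": {"router"},
    "rani": {"person"},
}
SUBCLASS_OF = {
    "laptop": {"computer"},
    "computer": {"device"},
    "router": {"device"},
}

def superclasses(cls):
    """All classes reachable from cls via subclass-of, including cls itself."""
    seen, todo = set(), [cls]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(SUBCLASS_OF.get(c, ()))
    return seen

def isa(topic, cls):
    """True if topic is a direct or indirect instance of cls."""
    return any(cls in superclasses(t) for t in INSTANCE_OF.get(topic, ()))

devices = sorted(t for t in INSTANCE_OF if isa(t, "device"))
print(devices)
```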

> >-- Unexpected Things
> >
> >What I found unexpected, is that in TOMA, to find all associations of type
> >'part-whole', one has to write (Example 1 in 2.5.1 of 'TOMA Spec'):
> >
> >  select $a
> >   where $a.id = 'part-whole';
> 
> :-) This is an example of how NOT to find associations of type 
> 'part-whole'. The text that follows this example explains it:
> 
> In this example the engine doesn't know that the variable $a is an 
> association. It deals with it, as if it were an ordinary topic variable 
> and then looks among all topics which one has the id 'part-whole'.

OK, here I obviously missed the point. :-)

> In order to find all associations of type part-whole you should write in 
> Toma:
>    select distinct $a where exists $a(part-whole)->$$;

The 'distinct' is not really necessary here, right?

In TMQL, there is no distinction between assocs and topics (TMRMish
interpretation) in that they are all 'things'. So a

  // part-whole

gives you all things which are an instance of part-whole. Could be
topic, could be assoc, could be name, could be occurrence. If you
want only topics, then one has to say so.

> >To avoid convoluted path expression chains, a special semantics has been
> >introduced in TOMA, so that two consecutive associations are always
> >different (2.5.2).
> >
> >That seemed quite adhoc to me.
> 
> The reason for this is demonstrated in the following:
> 
>   select $topic
>     where id('little_finger').$$<-(connect_to)->$$
>                              .$$<-(connect_to)->$$ = $topic;
> 
> We do not want to get as a result for $topic the topic 'little_finger', 
> but without this rule, we would get it (because 'little_finger' is 
> connected to whatever which is connected back to 'little_finger').
> 
> I wonder how TMQL solves this.

TMQL does not try to do that, not least because writing maps (=
massive effort) and then using connections like (is-connected-to) is
like flying with a helicopter to the bathroom. ;-)

I regard this 'one hop must be different from the next' as _VERY_
application specific and relative to how someone models his content.
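
For reference, the effect of that rule can be sketched in a few lines
of Python. This is only an illustration under assumed data: the ASSOCS
table, the hop() and two_hops() helpers and the finger names other than
little_finger are made up, and the code is not how any Toma engine is
known to work.

```python
# Hypothetical connect_to associations, each with two 'connected' players.
ASSOCS = {
    "a": ("little_finger", "ring_finger"),
    "b": ("ring_finger", "middle_finger"),
}

def hop(frontier, distinct_assocs):
    """frontier is a set of (topic, last_assoc_id); return the next frontier."""
    result = set()
    for topic, last in frontier:
        for aid, players in ASSOCS.items():
            if topic not in players:
                continue
            if distinct_assocs and aid == last:
                continue  # the rule: do not reuse the previous association
            for other in players:
                if other != topic:
                    result.add((other, aid))
    return result

def two_hops(start, distinct_assocs):
    """Topics reachable from start by exactly two association hops."""
    frontier = {(start, None)}
    for _ in range(2):
        frontier = hop(frontier, distinct_assocs)
    return {t for t, _ in frontier}

print(two_hops("little_finger", distinct_assocs=False))
print(two_hops("little_finger", distinct_assocs=True))
```

Without the rule the start topic comes back via the same association;
with it, only the topic two distinct associations away remains.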

> >-- XML and TM content
> >
> >TOMA obviously cannot generate TM or XML content. Now Rani will claim that 
> >this
> >is a template thing. But it is not, I think, and even if it were, for
> >performance reasons this should be part of the language.

> Rani indeed claims that this is a template thing :-)

Gotcha! :-))

> I have put as a focus to have Toma as simple and as small as
> possible.  It is TMQL, TMCL and TMML and nothing else.  I assumed
> that users that want to create applications using Toma or any other
> TMQL will use in addition other technologies (Java, Perl, Python
> etc.). Each of those technologies provide sets of techniques and
> methodologies to create XML content as well as any other
> content. Why to extend the language to include another such
> technique?  And what are the performance reasons here?

Let's assume we do it as you suggest. So what the application then
gets is a sequence of tuple values (strings, integer, topic,
whatever).

If I needed this in XML form, I would take these values and
inject them into a template. That result has to be XML-parsed (no?)
__EVERY TIME__ I do that: for 500 rows, 500 times parsing of
essentially the same structure. And then these fragments have to be
collected into a document (that is cheap).

Even if these template engines allow pre-parsing (I know of none), you
still have the overhead of taking the values out of a TMQL processor
and injecting them into the template. And I suspect that many TMQL
processors will copy the data values before handing them over to the
application (never entrust an application engineer with the data
structures inside the DB ;-), so this will also carry the copying
cost.

But it gets worse, much worse actually, if the result you want to
generate is NOT very regular, say, like in a book. In your scenario
the book results would have to be flattened into a sequence first
(maybe containing a lot of NULL values), and then a program will have
to have a lot of

  if not NULL then _expand_template (....)

stuff.

And for TM content, by definition the content is quite irregular.
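
A minimal Python sketch of the application-side code this scenario
leads to, assuming a flattened result with NULLs (None here). The row
layout, the element names and the expand() helper are hypothetical and
only serve to show the per-value NULL checks.

```python
from xml.sax.saxutils import escape

# Hypothetical flattened result: (title, subtitle, author); None = no value.
ROWS = [
    ("Topic Maps", "An Introduction", "A. Author"),
    ("Query Languages", None, None),
]

def expand(tag, value):
    """Expand a one-element template, or emit nothing for a NULL value."""
    return "" if value is None else f"<{tag}>{escape(value)}</{tag}>"

def row_to_xml(row):
    """Build one <book> fragment; every column needs its own NULL check."""
    title, subtitle, author = row
    return (
        "<book>"
        + expand("title", title)
        + expand("subtitle", subtitle)
        + expand("author", author)
        + "</book>"
    )

# Collecting the fragments into a document is the cheap part.
doc = "<books>" + "".join(row_to_xml(r) for r in ROWS) + "</books>"
print(doc)
```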

> >-- Output handling

> Toma returns indeed always textual representation of the output. 
> However, it is very simple to instead return the objects themselves. 
> This syntax was even available in earlier versions of Toma and was not 
> forgotten:
> 
>    SELECT ... AS XTM WHERE ...
> or
>    SELECT ... AS OBJECT WHERE ...

TMQL has this feature, or to be more exact, it contains all ingredients to
allow implementations to expose this feature:

From http://topicmaps.bond.edu.au/new/tmql/tmql-user.dbk
(Application-Specific Atomification/Deatomification):

     Atomification not only applies to characteristic items, but to other
     items as well. In terms of the TMQL specification the result of
     atomifying a particular topic is a 'null-operation', i.e. leaves the
     item untouched. Implementations, though, are free to redefine this
     process and use their own atomification rules.

     Consider the query expression 

     select $p
       where $p isa person

     By default, the querying application would get topic items of type
     person as-is, i.e. as items according to TMDM. If the query would be
     modified to

     select $p >> atomify
       where $p isa person

     then a TMQL processor will be asked to trigger atomification for all
     these person topics before they are returned into the
     application. Still, by default, this will be a null-operation, so
     nothing changed by adding the atomification step.

     If our application, however, had defined various object classes, such
     as PERSON and if it would have configured the TMQL processor to
     populate object instances of the classes automatically, then it would
     get PERSON objects without any further ado. Here is how this could
     look in Perl:

     package PERSON;

     # here are the methods and constructor

     1;

     my $q = new TM::QL ('select $p ... ');
     $q->register ( serializers => { 'person' => 'PERSON' });
     my $results = $q->eval (...); 

     The first line would create a query object. Before that query is
     evaluated, the application tells the processor that it wants to
     associate the class PERSON in the Perl program with the type person in
     the map. Sometimes these objects are referred to as business objects.

     Of course, this is all outside the TMQL specification; also how
     de-atomification would work in this case.

> The reason I didn't include this in Toma was that I didn't have a real 
> use case for it. After all, when I get a topic ID, I can use the TMAPI 
> to get the topic in a very similar fashion.

I think, there is a need for this.

> >-- Functions and Data Types
> >
> >TOMA's choice here is ad-hoc, in TMQL we have to think more lateral.
> Can you explain this please?

One of the TMQL issues is to decide which primitive data types we adopt
natively, so boolean?, integer?, decimal?, string (probably), ...

But the list is actually quite long, especially if you consider XML
Schema data types. But with all these types, there is also a list of
functions and operators.

I would like to see some degree of interoperability with XSD. Not
sure, how much of it can/should be taken.

> Indeed in Toma one should write the following:
> 
> 	$person.oc($type) and $type.type.super(*) = 'size'
> 
> to get the same effect.
> 

> I didn't realize the importance of such queries. I wonder if you
> would recommend changing the way Toma interprets such types - so
> types of basenames, occurrences, associations and roles to include
> by default any instance of any subclass of the written type (as size
> is interpreted in $person / size).

As I understand it, at least for TMQL there is agreement in the WG3 that
for occurrences, names, assoc-types and roles implicit subclassing should
be honored.

Again quoting

    http://topicmaps.bond.edu.au/new/tmql/tmql-user.dbk

    Quick Tour / Association Predicates:

    Association predicates actually have more implicit meaning than is
    obvious at first sight. If, for example, the map contained an
    association of type is-remastered-by which also connects an album with
    a producer and is-remastered-by is a subtype of is-produced-by, then
    also such associations would match the template.

    Honoring subclassing also applies to roles. Had we in our queried map
    an association of type is-remastered-by, but the role (type) for the
    album is not production, but the subclass remastering, such
    association would also match the association template.
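
The matching described in that passage can be sketched roughly as
follows in Python. The data structures, the single-parent SUBCLASS_OF
table and the matches() function are assumptions for illustration only;
the player names some-album and some-producer are invented, and nothing
here is prescribed by the TMQL draft.

```python
# Hypothetical subclass links for association types and role types.
SUBCLASS_OF = {
    "is-remastered-by": "is-produced-by",
    "remastering": "production",
}

def is_subtype(t, target):
    """True if t equals target or reaches it via subclass-of links."""
    while t is not None:
        if t == target:
            return True
        t = SUBCLASS_OF.get(t)
    return False

def matches(assoc, template):
    """assoc and template are (type, {role: player}); a None player means 'any'."""
    a_type, a_roles = assoc
    t_type, t_roles = template
    if not is_subtype(a_type, t_type):
        return False
    for t_role, t_player in t_roles.items():
        if not any(
            is_subtype(role, t_role) and t_player in (None, player)
            for role, player in a_roles.items()
        ):
            return False
    return True

assoc = ("is-remastered-by",
         {"remastering": "some-album", "producer": "some-producer"})
template = ("is-produced-by", {"production": None, "producer": None})
print(matches(assoc, template))
```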

\rho