[tmql-wg] TOMA: comparison with TMQL

Mon Feb 19 08:44:04 EST 2007

Dear all,

First of all, I appreciate the work Prof. Barta is doing. It is clear 
that he is going to great lengths to harvest ideas for TMQL by examining 
other work.

In this email I would like to help with that by clarifying some issues 
and also stating my own opinions on some points.

Before I start, some comments:

1. The document "Toma Spec", to which Prof. Barta refers, is obsolete. 
Two current resources about Toma are available instead:
   - The Toma user manual: http://topiwriter.com/toma/TW_UM_TOMA_100.pdf
   - The paper about Toma from the proceedings of TMRA'06.
However, so that Prof. Barta's comments can still be checked against 
it, the obsolete "Toma Spec" remains available online: 
http://topiwriter.com/toma/Toma.html
Note that this last document is obsolete not because of changes in Toma 
but because of changes and many fixes in the document itself (which 
became the Toma user manual).

2. In the one session of the TMQL committee in which I took part, and 
in some email exchanges before and after it, I realized that it is 
almost impossible to assess new languages that other people put forward. 
To assess a language one has to understand it, and this usually takes 
more than reading the language spec or description. For most suggested 
languages there are not yet tutorials, exercises, and the like, so 
understanding the language is not at all simple.
Therefore, I find it a good idea to cooperate with the language authors, 
not only in obtaining the documentation but also in answering the tough 
questions about the language.

Below I comment on excerpts from Prof. Barta's email:

> In this sense a
> 
> provides (provider: laptop, 
>           provided: adapter,
>           receiver: electricity220) -->> tm://laptops/
> 
> is easier on the eyes then the TOMA equivalent
> 
>   insert 'adapter'
>      into (provides)->provider, 'laptop'
>      into (provides)->receiver, 'electricity220'
>      into (provides)->provided
> 
> And we can re(use) CTM syntax.

However, the above Toma INSERT could (and probably should) be laid out 
differently:

  insert 'adapter'        into (provides)->provider,
         'laptop'         into (provides)->receiver,
         'electricity220' into (provides)->provided;

Which is much more readable.

> -- long chains
> 
> In the TOMA Spec 2.5.2 there is a longish example how to chain several path
> expressions while 'climbing' over associations.
> 
> select $topic
>    where $a(connect_to)->connected = 'little_finger'
>    and $a(connect_to)->connected = $p1
>    and $p1 not in ('little_finger')
>    and $b(connect_to)->connected = $p2
>    and $b(connect_to)->connected = $p1
>    and $p2.bn != $p1.bn
>    and $c(connect_to)->connected = $p
>    and $c(connect_to)->connected = $p2
>    and $p.bn != $p2.bn
>    and $a != $b
>    and $b != $c;
> 

:-) This example was given in the spec in order to explain why the 
following syntax was developed:

		.role1<-association_id(association_type)->role2

So the above example can be written in Toma as:

   select $topic
     where id('little_finger').$$<-(connect_to)->$$
                              .$$<-(connect_to)->$$
                              .$$<-(connect_to)->$$ = $topic;

or

   select $topic
     where id('little_finger')
       .connected<-(connect_to)->connected
       .connected<-(connect_to)->connected
       .connected<-(connect_to)->connected = $topic;

Which is at least as elegant as:
> In TMQL I would do
> 
> where
>   $f = little_finger
> & $f   <- connected [ ^ connect_to ] -> connected == $f'
> & $f'  <- connected [ ^ connect_to ] -> connected == $f''
> & $f'' <- connected [ ^ connect_to ] -> connected == $f'''
> 

> --
> 
> What I found slightly difficult to grasp is that filtering is done quite
> differently, depending on what to filter. To get one particular name from the
> list, one would do this via

The different syntax points to different kinds of filtering.

> 
>   $topic.bn['central processing unit']
Here we are looking for a basename with the value 'central processing 
unit'.

> 
>   $topic.bn (abbreviation)
Here we are looking for a basename which is of type abbreviation.
> 
> Filtering of roles according to the type is using [] again:
> 
>    ...-> [role-type]

Actually:

      ...->role-type

(so without the square brackets). Here we are looking for a player 
playing a role of that type.

>   ....$$<- (assoc-type) ->
Here we are looking for a player taking part in an association of type 
assoc-type.

>    $topic.bn @ english
Here we are looking for a basename with scope english.

> 
> In TMQL filtering is always done with []. Fullstop.

So how do you tell the engine that in one place you want a topic of type 
t1 and scope s1, and in another a topic of type s1 and scope t1?

> --
> 
> In TOMA, binding of variables happens in several places. I would be happy to see
> it in the WHERE clause only, but TOMA seems to allow it also in the SELECT
> clause:
> 
>   select $topic.bn [ $bn ]
>   where
>          $topic.id = 'foo' and $bn ~ '^a';
> 
> It is not effectively possible to look at the WHERE clause only and then
> evaluate the SELECT expression(s). I could not figure out whether in SELECT
> clauses path expressions in their full beauty can be used. But if so, how is
> SELECT then different from WHERE?
> 
> It would be interesting to see a formal definition.
> 
> All of this is not necessary in TMQL:
> 
> TOMA:
> 
>    select $topic.bn @ $scope, $scope.id
>    where  $topic.bn = 'lung'
> 
> TMQL:
>    select $topic / name ( . , . @ )
>    where  $topic / name == "lung"
> 
> or
> 
>    "lung" \ name / name ( . , . @ )
> 
Could you give an example of the following in TMQL:

      select $topic.bn@$scope, $scope.bn@$scope2, $scope2.id
       where $topic.bn = 'lung';

> -- ad Identification
> 
> In TOMA, topic identifiers (local identifiers) can be written as either
> 
>    'foo'
> 
> or
> 
>    id ('foo')
> 
> or
> 
>    foo
> 
Not exactly. It depends on the syntax. 'foo' is a string. If it is used 
in the topic literal id(), it is taken as the id of a topic.
A "naked literal" such as foo can be used inside typing brackets, after 
the scope operator @, and as a role in an association expression.

> Identification of topics can also be via their name, but then the path
> expression has to be read from right to left
> 
>   $topic.bn('Processor')

This is not valid Toma. bn('Processor') is a literal. It can start a 
path expression: its output is the topic that has the basename 
'Processor'. But it takes no input, so it cannot be chained after 
anything.

On the other hand, .bn is a path expression and can be followed by 
typing brackets that contain a naked literal:

     $topic.bn(abbreviation)

can be written in Toma and means the basename of type abbreviation.

> 
> Identification of topics via subject indicators seems to exist, but
> what about locators? What exactly is
> 
>   $topic.si ('http://....')

This is not valid Toma.
si('http://...') is a subjectIdentity literal. It can start a path 
expression chain: its output is the topic that has the subjectIdentity 
'http://.../'. But it takes no input, so it cannot be chained after 
$topic.
Note that there is also the path expression .si, which takes a topic as 
input and outputs its subjectIdentity.

> 
> -- Conciseness
> 
> Some queries seem to go overboard with the syntax:
> 
>   select $topic.oc(mass).sc['textual']
>    where $topic.type.super(*) = 'device';
> 
> In TMQL this would be
> 
>   // device / mass [ @ textual ]
> 
> or
> 
>   select $device / mass [ @ textual ]
>    where $device isa device
>

I am not sure how types and classes are dealt with in TMQL. In Toma, I 
chose not to mix them, as they are not mixed in XTM: classes are defined 
by associations, and types in the topic itself.

The select you quote above takes any descendant of the topic device, 
that is: any instance of device, any instance of a subclass of device, 
any instance of a subclass of a subclass of device, etc.
In Toma you can also ask for the subclasses of a topic at any level or 
any range of levels, and the same with types (although it is usually 
recommended to have only one level of typing, that is: an instance 
should not have instances).

> -- Unexpected Things
> 
> What I found unexpected, is that in TOMA, to find all associations of type
> 'part-whole', one has to write (Example 1 in 2.5.1 of 'TOMA Spec'):
> 
>   select $a
>    where $a.id = 'part-whole';

:-) This is an example of how NOT to find associations of type 
'part-whole'. The text that follows this example explains it:

In this example the engine doesn't know that the variable $a is an 
association. It treats it as an ordinary topic variable and looks among 
all topics for the one that has the id 'part-whole'.

In order to find all associations of type part-whole you should write in 
Toma:
    select distinct $a where exists $a(part-whole)->$$;

> --
> 
> To avoid convoluted path expression chains, a special semantics has been in
> TOMA, so that two consecutive associations are always different (2.5.2).
> 
> That seemed quite adhoc to me.

The reason for this is demonstrated in the following:

   select $topic
     where id('little_finger').$$<-(connect_to)->$$
                              .$$<-(connect_to)->$$ = $topic;

We do not want to get the topic 'little_finger' as a result for $topic, 
but without this rule we would get it (because 'little_finger' is 
connected to some topic, which in turn is connected back to 
'little_finger').
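
The effect of the rule can be sketched in Python (a toy model, not Toma 
and not engine code). The association data and the names ASSOCIATIONS, 
step and two_hops are invented for the illustration; the sketch assumes 
the rule means "do not leave a topic through the association you just 
arrived by".

```python
# Toy data: (association id, players); every association is a connect_to.
ASSOCIATIONS = [
    ("a1", {"little_finger", "ring_finger"}),
    ("a2", {"ring_finger", "middle_finger"}),
]


def step(frontier, distinct_consecutive):
    """Perform one hop over a connect_to association.

    frontier is a set of (topic, id of the association used to reach it).
    """
    result = set()
    for topic, came_by in frontier:
        for assoc_id, players in ASSOCIATIONS:
            if topic not in players:
                continue
            if distinct_consecutive and assoc_id == came_by:
                continue  # the rule: consecutive associations must differ
            for other in players - {topic}:
                result.add((other, assoc_id))
    return result


def two_hops(start, distinct_consecutive):
    """Topics reachable from start after exactly two hops."""
    frontier = {(start, None)}
    for _ in range(2):
        frontier = step(frontier, distinct_consecutive)
    return sorted({topic for topic, _ in frontier})


print(two_hops("little_finger", distinct_consecutive=False))
print(two_hops("little_finger", distinct_consecutive=True))
```

Without the rule the starting topic comes back as a result of the 
two-hop query; with it, only the topic two associations away remains.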

I wonder how TMQL solves this.

> -- XML and TM content
> 
> TOMA obviously cannot generate TM or XML content. Now Rani will claim that this
> is a template thing. But it is not, I think, and even if it were, for
> performance reasons this should be part of the language.

Rani indeed claims that this is a template thing :-)
My focus has been to keep Toma as simple and as small as possible. 
It is TMQL, TMCL and TMML and nothing else.
I assumed that users who want to build applications with Toma or any 
other TMQL will also use other technologies (Java, Perl, Python etc.). 
Each of those technologies provides its own techniques and methodologies 
for creating XML content as well as any other content. Why extend the 
language to include yet another such technique?
And what are the performance reasons here?
> 
> -- Output handling
> 
> The query result is always the 'textual representation' of the output, right?
> 
> What if this is a list? So it is a list of textual representations, not their
> concatenation, right? How to control that you do _NOT_ want the string, but the
> characteristic itself?
> 
> In TMQL, the default is 'atomification', i.e. the conversion of characteristics
> into the value they contain. In this process, the scope and the type are
> lost. This, so I assume, is what most users are asking for.
> 
> TMQL:
> 
>   robert / name    # auto-atomification at the end, one gets a string
> 
>   robert / name [ @ nick ] # ditto, but after filtering for this scope
> 
>   robert >> characteristics name # get the whole characteristics item
> 
Toma indeed always returns the textual representation of the output. 
However, it would be very simple to return the objects themselves 
instead. Syntax for this was even available in earlier versions of Toma 
and has not been forgotten:

    SELECT ... AS XTM WHERE ...
or
    SELECT ... AS OBJECT WHERE ...

It is not even too difficult to implement, once we decide how an object 
is to be provided. For example, TopiEngine supports a TMAPI-like API in 
C++, in which C++ objects are available for the different Topic Maps 
objects.

The reason I didn't include this in Toma was that I didn't have a real 
use case for it. After all, when I get a topic ID, I can use the TMAPI 
to get the topic in a very similar fashion.

> -- Functions and Data Types
> 
> TOMA's choice here is ad-hoc, in TMQL we have to think more lateral.
Can you explain this please?

> 
> -- Ordering
> 
> In TOMA one has to refer to a 'column number', so this is a bit brittle, when
> you fiddle around with the SELECT clause. Or maybe the example there is just
> misleading.
> 
> In TMQL ordering can be done via a stand-alone path expression, i.e. one can
> order according to something which is NOT even in the SELECT clause
> 
> select $p / name
>   where $p isa person
>   order by $p / age desc

This is a design decision I made in order to make things a bit simpler 
for the user (I hope). When you sort by anything that is not in the 
SELECT clause, you can get into confusing situations.
For example, when sorting the names of topics by the topic types: there 
are more names than types, so how exactly should the names be sorted?
And if you sort topic ids by basenames and not all topics have 
basenames, how should those topic ids be sorted?

When thinking about it I realized that, from the implementation point of 
view, the two choices are very similar. Sorting by something that is not 
in the SELECT clause amounts to adding a hidden column to the SELECT 
clause and sorting by that hidden column.
So I decided that the user adds this column explicitly instead of the 
engine adding it implicitly.
This way the confusion is avoided: when you sort by a column you see, 
you probably understand better how things are sorted.
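
The equivalence can be sketched in a few lines of Python (an 
illustration only, not Toma and not engine code). The rows and the 
function names are invented, and column numbers are assumed to start 
at 1.

```python
# Toy result rows, invented for this sketch.
ROWS = [
    {"name": "Carol", "age": 41},
    {"name": "Alice", "age": 29},
    {"name": "Bob", "age": 35},
]


def select_names_hidden_sort(rows):
    """Order by something not selected: add a hidden column, sort, strip it."""
    widened = [(r["name"], r["age"]) for r in rows]   # hidden column: age
    widened.sort(key=lambda row: row[1], reverse=True)
    return [(name,) for name, _age in widened]        # the user never sees age


def select_names_explicit_sort(rows, order_by_column):
    """The user selects the sort column and refers to it by number."""
    table = [(r["name"], r["age"]) for r in rows]
    table.sort(key=lambda row: row[order_by_column - 1], reverse=True)
    return table


print(select_names_hidden_sort(ROWS))
print(select_names_explicit_sort(ROWS, 2))
```

Both functions do the same work internally; the only difference is 
whether the column that drives the ordering is visible in the result.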

> 
> -- Implicit subclassing
> 
> In TMQL the default is that subclass hierarchies are honored. So when you do a
> 
>    $person / size
> 
> then you get all occurrences of this type and all its subtypes, so also
> shoesize, hatsize, whatever is available in the map and it is made explicit that
> shoesize iko size and hatsize iko size, etc.
> 
> TOMA seems to have a 'strict' interpretation. The problem with that is that it
> can be cumbersome to write queries which are robust against subtle changes in
> the type structure. So the 'immediate type' or 'immediate subclass' may change;
> if we allow programmers to rely too much on it, our applications are all dancing
> on thin ice.
Indeed, in Toma one has to write the following:

	$person.oc($type) and $type.type.super(*) = 'size'

to get the same effect.
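
The difference between the two interpretations can be sketched in 
Python (not Toma; the data and the names SUPERCLASS, OCCURRENCES, 
is_kind_of, occurrences_strict and occurrences_with_subtypes are 
invented). The sketch follows the reading in the quoted text, where 
shoesize and hatsize are subclasses of size.

```python
# Toy data, invented for this sketch.
SUPERCLASS = {"shoesize": "size", "hatsize": "size"}
# Occurrences of one person topic: (occurrence type, value)
OCCURRENCES = [
    ("shoesize", "43"),
    ("hatsize", "58"),
    ("size", "L"),
    ("weight", "80"),
]


def is_kind_of(occ_type, wanted):
    """True if occ_type is wanted or a subclass of it at any depth."""
    seen = set()  # guards against cycles in the class hierarchy
    while occ_type is not None and occ_type not in seen:
        if occ_type == wanted:
            return True
        seen.add(occ_type)
        occ_type = SUPERCLASS.get(occ_type)
    return False


def occurrences_strict(wanted):
    """Strict interpretation: only the written type matches."""
    return [value for occ_type, value in OCCURRENCES if occ_type == wanted]


def occurrences_with_subtypes(wanted):
    """Hierarchy-aware interpretation: subtypes match as well."""
    return [value for occ_type, value in OCCURRENCES if is_kind_of(occ_type, wanted)]


print(occurrences_strict("size"))
print(occurrences_with_subtypes("size"))
```

A query written against the second interpretation keeps working when a 
new subtype of size is added to the map; one written against the first 
has to spell out the hierarchy walk itself.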

I didn't realize the importance of such queries. I wonder whether you 
would recommend changing the way Toma interprets such types, so that 
types of basenames, occurrences, associations and roles include by 
default any instance of any subclass of the written type (as size is 
interpreted in $person / size).

> 
> \rho

Kind regards,

Rani