# [tmql-wg] TOMA: comparison with TMQL

Rani Pinchuk rp at spaceapplications.com
Mon Feb 19 08:44:04 EST 2007

Dear all,

First of all, I appreciate the work Prof. Barta is doing - it is clear
that he is going to great lengths to harvest ideas for TMQL by examining
other work.

In this email I would like to help in that by clarifying some issues and
also stating my own opinions on some points.

1. The document "Toma Spec", to which Prof. Barta refers, is obsolete.
The current documents are:
- The Toma user manual: http://topiwriter.com/toma/TW_UM_TOMA_100.pdf
- The paper about Toma from the proceedings of TMRA'06.
However, in order to be able to refer to some of the comments of Prof.
Barta, the obsolete "Toma Spec" is still available online:
http://topiwriter.com/toma/Toma.html
Note that this last document is obsolete not due to changes in Toma but
due to changes and many fixes in the document itself (that became the
Toma user manual).

2. In the past, in the one session of the TMQL committee in which I took
part, and in some email exchanges before and after, I realized that it is
almost impossible to assess new languages that other people put forward.
In order to assess a language, one should understand it, and this usually
requires more than reading the language spec/description. For most
suggested languages, there are not yet tutorials, exercises, and the
like, so it is not at all simple to understand the language.
Therefore, I find it a good idea to cooperate with the language authors
not only in getting the different documentation, but also in answering
the different tough questions about the language.

Below I comment on different excerpts from the email of Prof. Barta:

> In this sense a
>
> provides (provider: laptop,
>
> is easier on the eyes then the TOMA equivalent
>
>      into (provides)->provider, 'laptop'
>      into (provides)->provided
>
> And we can re(use) CTM syntax.

However, the above Toma INSERT could (and probably should) be indented
differently to be shown as:

'electricity220' into (provides)->provided;

> -- long chains
>
> In the TOMA Spec 2.5.2 there is a longish example how to chain several path
> expressions while 'climbing' over associations.
>
>    select $topic
>    where $a(connect_to)->connected = 'little_finger'
>      and $a(connect_to)->connected = $p1
>      and $p1 not in ('little_finger')
>      and $b(connect_to)->connected = $p2
>      and $b(connect_to)->connected = $p1
>      and $p2.bn != $p1.bn
>      and $c(connect_to)->connected = $p
>      and $c(connect_to)->connected = $p2
>      and $p.bn != $p2.bn
>      and $a != $b
>      and $b != $c;
>

:-) This example has been provided in order to explain why the following
syntax has been developed:

.role1<-association_id(association_type)->role2

So the above example can be written in Toma as:

select $topic
where id('little_finger').$$<-(connect_to)->$$
.$$<-(connect_to)->$$
.$$<-(connect_to)->$$ = $topic;

or

select $topic
where id('little_finger')
.connected<-(connect_to)->connected
.connected<-(connect_to)->connected
.connected<-(connect_to)->connected = $topic;

Which is at least as elegant as:

> In TMQL I would do
>
>    where
>       $f = little_finger
>     & $f <- connected [ ^ connect_to ] -> connected == $f'
>     & $f' <- connected [ ^ connect_to ] -> connected == $f''
>     & $f'' <- connected [ ^ connect_to ] -> connected == $f'''
>

> --
>
> What I found slightly difficult to grasp is that filtering is done quite
> differently, depending on what to filter. To get one particular name from the
> list, one would do this via

The different syntax points to different kinds of filtering.

>
>   $topic.bn['central processing unit']

Here we are looking for a basename with the value 'central processing
unit'.

>
>   $topic.bn (abbreviation)

Here we are looking for a basename which is of type abbreviation.
>
> Filtering of roles according to the type is using [] again:
>
>    ...-> [role-type]

Actually:

...->role-type

(so without the square brackets) Here we are looking for a player
playing a role of that type.

>   ....$$<- (assoc-type) ->

Here we are looking for a player playing an association of type
assoc-type.

>   topic.bn @ english

Here we are looking for a basename with scope english.

>
> In TMQL filtering is always done with []. Fullstop.

So how do you tell the engine that you want a topic of type t1 and scope
s1 and later for a topic of type s1 and scope t1?

> --
>
> In TOMA, binding of variables happens in several places. I would be happy to see
> it in the WHERE clause only, but TOMA seems to allow it also in the SELECT
> clause:
>
>    select topic.bn [ bn ]
>    where
>       topic.id = 'foo' and bn ~ '^a';
>
> It is not effectively possible to look at the WHERE clause only and then
> evaluate the SELECT expression(s). I could not figure out whether in SELECT
> clauses path expressions in their full beauty can be used. But if so, how is
> SELECT then different from WHERE?
>
> It would be interesting to see a formal definition.
>
> All of this is not necessary in TMQL:
>
> TOMA:
>
>    select topic.bn @ scope, scope.id
>    where topic.bn = 'lung'
>
> TMQL:
>    select topic / name ( . , . @ )
>    where topic / name == "lung"
>
> or
>
>    "lung" \ name / name ( . , . @ )
>

Could you give an example of the following in TMQL:

select topic.bn@scope, scope.bn@scope2, scope2.id
where topic.bn = 'lung';

> -- ad Identification
>
> In TOMA, topic identifiers (local identifiers) can be written as either
>
>    'foo'
>
> or
>
>    id ('foo')
>
> or
>
>    foo
>

Not exactly. It depends on the syntax. 'foo' is a string. If it is used
in the topic literal id(), it is taken as the id of a topic. A "naked
literal" - foo - can be used inside typing brackets, following the scope
operator @, and as the roles in the association expression.

> Identification of topics can also be via their name, but then the path
> expression has to be read from right to left
>
>    topic.bn('Processor')

This is not valid Toma. bn('Processor') is a literal. It can start a
path expression: its output is the topic which has the basename
'Processor'. But it takes no input, so it cannot be chained after
anything. On the other hand, .bn is a path expression and can be
followed by typing brackets which contain a naked literal:
topic.bn(abbreviation) can be written in Toma and means the basename of
type abbreviation.

>
> Identification of topics via subject indicators seems to exist, but
> what about locators? What exactly is
>
>    topic.si ('http://....')

This is not valid Toma. si('http://...') is a subjectIdentity literal.
It can start a path expression chain: its output is the topic which has
the subjectIdentity 'http://.../'. But it takes no input, so it cannot
be chained after topic. Note that you have the path expression .si which
takes a topic as input and outputs its subjectIdentity.

>
> -- Conciseness
>
> Some queries seem to go overboard with the syntax:
>
>    select topic.oc(mass).sc['textual']
>    where topic.type.super(*) = 'device';
>
> In TMQL this would be
>
>    // device / mass [ @ textual ]
>
> or
>
>    select device / mass [ @ textual ]
>    where device isa device
>

I am not sure how types and classes are dealt with in TMQL. In Toma, I
chose not to mix them, as they are not mixed in XTM: classes are defined
by associations, and types in the topic itself. The select you quote
above tries to take any descendant of the topic device, that is: any
instance of device, any instance of a subclass of device, any instance
of a subclass of a subclass of device, etc. In Toma you could also ask
to see any subclass of a topic from any level or any range of levels,
and the same with types (although usually it is recommended to have only
one level of typing, that is: an instance should not have instances).

> -- Unexpected Things
>
> What I found unexpected, is that in TOMA, to find all associations of type
> 'part-whole', one has to write (Example 1 in 2.5.1 of 'TOMA Spec'):
>
>    select a
>    where a.id = 'part-whole';

:-) This is an example of how NOT to find associations of type
'part-whole'. The text that follows this example explains it: in this
example the engine doesn't know that the variable a is an association.
It treats it as an ordinary topic variable and looks among all topics
for the one with the id 'part-whole'.

In order to find all associations of type part-whole you should write in
Toma:

select distinct a where exists a(part-whole)->$$;

> --
>
> To avoid convoluted path expression chains, a special semantics has been in
> TOMA, so that two consecutive associations are always different (2.5.2).
>
> That seemed quite adhoc to me.

The reason for this is demonstrated in the following:

select $topic
where id('little_finger').$$<-(connect_to)->$$
.$$<-(connect_to)->$$ = $topic;

We do not want to get the topic 'little_finger' as a result for $topic,
but without this rule we would get it (because 'little_finger' is
connected to something that is connected back to 'little_finger').
I wonder how TMQL solves this.

> -- XML and TM content
>
> TOMA obviously cannot generate TM or XML content. Now Rani will claim that this
> is a template thing. But it is not, I think, and even if it were, for
> performance reasons this should be part of the language.

Rani indeed claims that this is a template thing :-)
My focus has been to keep Toma as simple and as small as possible. It is
TMQL, TMCL and TMML and nothing else. I assumed that users who want to
create applications using Toma or any other TMQL will use other
technologies in addition (Java, Perl, Python etc.). Each of those
technologies provides sets of techniques and methodologies to create XML
content as well as any other content. Why extend the language to include
another such technique? And what are the performance reasons here?

>
> -- Output handling
>
> The query result is always the 'textual representation' of the output, right?
>
> What if this is a list? So it is a list of textual representations, not their
> concatenation, right? How to control that you do _NOT_ want the string, but the
> characteristic itself?
>
> In TMQL, the default is 'atomification', i.e. the conversion of characteristics
> into the value they contain. In this process, the scope and the type are
> lost. This, so I assume, is what most users are asking for.
>
> TMQL:
>
>    robert / name # auto-atomification at the end, one gets a string
>
>    robert / name [ @ nick ] # ditto, but after filtering for this scope
>
>    robert >> characteristics name # get the whole characteristics item
>

Toma indeed always returns a textual representation of the output.
However, it is very simple to return the objects themselves instead.
This syntax was even available in earlier versions of Toma and was not
forgotten:

SELECT ... AS XTM WHERE ...

or

SELECT ... AS OBJECT WHERE ...

It is not even too difficult to implement, once we understand how to
provide an object. For example, TopiEngine supports an implementation of
a TMAPI-like API in C++ where C++ objects are available for the
different Topic Maps objects. The reason I didn't include this in Toma
was that I didn't have a real use case for it. After all, when I get a
topic ID, I can use the TMAPI to get the topic in a very similar
fashion.

> -- Functions and Data Types
>
> TOMA's choice here is ad-hoc, in TMQL we have to think more lateral.

Can you explain this please?

>
> -- Ordering
>
> In TOMA one has to refer to a 'column number', so this is a bit brittle, when
> you fiddle around with the SELECT clause. Or maybe the example there is just
> misleading.
>
> In TMQL ordering can be done via a stand-alone path expression, i.e. one can
> order according to something which is NOT even in the SELECT clause
>
>    select $p / name
>    where $p isa person
>    order by $p / age desc

This is a design decision I have made in order to make things a bit
simpler for the user (I hope). When you sort by anything that is not in
the SELECT clause, you might get into confusing situations.
For example, when trying to sort the names of topics by the topic types:
there are more names than types, so how exactly should the names be
sorted? And if you sort by basenames and not all topics have basenames,
how should the topic ids be sorted?

When thinking about it I realized that, from the implementation point of
view, the two choices are very similar: sorting by something that is not
in the SELECT clause means adding a hidden column to the SELECT clause
and sorting by that hidden column.
So instead of doing that, I decided that the user will add this column
explicitly instead of the engine doing it implicitly.
This way the confusion is avoided: when you sort by a column you see,
you probably understand better how things are sorted.
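
The hidden-column argument can be sketched outside of Toma. The Python
below is only an illustration, not Toma or TMQL: the rows, the names and
both helper functions are made up for this example. It sorts (name, age)
rows once with the sort column dropped afterwards and once with the sort
column kept visible.

```python
# Hypothetical result rows (name, age); invented data, not Toma output.
rows = [("carol", 41), ("alice", 35), ("bob", 29)]

def order_by_hidden(rows):
    """Sort by age, then drop the age column (the 'hidden column' variant)."""
    return [name for name, _age in sorted(rows, key=lambda r: r[1])]

def order_by_visible(rows):
    """Sort by age and keep the age column in the output (explicit variant)."""
    return sorted(rows, key=lambda r: r[1])

hidden = order_by_hidden(rows)    # names only, order looks arbitrary
visible = order_by_visible(rows)  # same order, but the sort key is shown
```

Both variants do the same work; the only difference is whether the
reader of the result can see the column that determined the order.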

>
> -- Implicit subclassing
>
> In TMQL the default is that subclass hierarchies are honored. So when you do a
>
>    $person / size
>
> then you get all occurrences of this type and all its subtypes, so also
> shoesize, hatsize, whatever is available in the map and it is made explicit that
> shoesize iko size and hatsize iko size, etc.
>
> TOMA seems to have a 'strict' interpretation. The problem with that is that it
> can be cumbersome to write queries which are robust against subtle changes in
> the type structure. So the 'immediate type' or 'immediate subclass' may change;
> if we allow programmers to rely too much on it, our applications are all dancing
> on thin ice.

Indeed in Toma one should write the following:

$person.oc($type) and $type.type.super(*) = 'size'

to get the same effect.
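
The difference between the 'strict' and the subclass-aware reading can
be sketched in Python. This is an illustration only, not Toma or TMQL:
the superclass table, the occurrence data and both function names are
hypothetical, and it assumes that the wanted type itself also counts as
a match.

```python
# Hypothetical superclass links (subclass -> superclass), using the
# type names from the quoted example.
SUPER = {"shoesize": "size", "hatsize": "size"}

def ancestors(cls):
    """Return cls followed by every superclass reachable from it."""
    seen = []
    while cls is not None and cls not in seen:
        seen.append(cls)
        cls = SUPER.get(cls)
    return seen

def occurrences_of(occurrences, wanted):
    """Keep occurrences whose type is `wanted` or any subclass of it."""
    return [(t, v) for t, v in occurrences if wanted in ancestors(t)]

# Invented occurrences of one person: (occurrence type, value).
person_occurrences = [("shoesize", "43"), ("hatsize", "58"),
                      ("email", "x@example.org")]

strict = [(t, v) for t, v in person_occurrences if t == "size"]
matched = occurrences_of(person_occurrences, "size")
```

With the strict comparison nothing matches, because no occurrence is
typed directly as size; following the superclass links picks up both
shoesize and hatsize.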

I didn't realize the importance of such queries. I wonder if you would
recommend changing the way Toma interprets such types, so that types of
basenames, occurrences, associations and roles include by default any
instance of any subclass of the written type (as size is interpreted in
$person / size).

>
> \rho

Kind regards,

Rani