[tmql-wg] TOMA: comparison with TMQL

Sun Feb 18 20:16:24 EST 2007

Hi all,

While I actually worked through TOMA in December last year, only now have I
found the time to write up my thoughts. It was interesting to see how other
people have solved particular problems. Or chosen to ignore them :-) There have
also been a few cases where I have added/changed some TMQL features to match up
with TOMA.

So, I am referring to the material at

   http://topicmaps.bond.edu.au/mda/internet/semantic-web/topicmaps/Toma

[ 31. Oct 2006 ] but, of course, I cannot guarantee that everything below
perfectly reflects the real TOMA implementation; it is just my interpretation
of the text.

--

On a strategic level I think that it is a clever move to have query, constraint
and manipulation language consolidated into one language. I am not a strong
believer in these 'tower' architecture things. But - to think this thought to
the end - such a language will then also have to include TM content generation,
whether the generated content is handed to the application to use or,
conversely, is used to update and insert TM content.

In this sense a

provides (provider: laptop, 
          provided: adapter,
          receiver: electricity220) -->> tm://laptops/

is easier on the eyes than the TOMA equivalent

  insert 'adapter'
     into (provides)->provider, 'laptop'
     into (provides)->receiver, 'electricity220'
     into (provides)->provided

And we can (re)use CTM syntax.

--

One thing I noticed with interest is that the TOMA authors thought it necessary
to provide a mechanism to merge maps:

  use 't/db/columbus.db'
  merge with file://.....
     mark columbus-msm;

I think there is a real use case behind this, so TMQL also supports this by its
inherent 'merge' operator ++

  select ....
  from db:t/db/columbus ~~> ++ file://..... ~~>

The ~~> just follows the reification which leads from the topic indicated (the
map as a whole) to the map itself. Since both subexpressions are just 'normal'
TMQL, you can be more specific about what to merge:

  select
  from db:t/db/columbus ~~> // vehicle ++ 
       file://..... ~~>     ... some other query

And the ++ operator is not specific to this particular case: it is also used
to merge map fragments when TM content is generated:

for $person in // person
return """
    {$person}
"""

-- Exporting XTM

TOMA has first-class support for exporting a map to a resource:

  EXPORT TO whereever.xtm AS XTM;

In TMQL there is no need for a special syntax, as the processor knows that in
XML context topics and associations are exported as XTM anyway:

for $t in //*
return
   <topicMap>
      {$t}
   </topicMap>

This openness also allows one to be very specific about what is actually
exported.

--

One thing I obviously like is the bias towards path expressions. TOMA uses the
dot and the arrows ->, <- for the chaining. It is interesting to see that the
axes chosen are very similar to those in TMQL: players, roles, type(s),
instance(s), subclass(es), superclass(es).

There are important differences to TMQL, though: In TMQL navigating along a name
or an occurrence is just a special case of navigating from a topic to a
characteristic. And any type (not just the generic 'name' or 'occurrence') can
be used:

   $topic / nickname

--

Otherwise, in terms of path expressions, TOMA is not very different from TMQL:

TOMA:

  select $topic.bn 
  where $topic.bn = 'lung';

TMQL

  select $topic / name
  where $topic / name == "lung"

But TMQL is designed to be as concise as possible. So it can be made shorter

  select "lung" \ name / name
  # SELECT flavour, but no WHERE clause

or even shorter

  "lung" \ name / name
  # a path expression

--

"Making shorter" becomes more important once queries get larger. A TOMA query

  select $topic.bn at dutch where $topic.bn at english = 'lung'

would transcribe into TMQL as

  select $topic / name [ @dutch ]
  where  $topic / name [ @english ] == "lung"

but also

  "lung" \ name [ @english] / name [ @dutch ]

-- long chains

In the TOMA Spec 2.5.2 there is a longish example of how to chain several path
expressions while 'climbing' over associations.

select $topic
   where $a(connect_to)->connected = 'little_finger'
   and $a(connect_to)->connected = $p1
   and $p1 not in ('little_finger')
   and $b(connect_to)->connected = $p2
   and $b(connect_to)->connected = $p1
   and $p2.bn != $p1.bn
   and $c(connect_to)->connected = $p
   and $c(connect_to)->connected = $p2
   and $p.bn != $p2.bn
   and $a != $b
   and $b != $c;

Probably there is a trailing $a != $c missing, but my point here is that I have
a bit of a hard time checking whether this actually does what it is supposed to
do.

In TMQL I would do

where
  $f = little_finger
& $f   <- connected [ ^ connect_to ] -> connected == $f'
& $f'  <- connected [ ^ connect_to ] -> connected == $f''
& $f'' <- connected [ ^ connect_to ] -> connected == $f'''

and leave the rest to the machine. If a predicate is_connected_finger were
defined, then this could be collapsed further.

--

What I found slightly difficult to grasp is that filtering is done quite
differently, depending on what to filter. To get one particular name from the
list, one would do this via

  $topic.bn['central processing unit']

(I guess the string is a value?) Filtering according to the type is done with
():

  $topic.bn (abbreviation)

Filtering of roles according to the type is using [] again:

   ...-> [role-type]

Filtering according to the association type is done like this

  ....$$<- (assoc-type) ->

Filtering by scope is done without any brackets:

   $topic.bn @ english

In TMQL filtering is always done with []. Full stop.

The other thing I wondered was whether in TOMA you can use a full boolean expression
for a filter, something like

   $topic.bn (abbreviation or shorthand)

or

   $topic.bn (not abbreviation)

--

In TOMA, binding of variables happens in several places. I would be happy to see
it in the WHERE clause only, but TOMA seems to allow it also in the SELECT
clause:

  select $topic.bn [ $bn ]
  where
         $topic.id = 'foo' and $bn ~ '^a';

So it is effectively not possible to evaluate the WHERE clause on its own and
only then the SELECT expression(s). I could not figure out whether in SELECT
clauses path expressions in their full beauty can be used. But if so, how is
SELECT then different from WHERE?

It would be interesting to see a formal definition.

All of this is not necessary in TMQL:

TOMA:

   select $topic.bn @ $scope, $scope.id
   where  $topic.bn = 'lung'

TMQL:
   select $topic / name ( . , . @ )
   where  $topic / name == "lung"

or

   "lung" \ name / name ( . , . @ )

-- ad Identification

In TOMA, topic identifiers (local identifiers) can be written as either

   'foo'

or

   id ('foo')

or

   foo

Identification of topics can also be via their name, but then the path
expression has to be read from right to left

  $topic.bn('Processor')

and not - as usual - from left to right. Right?

Identification of topics via subject indicators seems to exist, but
what about locators? What exactly is

  $topic.si ('http://....')

In TMQL all of this always looks similar:

by id             processor              

by qname          computers:processor

by indicator      "http://www.w3.org/" ~

by locator        "http://www.w3.org/" =

by name           "Processor" \ name

by shoesize       40 \ shoesize

And it is symmetrical: / name gives the name, ~ the indicator URI, ....

-- Conciseness

Some queries seem to go overboard with the syntax:

  select $topic.oc(mass).sc['textual']
   where $topic.type.super(*) = 'device';

In TMQL this would be

  // device / mass [ @ textual ]

or

  select $device / mass [ @ textual ]
   where $device isa device

-- Unexpected Things

What I found unexpected is that in TOMA, to find all associations of type
'part-whole', one has to write (Example 1 in 2.5.1 of 'TOMA Spec'):

  select $a
   where $a.id = 'part-whole';

Here I would have expected that the type is used as _type_, like in TMQL

select $a
  where $a isa part-whole

or

  // part-whole

--

To avoid convoluted path expression chains, a special semantics has been
introduced in TOMA, so that two consecutive associations are always different
(2.5.2).

That seemed quite ad hoc to me.

--

The IN operator is odd as we already have an 'exists' operator in TOMA. And I
would expect the = operator to have exists semantics as well?

select $topic1.bn
  where $topic1.type.bn = 'mechanical device'
  and $topic1.oc (mass) in (
                            select $topic2.oc (mass)
                              where $topic2.type.bn = 'pc card'
                            )

becomes in TMQL

select $t / name
 where $t / mass == $t' / mass
    && $t  isa 'mechanical device' \ name
    && $t' isa 'pc card'           \ name

And TMQL honors subclassing here.
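
To make the 'exists semantics' of the comparison concrete, here is a minimal
Python sketch. It is neither TMQL nor TOMA: the topics, the mass values and the
helper exists_equal are all invented for illustration, and unlike TMQL the toy
type test does not honor subclassing.

```python
# Invented sample data; each topic may carry several mass occurrences.
TOPICS = [
    {"name": "fan", "type": "mechanical device", "mass": ["50 g"]},
    {"name": "hard disk", "type": "mechanical device", "mass": ["600 g"]},
    {"name": "modem card", "type": "pc card", "mass": ["50 g", "0.05 kg"]},
]


def exists_equal(left, right):
    """Exists semantics: true if ANY left value equals ANY right value."""
    return any(a == b for a in left for b in right)


def devices_matching_pc_card_mass(topics):
    """Names of mechanical devices sharing at least one mass value with a pc card."""
    cards = [t for t in topics if t["type"] == "pc card"]
    return [
        t["name"]
        for t in topics
        if t["type"] == "mechanical device"
        and any(exists_equal(t["mass"], c["mass"]) for c in cards)
    ]


result = devices_matching_pc_card_mass(TOPICS)
```

With this reading, no separate IN operator is needed: the plain comparison
already succeeds as soon as one pair of values matches.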

-- XML and TM content

TOMA obviously cannot generate TM or XML content. Now Rani will claim that this
is a template thing. But it is not, I think, and even if it were, for
performance reasons this should be part of the language.

-- Output handling

The query result is always the 'textual representation' of the output, right?

What if this is a list? So it is a list of textual representations, not their
concatenation, right? How does one specify that one does _NOT_ want the string,
but the characteristic itself?

In TMQL, the default is 'atomification', i.e. the conversion of characteristics
into the value they contain. In this process, the scope and the type are
lost. This, so I assume, is what most users are asking for.

TMQL:

  robert / name    # auto-atomification at the end, one gets a string

  robert / name [ @ nick ] # ditto, but after filtering for this scope

  robert >> characteristics name # get the whole characteristics item
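
The difference between the atomified value and the whole characteristic can be
sketched in a few lines of Python. This is only an illustration under
assumptions: the class Characteristic, the helpers and the two name values are
invented here and are not part of any TMQL implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristic:
    """Hypothetical stand-in for a name or occurrence item."""
    type: str
    scope: frozenset
    value: str


# Invented characteristics for the topic 'robert'.
ROBERT = [
    Characteristic("name", frozenset(), "Robert"),
    Characteristic("name", frozenset({"nick"}), "rho"),
]


def characteristics(items, type_, scope=None):
    """Whole characteristic items of a type, optionally filtered by scope."""
    return [
        c for c in items
        if c.type == type_ and (scope is None or scope in c.scope)
    ]


def atomify(items):
    """Keep only the contained values; type and scope are lost."""
    return [c.value for c in items]


all_names = atomify(characteristics(ROBERT, "name"))
nick_names = atomify(characteristics(ROBERT, "name", "nick"))
nick_items = characteristics(ROBERT, "name", "nick")
```

Here all_names and nick_names are plain strings, while nick_items still carries
the type and the scope.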

-- Functions and Data Types

TOMA's choice here is ad hoc; in TMQL we have to think more laterally.

-- Ordering

In TOMA one has to refer to a 'column number', which is a bit brittle when you
fiddle around with the SELECT clause. Or maybe the example there is just
misleading.

In TMQL ordering can be done via a stand-alone path expression, i.e. one can
order according to something which is NOT even in the SELECT clause

select $p / name
  where $p isa person
  order by $p / age desc
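
Why positional ordering is brittle can be shown with a small Python sketch. The
people, their ages and both helper functions are invented for illustration, and
the 1-based column numbering is an assumption, not something taken from the
TOMA text.

```python
# Invented sample data; not taken from any real map.
PEOPLE = [
    {"name": "Ann", "age": 31},
    {"name": "Bob", "age": 45},
    {"name": "Cy", "age": 27},
]


def select_names_by_age_desc(people):
    """Order by an expression that is not part of the projection."""
    ordered = sorted(people, key=lambda p: p["age"], reverse=True)
    return [p["name"] for p in ordered]


def order_by_column(rows, column):
    """Sort descending on a column position (1-based by assumption)."""
    return sorted(rows, key=lambda row: row[column - 1], reverse=True)


names = select_names_by_age_desc(PEOPLE)

rows = [(p["name"], p["age"]) for p in PEOPLE]
by_second = order_by_column(rows, 2)             # column 2 is the age

swapped = [(age, name) for name, age in rows]    # projection was reordered
by_second_after_edit = order_by_column(swapped, 2)  # column 2 is now the name
```

After the projection is reordered, the same column number silently sorts by
name instead of age; the expression-based variant is unaffected.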

-- Implicit subclassing

In TMQL the default is that subclass hierarchies are honored. So when you do a

   $person / size

then you get all occurrences of this type and all its subtypes, so also
shoesize, hatsize, or whatever is available in the map, provided the map makes
it explicit that shoesize iko size, hatsize iko size, etc.

TOMA seems to have a 'strict' interpretation. The problem with that is that it
can be cumbersome to write queries which are robust against subtle changes in
the type structure. So the 'immediate type' or 'immediate subclass' may change;
if we allow programmers to rely too much on it, our applications are all dancing
on thin ice.
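
The two interpretations can be contrasted in a minimal Python sketch. It is a
toy model under assumptions, not TMQL or TOMA: the iko table, the topic
'person1' and all values except the shoesize of 40 are invented for
illustration.

```python
# Hypothetical 'iko' (is-kind-of) declarations: subtype -> supertype.
SUBCLASS_OF = {"shoesize": "size", "hatsize": "size"}

# Invented occurrences as (topic, occurrence type, value).
OCCURRENCES = [
    ("person1", "shoesize", 40),
    ("person1", "hatsize", 58),
    ("person1", "age", 30),
]


def is_kind_of(occ_type, wanted):
    """True if occ_type equals wanted or is a direct or indirect subtype."""
    seen = set()
    while occ_type is not None and occ_type not in seen:
        if occ_type == wanted:
            return True
        seen.add(occ_type)  # guards against cycles in the declarations
        occ_type = SUBCLASS_OF.get(occ_type)
    return False


def occurrences(topic, wanted, strict=False):
    """Values of a topic's occurrences of the wanted type."""
    return [
        value
        for owner, occ_type, value in OCCURRENCES
        if owner == topic
        and (occ_type == wanted if strict else is_kind_of(occ_type, wanted))
    ]


lenient = occurrences("person1", "size")              # honors subclassing
strict = occurrences("person1", "size", strict=True)  # immediate type only
```

In this toy map the strict reading finds nothing for 'size', while the lenient
one keeps working when a new subtype is introduced later.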

\rho