[tmql-wg] New TMQL proposal

Mon Nov 13 22:46:59 EST 2006

On Mon, Nov 06, 2006 at 12:05:21PM +0100, Paweł Gąsiorowski wrote:
> Hi Robert,
> 
> 	Thank you for looking at my work.
> 	I tried to answer your comments and put them below.
> 	A very important feature of DTMQL is that it allows querying TMs, RDF
> and DB tables in a single query. Another is that its framework is an RDB.
> 	I am not sure if this is clear, so if you have questions please
> ask.

[ I am cc'ing the tmql mailing list. ]

For the others: this is about the TMQL use cases as solved with
DTMQL:

   > I have put the UC solutions at topicmaps.axent.pl

> From: Robert Barta [mailto:rho at bigpond.net.au] 
> > Your use case solutions also underline why we tried to move away from
> > having predicates for _everything_. So a
> >
> >    abstract ($D, $A)
> >
> > becomes in TMQL
> >
> >    $D / abstract
> >
> > especially for occurrences this is much easier on the eyes. For 19
> > TMQL would allow to write
> >
> >   // tutorial ++ // paper
> >
> > or SQL-ish
> >
> >    select $d
> >    where $d isa tutorial | $d isa paper
> >
> > or ...
> >
> > Nr 23 would simply become // * / email
> 
> In DTMQL one can also query items using 'isa' predicate:
> 
> SELECT "$P"
> FROM dtmql { instanceOf($P,person) } a;

That was not the point I was trying to make.

The way I see it, a TM has a lot of graph-like connections between
the predefined items (topics, assocs, ...), and these connections run
along certain axes: from assoc to players (and back), from topic to
types (and back), and so on.

If you offer a query language, then there are two choices to look at
these axes:

  - unbiased: use variables (bound and unbound) and try to match

    This is what tolog and DTMQL do.

  - biased: a 'current position' is assumed and a particular axis
    is followed from there.

    This is what TMQL _also_ does.

Semantically, both approaches are equivalent, but with longer paths I
find

    axis1 ($a, $b) and
    axis2 ($b, $c) and
    axis3 ($c, $d)

not so readable as

    $a axis1 axis2 axis3

especially when $b and $c are never needed, which is quite often the
case.
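
To make the equivalence concrete, here is a minimal sketch in Python
(not TMQL, tolog or DTMQL). The three axes and all item names are
made-up toy data; each axis is modelled as a set of (from, to) pairs.
The predicate style joins on shared variables, the path style starts
from a current position and follows one axis after the other.

```python
# Hypothetical toy data: each axis is a set of (from, to) pairs.
axis1 = {("a1", "b1"), ("a1", "b2"), ("a2", "b3")}
axis2 = {("b1", "c1"), ("b2", "c2")}
axis3 = {("c1", "d1"), ("c2", "d1"), ("c2", "d2")}


def match_unbiased():
    """Predicate style: bind $a, $b, $c, $d by joining on shared variables."""
    return {
        (a, d)
        for (a, b) in axis1
        for (b2, c) in axis2 if b2 == b
        for (c2, d) in axis3 if c2 == c
    }


def step(nodes, axis):
    """Follow one axis from a set of current positions."""
    return {to for (frm, to) in axis if frm in nodes}


def navigate_biased(start):
    """Path style: start at one item and follow axis1, axis2, axis3 in turn."""
    nodes = {start}
    for axis in (axis1, axis2, axis3):
        nodes = step(nodes, axis)
    return nodes


joined = match_unbiased()
walked = {(a, d) for a in ("a1", "a2") for d in navigate_biased(a)}
print(joined == walked)
```

Both variants produce the same (start, end) pairs; the path variant
simply never names the intermediate items.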

> > The other problem I have with 'predicate only' is that such an
> > approach has NOTHING to do with TMs. You could switch to, say, UML
> > metamodelling and use exactly the same language structure.

> I tried to make DTMQL not strictly dedicated so that I can perform queries
> on TMs and RDFs in a single query. It may be treated as an extension to SQL
> which allows querying TMs in a much simpler way than SQL alone.
> This was important for me as I am designing a CMS software using TMs, RDF
> and RDB tables. The structure of DTMQL allows querying such structures, but
> only if they are stored in an RDB.

That choice is perfectly ok for a product. For a _standard_ I would
argue that it is too limiting. Consider (my favourite example :-) a
query over the DNS:

   http://www.idealliance.org/papers/dx_xmle04/slides/barta/foil14.html

> > I am also unsure how you treat the situation where one has to
> > distinguish between the 'occurrence item' and only the value
> > itself. This has caused some headache here.
> 
> This one I am still thinking over. Current concept is :
> 
> SELECT "$A" FROM dtmql {...} a; returns :
> - object identifier if $A is an object
> - string if $A is a value
> 
> To extract a value from an occurrence object, one may:
> - return object and use a proper function from 'tmo_methods' schema
> - or return a value by using proper built-in predicates in the DTMQL block.

Yes, but this is all manual work. Which the TMQL processor can do,
and can do much faster than actually calling an (in general) arbitrary
function. In the TMQL expression

select $p / homepage
from ...
where
   $p isa person

the processor will automatically 'atomify' the homepage, because it
assumes that this is what most people actually want. If that is not
the case then the shortcut  "/ homepage" has to be replaced with
the navigation

   $p characteristics :>: homepage

[ or whatever syntax we end up with ].
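
The difference between the occurrence item and its value can be
sketched in Python as follows. This is only an illustration: the
classes Occurrence and Topic, the helper names and the example URL
are all hypothetical, and nothing here reflects how a TMQL processor
is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence item: it has a type and carries a value."""
    type: str
    value: str


@dataclass
class Topic:
    """Hypothetical topic holding a list of occurrence items."""
    id: str
    occurrences: list = field(default_factory=list)


def characteristics(topic, type_):
    """Navigation step: return the occurrence items themselves."""
    return [o for o in topic.occurrences if o.type == type_]


def atomify(items):
    """Reduce occurrence items to their plain values."""
    return [o.value for o in items]


def shortcut(topic, type_):
    """Rough analogue of '$p / homepage': navigate, then atomify."""
    return atomify(characteristics(topic, type_))


p = Topic("some-person", [Occurrence("homepage", "http://example.org/home")])
print(shortcut(p, "homepage"))         # plain values
print(characteristics(p, "homepage"))  # occurrence items
```

The shortcut covers the common case; whoever needs the item itself
(for its type, scope and so on) uses the explicit navigation step.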

> > Also see how TMQL resolves 11 without having to resort to subqueries:
> 
> > select $a
> > where
> >        is-author-of ($a,  $d)
> >  & not is-author-of ($a', $d)
> 
> I think this one shows a difference in how negation is treated.
> In the above query there is a problem with the "$a'" variable.
> For example the expression "not is-author-of($a',$d)" is true for $a' :
> 
>  - topic[@id = puccini]
>  - occurrence
>  - "Bla Bla Bla"
>    etc.

If for 'puccini' there is no 'is-author-of' association, then, yes, the
predicate "not is-author-of ($a', $d)" is true.

> So probably it should return an infinite set of items.

Every map is finite. We have a closed world here.

> Or should it return the items that make the predicate true, but only the ones
> that exist in the TM?

Yup, that is our current position. Which is non-Semantic-Web-ish, I guess.
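
A small Python sketch of that closed-world reading, with a made-up
map: the item names and the two associations are assumptions for
illustration only. Candidates for the negated variable are drawn from
the items in the map, so the result is always finite.

```python
# Hypothetical finite map: the complete set of items it contains.
items = {"puccini", "alice", "bob", "doc1", "Bla Bla Bla"}

# The only is-author-of associations in this map: (author, document).
is_author_of = {("alice", "doc1"), ("bob", "doc1")}


def not_author_of(doc):
    """All items x in the map for which is-author-of(x, doc) does not hold."""
    return {x for x in items if (x, doc) not in is_author_of}


result = not_author_of("doc1")
print(sorted(result))
```

Every binding comes from the map itself, so 'puccini' and the string
both qualify, but nothing outside the map ever does.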

> To solve that DTMQL only allows negation if all variables in the negated
> predicate appear in positive predicates in the same alternative. So I can
> rewrite the query #11 as follows:
> 
> SELECT "$A1"
> FROM
>   dtmql {
>     is-author-of($A2 : author, $SOME_DOC : opus),
>     is-author-of($A1 : author, $D : opus),
>     not(is-author-of($A2 : author, $DOC : opus)),
>     $A1 <> $A2
>   }

> I am not sure if it's clear. If you have any questions please ask.

Do you mean to have DIFFERENT variables $SOME_DOC, $D, $DOC? But then
they can bind to different values, right? Which is not what this query
is about (2 authors of the SAME doc).

If only $DOC were used, then this is _EXACTLY_ what the above TMQL
query does.
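
The effect of shared versus distinct variables can be seen in a
short Python sketch. It assumes plain conjunctive matching in which
every variable binds independently; that is an assumption made for
illustration, not a statement about how DTMQL actually evaluates the
block. The authors and documents are made-up data.

```python
from itertools import product

# Hypothetical data: (author, document) pairs.
facts = {("ann", "d1"), ("bob", "d1"), ("cy", "d2")}
authors = {a for a, _ in facts}
docs = {d for _, d in facts}


def holds(a, d):
    """True if the map contains is-author-of(a, d)."""
    return (a, d) in facts


# One shared document variable: both predicates talk about the same doc.
shared = {
    (a1, a2)
    for a1, a2, d in product(authors, authors, docs)
    if holds(a1, d) and not holds(a2, d) and a1 != a2
}

# Three document variables: each one may bind to a different document.
distinct = {
    (a1, a2)
    for a1, a2, some_doc, d, doc in product(authors, authors, docs, docs, docs)
    if holds(a2, some_doc) and holds(a1, d) and not holds(a2, doc) and a1 != a2
}

print(sorted(distinct - shared))
```

With three document variables the pair (ann, bob) matches although
both wrote d1, because the negated predicate is free to pick d2.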

> > Case 18 has to use temporary tables. This can be a quite expensive
> > operation if only a small fraction of the TM actually contains the
> > relevant information.  This is almost impossible for a SQL processor
> > to optimize because the query author has preempted optimization.
> 
> This one is still a problem as SQL allows only first order logic.
> The solution for UC#18 is temporary and I am still trying to find a way to
> query transitive relationships. For sure they are 'killing' the DB and the
> task is to find the most optimal solution.

For TMQL I would argue that how this is implemented is
"implementation" :-) Maybe it is hard to do it "on the fly" with
RDBMSs, maybe it is simple if ontological information about the
queried map is available.

TMQL does not try to mess with that.
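
As one possible route on the RDB side, some SQL engines support
recursive common table expressions (WITH RECURSIVE, part of the SQL
standard since SQL:1999), which follow a transitive relationship
without temporary tables. The sketch below uses SQLite through
Python's sqlite3 module. The table is_part_of and its rows are made
up for illustration; nothing here is taken from DTMQL or from the
UC#18 solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table holding one direct relationship per row.
conn.execute("CREATE TABLE is_part_of (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO is_part_of VALUES (?, ?)",
    [("vienna", "austria"), ("austria", "europe"), ("oslo", "norway")],
)

# Start from one item and keep following child -> parent until no new
# rows appear. UNION (not UNION ALL) drops duplicates, so cycles in the
# data do not make the recursion run forever.
rows = conn.execute(
    """
    WITH RECURSIVE reach(item) AS (
        SELECT parent FROM is_part_of WHERE child = ?
        UNION
        SELECT p.parent
        FROM is_part_of AS p
        JOIN reach AS r ON p.child = r.item
    )
    SELECT item FROM reach
    """,
    ("vienna",),
).fetchall()

ancestors = {row[0] for row in rows}
print(sorted(ancestors))
conn.close()
```

Only rows reachable from the start item are visited, which matters
when just a small fraction of the map is relevant to the query.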

> > Good to see your solutions as they show nicely where (and why) TMQL
> > has chosen a different path. It is also clear that DTMQL is much
> > closer to SQL and may be easier to implement if a TM is already there.
> > But again, a TM has to be in the RDB already _in a specific way_ to
> > make this work.
> 
> Yes, that is true: DTMQL is designed for querying TMs stored in RDBs.

Here we have it again :-))

\rho

> > -----Original Message-----
> > From: Robert Barta [mailto:rho at bigpond.net.au] 
> > Sent: Saturday, October 21, 2006 6:05 AM
> > To: Paweł Gąsiorowski
> > Cc: tmql-wg at isotopicmaps.org
> > Subject: Re: [tmql-wg] New TMQL proposal
> > 
> > On Tue, Oct 10, 2006 at 08:31:38PM +0200, Paweł Gąsiorowski wrote:
> > >   I am preparing a new TMQL proposal for querying topic maps stored in
> > > a relational database.
> > 
> > Hi Pawel,
> > 
> > Maybe it is a bit late in the game to suggest 'a new' TMQL, but any
> > input is appreciated. Please note, that TMQL (as standard) __CAN NOT__
> > assume that someone is storing his topic maps in a relational
> > database. Conceptually, topic maps can be wrapped around __ANY__ data
> > resource.
> > 
> > I also assume that you are aware of
> > 
> >    http://topicmaps.it.bond.edu.au/docs/37/toc
> > 
> > >   Yet I have only documented a list of use case solutions, I would be glad
> > > if you could take a look at it and give me some comments.
> > 
> > If you tell me/us, where this can be found...?
> > 
> > \rho
> > 
> > PS: This mailing list is not overly active. I only stumbled over this post
> >     now.
> > 
>