[tmql-wg] Result set requirements

Robert Barta rho@bigpond.net.au
Sun, 28 Mar 2004 13:30:37 +1000


On Tue, Mar 16, 2004 at 10:54:34AM +0100, Lars Marius Garshol wrote:
> If we knew that $A was a topic of type 'person' and that $B and $C had
> to be dates we could do things more efficiently, but there are some
> other considerations:
> 
>  - if we require type declarations we make the programmer's job much
>    harder,

As my mother said: "No pain, no gain."

To a certain extent the query processor can try to build indices based
on the most frequent queries, but the more information you put in, the
...

>  - declaring the types will in general restrict the use of a
>    function/inference rule to situations where the types fit, (an
>    excellent example is Java, where the same methods are often defined
>    X times for X different types), and

...more specific you make it, but also the faster it gets. An acceptable
tradeoff, I think.

>  - if you can establish the types used for a particular invocation you
>    can internally create a new instantiation of the function/rule
>    where the types have been upgraded and if necessary do this for all
>    the different type combinations that turn up.

This is polymorphism and late vs. early binding?

> The main downside to having an implicit typing approach like this, I
> think, is that the performance model of the language tends to become
> very complex. What this means is that users may find that minor tweaks
> to schemas or queries cause huge performance differences in practice
> in ways that are very difficult for them to predict, and similarly
> that which queries run fast on which implementations may also be very
> difficult to predict.

Yes, most probably, the behaviour will be rather 'chaotic', in the
mathematical sense. OTOH, after 30 years of database development for
RDBMSes, we still see these Oracle gurus asking obscene amounts of
money just to tweak your DB and to optimize your queries. :-)

So why not have this with TM databases?

> I'll have a look, but this route is not free of dangers, as I think
> Jeni Tennison documented admirably:
> 
> <URL: http://www.idealliance.org/papers/extreme03/html/2003/Tennison01/EML2003Tennison01-toc.html >
> 
> (Well, I haven't read the paper; I just saw the presentation, which
> was excellent.) 

Very interesting, indeed:

   >>> The complexity of the type system is largely a consequence of
   >>> XML Schema's datatype specification....

What I find more promising is not (only) to import an external type
system into TMQL/TMCL, but to use the 'types' as provided by an
ontology definition.

For instance, if this is defined in a constraint language:

   # all hillarious things must be either politicians or lecturers
   forall $t [ in (fun-indicator): hillarious ]
      => exists $t [ * (politician | lecturer) ]

and all maps which are subjected to a query follow this constraint,
then the following query can be optimized:

   file://mafia.atm : *                          # take _anything_ from the mafia map
     [ ./ in (fun-indicator) = 'hillarious' ]    # filter out those which are funny
     / in (bribe-level)                          # get the bribe money necessary

Instead of pulling ALL topics from the map, we can concentrate on
those being an instance of 'politician' or 'lecturer'. So we get

   file://mafia.atm : *
     -> is-instance-of / class [ . = (politician | lecturer) ]
     / in (bribe-level)

If the implementation already has an index on types (which is quite
likely), then this alone would already speed up the query.
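
To illustrate why the type index helps, here is a minimal Python sketch.
The topic data, the dictionary layout and the function names are all
made up for this example; they are not part of any TMQL draft or of an
existing processor. It only shows that, as long as the map obeys the
constraint, scanning the indexed 'politician' and 'lecturer' topics
gives the same answer as scanning everything.

```python
from collections import defaultdict

# Hypothetical toy map: topic id -> (types, characteristics).
# The data obeys the constraint: every 'hillarious' topic is a
# politician or a lecturer.
TOPICS = {
    "don-vito":   ({"politician"}, {"fun-indicator": "hillarious", "bribe-level": 5000}),
    "prof-x":     ({"lecturer"},   {"fun-indicator": "hillarious"}),
    "accountant": ({"clerk"},      {"fun-indicator": "boring", "bribe-level": 100}),
    "senator-y":  ({"politician"}, {"fun-indicator": "boring", "bribe-level": 9000}),
}

def build_type_index(topics):
    """Map each type to the set of topic ids that are instances of it."""
    index = defaultdict(set)
    for topic_id, (types, _props) in topics.items():
        for t in types:
            index[t].add(topic_id)
    return index

def bribe_levels(topics, candidates):
    """Apply the filter and the projection of the query to the candidates."""
    result = []
    for topic_id in candidates:
        props = topics[topic_id][1]
        if props.get("fun-indicator") == "hillarious" and "bribe-level" in props:
            result.append(props["bribe-level"])
    return sorted(result)

TYPE_INDEX = build_type_index(TOPICS)

# Naive evaluation: look at every topic in the map.
naive = bribe_levels(TOPICS, TOPICS.keys())

# Rewritten evaluation: only look at topics found via the type index.
narrowed_ids = TYPE_INDEX["politician"] | TYPE_INDEX["lecturer"]
narrowed = bribe_levels(TOPICS, narrowed_ids)
```

In this toy map the narrowed scan skips only one topic, but the saving
grows with the share of topics that are neither politicians nor
lecturers.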

If we additionally have in the ontology

   forall $t [ * (lecturer) ]
      => not exists [ in (bribe-level) : * ]

then the query transforms to

   file://mafia.atm : *
     -> is-instance-of [ ./ class = politician ]
     / in (bribe-level)

Something like this would be nice, as it can transform queries in such
a way that they use the existing indices and avoid naive iterations
over all possible combinations (simply avoiding the combinatorial
explosion).
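
The rewriting itself could be sketched roughly as below, in Python. This
is only an assumption about how a processor might hold the two
constraints internally: the tables IMPLIES_TYPE and NEVER_HAS and the
function candidate_types are invented names for illustration, not
anything defined by TMQL or TMCL.

```python
# Hypothetical encoding of the two ontology constraints from the examples.
# (property, value) -> types that every matching topic must be an instance of.
IMPLIES_TYPE = {
    ("fun-indicator", "hillarious"): {"politician", "lecturer"},
}
# type -> characteristics that instances of this type never have.
NEVER_HAS = {
    "lecturer": {"bribe-level"},
}

def candidate_types(filter_prop, filter_value, projected_prop):
    """Return the types worth scanning for a filter-then-project query.

    Returns None when no constraint applies, meaning the processor has
    to fall back to scanning all topics.
    """
    types = IMPLIES_TYPE.get((filter_prop, filter_value))
    if types is None:
        return None
    # Drop every type whose instances can never carry the projected
    # characteristic; scanning them cannot contribute any result.
    return {t for t in types if projected_prop not in NEVER_HAS.get(t, set())}
```

For the query above, candidate_types("fun-indicator", "hillarious",
"bribe-level") leaves only 'politician', which matches the last
rewritten query. For a filter value that no constraint mentions, the
function returns None and the full scan remains necessary.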

The advantages given by using a "low-level" type system like XML
Schema Data Types are certainly there, but are probably not SO big.

> | I am not sure about the future of TMCL. At the moment it looks like
> | RDFS light.
> 
> It does, but we don't want it to wind up that way, nor do I think
> Graham wants that. So I wouldn't get worried just yet. I think the
> final result is much more likely to look like AsTMa!/OSL represented
> in topic maps, with some special syntax. The stuff Dmitry has been
> playing with lately looks promising to me.

As a born Austrian I have the birthright to be worried about everything :-)

\rho