[tmql-wg] Result set requirements

Wed, 3 Mar 2004 07:25:23 +1000

On Tue, Mar 02, 2004 at 12:17:13PM +0100, Rani Pinchuk wrote:
> Maybe I should emphasis that I speak about complexity in learning the 
> language - so in the syntax of the language:
> 
> If the query returns ALWAYS one list of strings, there is no use to
> include that fact in the query. The query language should not have at
> all the select clause (which makes its syntax simpler).
....
> If the query include apart from that also the ability to return XML, 
> the select clause should include extra syntax to support also returning 
> of XML (like DTD or like the way you do it).

Rani,

Additional syntax makes it easier to learn the language? I am not sure
whether a

  SELECT <<ATTENTION, ATTENTION HERE NOW COMES XML>>
     <xml-here/>

compared to

  return
     <xml-here/>

is simpler to learn :-)

And - if we ever dare to walk into the snakepit of XML schema
languages - we should be as agnostic as possible. I would find it
great if we could exclude DTDs and friends completely.

> With algorithms I mean that the data from the query can still go
> through an algorithm that is written in the query language (which
> obviously becomes very generic language at that point).

So you would not include functions implementing algorithms in a
query language to keep it simple?

If so, then SQL is not simple, because it has functions:

   SELECT 1+2*3;

> So for example, if the data from the query is list of words, the
> algorithm can be an algorithm that returns instead the types of the
> words (verb/noun/adj/adv etc). To have that, you must program a nice
> algorithm that identify the words into their groups.

Hmmm, if something COULD be done with the query language, why MUST
I tell the developer that he SHOULD NOT do it within the query language,
but with some external program?

> > In case of AsTMa we DO NOT expand strings. It is true that we write
> > the query as a string (well, of course), but the XML (as well as the
> > list and Topic Map constructor) are no text templates, but
> > internalized.

> I am not sure I understood that. What I meant is that the syntax of
> AsTMa suggests that the {$a} and {$b} are strings.

Expressions like {$a} or {$a/bn} are interpreted in context. If
I use one like so

   <coffee id="{$a/bn}">need more</coffee>

then obviously I want a string to be inserted. If {$a/bn} can be
evaluated to a string, then AsTMa? can do that. If I simply ask for
{$a/bn} then the context could also say "return it as information
item" (I have not implemented this yet because I never needed it). Why
use a different syntax to hammer home such a simple feature?

So in case of

   <coffee id="{$a/bn}">need more</coffee>

this looks like template expansion, because XML is
string-oriented.  But

   ({$a/bn}, {$a/oc(homepage)}, ...)

is handled differently, and the same holds for creating a TM:

# stupid way to copy all coffee-cup topics
# get a list
function addict () as list return {
   forall $c [ * (coffee-cups) ]
   return
      {$c}
}

# get a map, consisting of these topics only
function addict () as map return {
   forall $c [ * (coffee-cups) ]
   return
      {$c}
}
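
The same idea, sketched in Python rather than AsTMa? (a rough
illustration only: the TOPICS dictionary, its fields and all function
names are made up here, and a plain dict merely stands in for a topic
map). The pattern lives in one place and only the constructor changes:

```python
# Hypothetical in-memory stand-in for a topic map: topic id -> topic data.
TOPICS = {
    "mug": {"type": "coffee-cups", "bn": "Mug"},
    "espresso-cup": {"type": "coffee-cups", "bn": "Espresso cup"},
    "teapot": {"type": "teaware", "bn": "Teapot"},
}


def coffee_cups(topics):
    """The shared pattern: yield (id, topic) for every coffee-cup topic."""
    for topic_id, topic in topics.items():
        if topic["type"] == "coffee-cups":
            yield topic_id, topic


def addict_as_list(topics):
    # List constructor: just the matched topics, in order.
    return [topic for _, topic in coffee_cups(topics)]


def addict_as_map(topics):
    # Map constructor: a smaller map holding only the matched topics.
    return dict(coffee_cups(topics))
```

Neither result is a string; the caller gets a list or a map it can
work with directly.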

> So the way it is implemented is not that relevant for my
> argument. To put it more clear - could the very exact syntax of
> AsTMa be implemented using templates and expanding strings?

Hmm, _everything_ can be implemented expanding strings. Every machine
in the Chomsky hierarchy expands strings. I am not sure where you
are going with this.

The fact is that

   return
       <coffee id="{$a/bn}">I really need more now</coffee>

CAN BE implemented using internal data structures (DOM fragments,
whatever), so that I DO NOT HAVE TO do it with "string handling"
(which is much slower).  It also means that the application gets this
in an already parsed form, which is a hell of a difference from
templating.
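
A minimal sketch of that difference, in Python with the standard
xml.etree.ElementTree module rather than in AsTMa? (the two function
names and the sample value are assumptions made up for illustration,
not how any TMQL implementation works):

```python
import xml.etree.ElementTree as ET


def coffee_by_template(bn):
    """Template expansion: the value is pasted into text; the result is a string."""
    return '<coffee id="{bn}">need more</coffee>'.replace("{bn}", bn)


def coffee_by_constructor(bn):
    """Internalized constructor: the result is already a tree node."""
    node = ET.Element("coffee", {"id": bn})
    node.text = "need more"
    return node


value = 'cup "no. 1" & saucer'  # hypothetical base name with awkward characters

# The application receives a parsed node and can use it directly.
node = coffee_by_constructor(value)
assert node.get("id") == value

# Serializing the node escapes the value, so it parses back unchanged.
assert ET.fromstring(ET.tostring(node)).get("id") == value

# The expanded template is only text; with this value it is not even
# well-formed XML, so a consumer that parses it fails.
try:
    ET.fromstring(coffee_by_template(value))
    template_parses = True
except ET.ParseError:
    template_parses = False
```

The template variant could of course escape the value by hand, but
then the application still has to parse the text again to get at the
structure.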

I found the design of XQuery quite instructive, btw.

> > Sure, sure. But how much have we actually gained and how much lost?

> We gained the separation between the languages. You don't have the mix.
> It is readable and maintainable. We lost the same we loose when we use
> for example OO - some speed in development (at least in the first
> phase), and some speed in performance. 

What you suggest (creating templates for designers or to encapsulate
XML snippets) may be all very well. I have seen very stupid designers
and I have worked with VERY talented ones. So I would assume for some
projects massive templating is ok, for others maybe not.

To hard-wire the "you-must-factor-everything-into-some-template" would
mean to patronize. I would not like to go down that path. If someone
wants templates, then he/she should pick one of the 10000000000
templating packages and put it on top of TMQL.

Others will choose not to use templates at all, but will use TMQL at
some higher abstraction level. These people will generate abstraction
layers and will offer objects and object classes to the application.
These people will not be interested in lists or XML, they will use map
results to create their abstraction, I would assume.

> > But isn't that just an XMLish notation of AsTMa? itself? And what happens
> > now in case of nested queries?
> > 
> > <albums>{
> >     forall $t [ $a (album)
> >                 bn: $bn ] in $m
> >     return
> >     <album id="{$a}">{$bn}
> >     {
> >      forall [ (is-producer-of)
> >               album: $a
> >               producer: $p ] in $m
> >         return
> >            <producer>{$p/bn}</producer>
> >      }
> >     </album>
> > }
> > </albums>
> 
> You can implement it as follows:
> 
> <albums>
>    <while condition="loop_over_albums">
>       <album id="$album_id">$album_name
>           <while condition="loop_over_producer_of_albun">
>                <producer>$producer_name</producer>
>           </while>
>       </album>
>    </while>
> </albums>
> 
> In your program you query for all the albums ids.
> 
> loop_over_albums contains code that gets the next album id, and run a
> query over all the producers of that album.

How does the outer loop know what the values are to iterate over? How
does the inner loop know what the values are to iterate over? Is this
somehow communicated behind the scenes? And what happens when the
template-engineer changes the above to

<albums>
   <x:while condition="loop_over_albums">
      <album id="$album_id">$album_name
          <x:while condition="loop_over_producer_of_albun">
               <producer>$producer_name</producer>
          </x:while>
          <x:while condition="loop_over_producer_of_albun">
               <producer>$producer_name</producer>
          </x:while>
      </album>
   </x:while>
</albums>

Would the functions realize what is going on?

To make this separation systematic you will have to introduce A LOT
of communication between parts of the template AND between the
template and the functions in the background doing the "business
logic" stuff.
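
A rough sketch of the problem, in Python (everything here is
hypothetical: the Hooks class, its method names and the PRODUCERS data
only imitate condition functions of the kind used in the template
above, with the state kept behind the scenes):

```python
# Hypothetical data standing in for query results.
PRODUCERS = {"album-1": ["Ann", "Bob"]}


class Hooks:
    """Background functions the template refers to by name (all names assumed)."""

    def __init__(self):
        self.albums = iter(PRODUCERS)
        self.album_id = None
        self.producers = iter(())
        self.producer_name = None

    def loop_over_albums(self):
        # Advances to the next album and silently prepares the inner loop.
        self.album_id = next(self.albums, None)
        if self.album_id is None:
            return False
        self.producers = iter(PRODUCERS[self.album_id])
        return True

    def loop_over_producers(self):
        # Advances the hidden producer iterator; knows nothing about the template.
        self.producer_name = next(self.producers, None)
        return self.producer_name is not None


def render(inner_loops):
    """Plays the template: one outer loop, `inner_loops` copies of the inner loop."""
    hooks, out = Hooks(), []
    while hooks.loop_over_albums():
        for _ in range(inner_loops):
            while hooks.loop_over_producers():
                out.append(f"<producer>{hooks.producer_name}</producer>")
    return out


once = render(1)
twice = render(2)  # the template engineer duplicated the inner loop
```

In this sketch the second copy of the inner loop renders nothing,
because the hidden iterator is already exhausted. Getting the expected
output needs yet another channel (a reset, a loop identifier, ...)
between the template and the functions.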

Introducing MORE communication patterns does NOT MAKE the language
simpler, IMHO. If you can isolate particular use cases and create
templates for your designers in your project for your particular
organisation with your particular programmers, that is ok.

As a general approach to a language I am not convinced.

> I guess the above looks a bit strange when you are not used to it. 
> But when your code becomes bigger, this separation makes it much more
> clear. 

I have used and programmed XML taglibs. They work very well for a
defined set of tags, a defined set of parameters and for things which
are good to isolate from the environment. For instance

    some other XML stuff here

    <my:weather zip="12345" country="australia"/>

    some other XML stuff here

They are a pain in the ... earlobe when you start to connect things
and pass around values.

> Try to think about a scenario when you want to make changes after the
> application is already done (and big) - how do you change the XML?

Change the XML. :-)

> How do you change the queries (suppose the topic map structure is
> changed).

Change the queries. :-)

> In the separation approach this will be much easier because there is
> no mix in the languages and it is all more readable.

Again, if someone wants to have this separation, then this can be
implemented on top of TMQL. If someone else knows what he is doing,
then we should not patronize him.

> > > So I don't try to avoid running the same query. I try to avoid
> > > hard-coding the same query.
> > 
> > Isn't that exactly what I said above? That 'some pattern P' should
> > be in one place and is reused with different constructors?

> Yes, but for me the sentence "I try to avoid mixing those two" is
> more important. Exactly like with error messages in English. It is
> better to put them outside of your code

It may depend. If someone writes a single script of 30 lines and
starts pulling error message codes from external files then I would
seriously ask some questions (after shooting of course :-)

In a big system where maybe internationalization is an issue, then
this might be standard procedure. Still, there are many different
approaches. None of them has been built into, say, Oracle. It can be
built on top of it, though.

\rho