[sc34wg3] TMQL: counting within groups of tuples

Thu Sep 17 20:04:09 EDT 2009

On Tue, Sep 15, 2009 at 01:27:33AM +0200, Xuân Baldauf wrote:
> Hello,
> 
> this problem occurred to me in practice and is surprisingly hard to
> solve in SQL:
> 
> Given a table (= tuple sequence), partition the rows of the table into
> sets of rows (= tuples) which have a common criterion (like a common
> value in a particular column). For each of these partitions, sort the
> rows according to another criterion (e.g. another particular column) and
> then assign each row a consecutive integer index (e.g. the first row
> within the partition gets the value 0, the next row within the partition
> gets the value 1, the next row within the partition gets the value 2,
> ...). This index may be called "rank".

Ok, TMQL operates first on topic maps, not on tables. You should know
that! ;-)

But once you have matched things - which corresponds now to the table
you have "given" - this should be trivial to solve with TMQL-as-is.
Probably easiest to write with FLWR.

Hint: $# gives you the position in a sequence.

\rho

> Take this example:
> 
> id | fkey | data
> ================
> 1  | 5    | foo
> 2  | 5    | bar
> 3  | 8    | bla
> 4  | 5    | baz
> 5  | 8    | blub
> 
> The partitioning criterion is "fkey", and the sort-order is along "id".
> This table should be translated to:
> 
> rank | id | fkey | data
> =======================
> 0    | 1  | 5    | foo
> 1    | 2  | 5    | bar
> 0    | 3  | 8    | bla
> 2    | 4  | 5    | baz
> 1    | 5  | 8    | blub
> 
> Note that the output ordering of individual rows does not matter, but
> the rank numbers must be correct. However, as we may be talking about
> sets of rows, the result
> 
> rank | id | fkey | data
> =======================
> 0    | 1  | 5    | foo
> 1    | 2  | 5    | bar
> 2    | 4  | 5    | baz
> 0    | 3  | 8    | bla
> 1    | 5  | 8    | blub
> 
> would be equally ok.
> 
> 
> I think that TMQL should be powerful enough to formulate such data
> pŕocessing (or at least be as extensible as possible that, if such power
> is not part of the standard, it can be easily and uniformly integrated).
> 
> Thus, I'd like that this example should be considered for being added to
> the TMQL requirements.
> 
> It is an interesting challenge for parallel TMQL engines: While each row
> that is subject to such an operation is not completely independent from
> other rows (because the existence of other rows affects the own rank
> number), each row is not fully dependent on other rows either (for a
> given row, other rows with a different fkey are totally ignorable).
> Thus, the processing of this operator can be "somehow partially"
> parallelized across independent machines if each partition of rows
> happens to exist in at most one machine.
> 
> ciao,
> Xuân.
> 
> 
> _______________________________________________
> sc34wg3 mailing list
> sc34wg3 at isotopicmaps.org
> http://www.isotopicmaps.org/mailman/listinfo/sc34wg3