[sc34wg3] TMQL: counting within groups of tuples
Robert Barta
rho at devc.at
Thu Sep 17 20:04:09 EDT 2009
On Tue, Sep 15, 2009 at 01:27:33AM +0200, Xuân Baldauf wrote:
> Hello,
>
> this problem occurred to me in practice and is surprisingly hard to
> solve in SQL:
>
> Given a table (= tuple sequence), partition the rows of the table into
> sets of rows (= tuples) which have a common criterion (like a common
> value in a particular column). For each of these partitions, sort the
> rows according to another criterion (e.g. another particular column) and
> then assign each row a consecutive integer index (e.g. the first row
> within the partition gets the value 0, the next row within the partition
> gets the value 1, the next row within the partition gets the value 2,
> ...). This index may be called "rank".
Ok, TMQL operates first on topic maps, not on tables. You should know
that! ;-)
But once you have matched things - which corresponds now to the table
you have "given" - this should be trivial to solve with TMQL-as-is.
Probably easiest to write with FLWR.
Hint: $# gives you the position in a sequence.
\rho
> Take this example:
>
> id | fkey | data
> ================
> 1 | 5 | foo
> 2 | 5 | bar
> 3 | 8 | bla
> 4 | 5 | baz
> 5 | 8 | blub
>
> The partitioning criterion is "fkey", and the sort-order is along "id".
> This table should be translated to:
>
> rank | id | fkey | data
> =======================
> 0 | 1 | 5 | foo
> 1 | 2 | 5 | bar
> 0 | 3 | 8 | bla
> 2 | 4 | 5 | baz
> 1 | 5 | 8 | blub
>
> Note that the output ordering of individual rows does not matter, but
> the rank numbers must be correct. However, as we may be talking about
> sets of rows, the result
>
> rank | id | fkey | data
> =======================
> 0 | 1 | 5 | foo
> 1 | 2 | 5 | bar
> 2 | 4 | 5 | baz
> 0 | 3 | 8 | bla
> 1 | 5 | 8 | blub
>
> would be equally ok.
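
For comparison outside TMQL, the ranking shown in the two tables above can be sketched in plain Python. This is only an illustration of the intended semantics, not TMQL and not a proposed implementation; the function name `rank_within_groups` and the dict-per-row representation are assumptions made for this sketch.

```python
from itertools import groupby
from operator import itemgetter

# The example table from the message, one dict per row.
rows = [
    {"id": 1, "fkey": 5, "data": "foo"},
    {"id": 2, "fkey": 5, "data": "bar"},
    {"id": 3, "fkey": 8, "data": "bla"},
    {"id": 4, "fkey": 5, "data": "baz"},
    {"id": 5, "fkey": 8, "data": "blub"},
]


def rank_within_groups(rows, group_key="fkey", sort_key="id"):
    """Return copies of the rows with a 0-based 'rank' per group.

    Hypothetical helper: rows are first ordered by (group_key, sort_key),
    so that groupby sees each partition as one contiguous, sorted run.
    """
    ordered = sorted(rows, key=itemgetter(group_key, sort_key))
    ranked = []
    for _, group in groupby(ordered, key=itemgetter(group_key)):
        # enumerate restarts at 0 for every partition
        for rank, row in enumerate(group):
            ranked.append({"rank": rank, **row})
    return ranked


ranked = rank_within_groups(rows)
for row in ranked:
    print(row["rank"], row["id"], row["fkey"], row["data"])
```

With the sample rows, the result comes out grouped by fkey, i.e. in the row order of the second table.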
>
>
> I think that TMQL should be powerful enough to formulate such data
> processing (or at least be extensible enough that, if such power
> is not part of the standard, it can be easily and uniformly integrated).
>
> Thus, I'd like this example to be considered for addition to
> the TMQL requirements.
>
> It is an interesting challenge for parallel TMQL engines: While each row
> that is subject to such an operation is not completely independent from
> other rows (because the existence of other rows affects its own rank
> number), each row is not fully dependent on other rows either (for a
> given row, other rows with a different fkey are totally ignorable).
> Thus, the processing of this operator can be "somehow partially"
> parallelized across independent machines if each partition of rows
> happens to exist in at most one machine.
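
The partial independence described above can be sketched in Python as well: rows are first split by fkey, and each partition is then ranked without looking at any other partition. In this sketch a thread pool merely stands in for the "independent machines"; the names `rank_partition` and `rank_in_parallel` are hypothetical, and nothing here reflects how a real TMQL engine would distribute work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

rows = [
    {"id": 1, "fkey": 5, "data": "foo"},
    {"id": 2, "fkey": 5, "data": "bar"},
    {"id": 3, "fkey": 8, "data": "bla"},
    {"id": 4, "fkey": 5, "data": "baz"},
    {"id": 5, "fkey": 8, "data": "blub"},
]


def rank_partition(partition):
    """Rank one partition; needs no knowledge of other partitions."""
    ordered = sorted(partition, key=lambda row: row["id"])
    return [{"rank": rank, **row} for rank, row in enumerate(ordered)]


def rank_in_parallel(rows, workers=2):
    """Split rows by fkey, then rank each partition in its own task."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["fkey"]].append(row)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task sees exactly one complete partition.
        results = pool.map(rank_partition, partitions.values())
        return [row for part in results for row in part]


for row in rank_in_parallel(rows):
    print(row["rank"], row["id"], row["fkey"], row["data"])
```

The only coordination needed is the initial split: as long as all rows sharing an fkey end up in the same task, the per-partition ranks are correct.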
>
> ciao,
> Xuân.
>
>