[sc34wg3] Whitespace agnosticism in CTM and descendants

Thu Apr 5 01:29:31 EDT 2007

On Tue, Apr 03, 2007 at 04:18:33AM +1200, Xuân Baldauf wrote:
> I'd like to raise the issue of whitespace agnosticism.........
> An LTM example:
>    a(b:c : d)      has a different meaning to
>    a(b : c:d)

> As it is easy to make a "whitespace spelling" error ......., I like to
> recommend that whitespaces are not meaningful in CTM, except in string
> literals.

I do not see this as a major problem, as CTM should be human-centric
and overloading of symbols _can_ help with this, if used wisely.

> ...................................................(and also as it is
> not straightforward anymore generate parsers for languages with mixed
> whitespace meaningfulness using common parser generators)

It may not be straightforward, but it is also not unduly difficult,
judging from our experience with AsTMa.

>    1. De-overload the colon ":".
>          1. Use another symbol, like "=", "->", ":=", ",", whatever, for
>             separation of "type" expressions and "player" expressions in
>             "role" expressions.

Two-character symbols are too long, and the most obvious character is
still ':'.

>          2. Use another symbol, like '#', '+', '&', '%', whatever, for
>             separation of "prefix" expressions and "local" expressions
>             in "qname" expressions.

This breaks conceptual compatibility with everything the XML/RDF world
does. This would reduce adoption, IMHO.

>    5. Overload the colon and make the life of CTM authors and CTM parser
>       authors unnecessarily harder, more troublesome and error-prone,
>       forever.

I think that we do not have the complete picture yet. In the expression

    a(x:y:z)

there is actually NO ambiguity:

  - if x is NOT a prefix, then it MUST be a topic id, so it reads

    x : y:z

  - if x is a prefix, then it MUST read

    x:y : z

And prefixes MUST be declared before they are used.
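
A minimal sketch of this rule in Python, not taken from any CTM draft:
the names split_role and read_ref are hypothetical, a declared prefix
is assumed to take priority over a topic id of the same name, and the
set of declared prefixes is assumed to be known when the role is read.

```python
def read_ref(parts, pos, prefixes):
    """Read one topic reference starting at parts[pos].

    Returns (reference, next_position). A declared prefix consumes the
    following part as its local name; anything else is a plain topic id.
    """
    head = parts[pos]
    if head in prefixes:
        if pos + 1 >= len(parts):
            raise ValueError(f"prefix {head!r} has no local part")
        return f"{head}:{parts[pos + 1]}", pos + 2
    return head, pos + 1


def split_role(expr, prefixes):
    """Split a role such as 'x:y:z' into (type, player).

    Whitespace around the colons is ignored, so the result depends only
    on which prefixes have been declared.
    """
    parts = [p.strip() for p in expr.split(":")]
    if any(not p for p in parts):
        raise ValueError(f"empty component in {expr!r}")
    role_type, pos = read_ref(parts, 0, prefixes)
    if pos >= len(parts):
        raise ValueError(f"missing player in {expr!r}")
    player, pos = read_ref(parts, pos, prefixes)
    if pos != len(parts):
        raise ValueError(f"cannot resolve {expr!r} with the declared prefixes")
    return role_type, player
```

With x declared, split_role("x:y:z", {"x"}) reads the role as x:y : z.
With only y declared, it reads x : y:z. With neither declared, the
expression is rejected instead of being guessed at.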

Perhaps the editors could analyze similar situations. If what I
suspect here is true, then the 'whitespace rule' could go away
completely.

Implementation-wise, it is probably a matter of how much intelligence
one can put into the lexer to handle the prefixing. I agree that some
older parser generators can make it very hard to handle this
flexibly.
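
As a rough illustration of what "intelligence in the lexer" might look
like, here is a hand-written Python sketch. It is an assumption-laden
toy, not CTM: the directive "%prefix NAME <IRI>" is a hypothetical
stand-in for whatever prefix declaration syntax the draft settles on,
and the token names IDENT and QNAME are invented for this example.

```python
import re

# Raw tokens: the hypothetical %prefix directive, an IRI in angle
# brackets, an identifier, or one punctuation character. Whitespace
# (and anything else unmatched) is simply skipped by findall.
RAW = re.compile(r"%prefix|<[^>\s]*>|[A-Za-z_][\w-]*|[():,]")
PUNCT = {"(", ")", ":", ","}


def lex(text):
    """Tokenize text, merging PREFIX ':' LOCAL into one QNAME token.

    The merge only happens when PREFIX was declared earlier in the
    input, so the lexer has to carry the set of declared prefixes.
    """
    raw = RAW.findall(text)
    prefixes = set()
    out = []
    i = 0
    while i < len(raw):
        tok = raw[i]
        if tok == "%prefix":
            if i + 2 >= len(raw):
                raise ValueError("incomplete %prefix directive")
            prefixes.add(raw[i + 1])  # remember the name, skip the IRI
            i += 3
        elif tok in prefixes and raw[i + 1:i + 2] == [":"] and i + 2 < len(raw):
            out.append(("QNAME", f"{tok}:{raw[i + 2]}"))
            i += 3
        elif tok in PUNCT:
            out.append((tok, tok))
            i += 1
        else:
            out.append(("IDENT", tok))
            i += 1
    return out
```

The state carried between tokens (the prefixes set) is exactly what a
purely table-driven lexer generator does not give you for free, which
is where the difficulty with older tools would come from.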

> P.S.: To avoid these types of problems, one should really write a clean
> room reference implementation of a CTM parser (maybe even along with a
> test suite) before freezing the specs.

Seconded! 

"We reject kings, presidents and voting. We believe in rough consensus
and running code". David Clark

> P.P.P.S.: One could use the comma ',' as a replacement for the colon ':'
> in "role" expressions. The comma separating roles (in "roles"
> expressions) may then be replaced by the semicolon, letting associations
> look like "type(pr:type0,pr:value0; pr:type1,pr:value1;
> pr:type2,pr:value2)".

Brrr :-))

\rho