[sc34wg3] Canonicalization of embedded XML
Lars Marius Garshol
larsga at ontopia.net
Fri Apr 21 04:18:44 EDT 2006
* Lars Heuer
>
> Your example
>
> Foo <a d="e" b="c">bar</a>
>
> is a valid one for embedded XML?
Yes. (It's valid according to the RELAX-NG schema, and there's
nothing in the prose prohibiting it.)
> The XTM reader has to search for the XML inside the character sequence
> "Foo <a d="e" b="c">bar</a>" and canonicalize it?
No. The XTM reader will be using an XML parser. That XML parser will
parse this XML fragment in the same way as the rest of the document.
The XTM reader then has to switch into "canonicalizing mode" inside
<resourceData>.
Note, however, that this requirement is much weaker than it
seems, since canonicalization is a procedure with only two effects:
(1) It preserves the namespace declarations in the source context
that might otherwise be lost.
(2) It transforms the *syntactic expression* of the embedded XML.
What this means is that unless you are going to treat the embedded
XML as a string, you can ignore (2) and only preserve the namespace
declarations. The downside is that for duplicate suppression
(the equality rules in TMDM, etc.) you actually do treat the embedded XML
as a string. I wouldn't be too surprised if some Topic Maps software
were to have a disclaimer on the box stating that it doesn't do this.
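To illustrate effect (2) and why it matters for duplicate suppression, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree.canonicalize (Python 3.8+), which implements C14N 2.0; that may not be the exact canonicalization variant the XTM draft refers to, but the attribute-ordering and quoting rules shown here are the same. The variable names are purely illustrative.

```python
from xml.etree.ElementTree import canonicalize

# Two syntactically different spellings of the same element:
# attribute order, quote style, and whitespace inside the tag differ.
first = '<a d="e" b="c">bar</a>'
second = "<a   b='c'  d='e'>bar</a>"

canon_first = canonicalize(first)
canon_second = canonicalize(second)

# As raw strings the two differ, so a naive string comparison
# would not recognise them as duplicates.
print(first == second)

# After canonicalization both have the same syntactic expression,
# so string equality can be used for duplicate suppression.
print(canon_first == canon_second)
print(canon_first)
```

If the embedded XML is never compared as a string, this step changes nothing observable, which is why only the namespace declarations need to be preserved in that case.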
> If I understood it correctly this would also be a valid example for
> embedded XML:
>
> <foo a="b">bla</foo>
> <bar c="d">blub</bar>
>
> Is such a thing c14n'able with standards-conformant XML canonicalizers,
> since the root node is missing?
Yes.
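As a rough sketch of how one might try this with a tool that only accepts well-formed documents: Python's xml.etree.ElementTree.canonicalize (C14N 2.0, Python 3.8+) needs a single root, so the example below wraps the fragment in a temporary element and strips it again afterwards. The wrapper trick and the names canonicalize_fragment and "wrapper" are assumptions made for illustration, not something the XTM or TMDM drafts prescribe, and the sketch does not handle namespace declarations inherited from the surrounding document.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical temporary root used only to make the fragment parseable.
OPEN, CLOSE = "<wrapper>", "</wrapper>"


def canonicalize_fragment(fragment: str) -> str:
    """Canonicalize a rootless XML fragment (mixed text and elements).

    Illustrative workaround: wrap, canonicalize, then remove the wrapper.
    """
    canon = canonicalize(OPEN + fragment + CLOSE)
    # The wrapper has no attributes or namespaces, so it is serialized
    # exactly as OPEN ... CLOSE and can be sliced off safely.
    return canon[len(OPEN):len(canon) - len(CLOSE)]


# The two-element fragment from the question above.
two_roots = '<foo a="b">bla</foo>\n<bar c="d">blub</bar>'
print(canonicalize_fragment(two_roots))

# The mixed-content example from earlier in the thread.
mixed = 'Foo <a d="e" b="c">bar</a>'
print(canonicalize_fragment(mixed))
```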
> BTW: The TMDM mentions xsd:anyType as datatype, the XTM mentions
> xsd:any as datatype.
Whoops. I'll look into this. Thanks!
--
Lars Marius Garshol, Ontopian http://www.ontopia.net
+47 98 21 55 50 http://www.garshol.priv.no