[sc34wg3] Almost arbitrary markup in resourceData

Lars Marius Garshol sc34wg3@isotopicmaps.org
18 Nov 2003 03:26:00 +0100


* Steven R. Newcomb
| 
| For me, the decision to exclude non-TM markup was a matter of
| controlling the scope of the project, as well as the scope of any
| project involving the implementation of a Topic Map processor.  As
| everybody knows, rigorous scope control is at the heart of creating
| a successful standard.

I have to say that for me this was always the main argument against:
added complexity. That's a trade-off, though, and to me it now seems
that we should do this despite the cost.
 
| [...] and if Topic Map engines aren't going to be required to
| duplicate and special-case at least some of what really and
| exclusively belongs in an XML engine.

They will, but I'd imagine most people will solve this by reusing some
XML software. Note that the SQL folks are doing the same thing to
themselves. 

| In other words, this whole idea violates modularity for me.

Strictly speaking I think it does to some extent.

| I find the inclusion of markup in <baseNameString> hard to swallow.
| Maybe that's because I'm an unreconstructed believer in the
| name-based merging rule. 

Actually, I'm having some difficulty with that too, but for different
reasons. (I'll go into those below.) I don't think there's any
difference between <baseNameString> and <resourceData> in this
particular regard. Removing duplicate occurrences and variants is
really no different from doing base name-based merging, and we need to
support both.

We can define equivalence of XML fragments by stating that XML
fragments whose Canonical XML representations are identical are
equivalent. That may be expensive to compute (I don't really know),
but at least it gives us a solution. So I think we can still have
merging for topic names, variant names, and occurrences.
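
As a rough illustration of that equivalence rule, here is a minimal
sketch in Python. It is only an assumption about how one might
implement it, not anything the group has specified. The names
canonical_form and equivalent and the <fragment> wrapper element are
made up for the example, and the standard library's canonicalize()
implements C14N 2.0 rather than the original Canonical XML
recommendation.

```python
from xml.etree.ElementTree import canonicalize


def canonical_form(fragment: str) -> str:
    """Return a canonical serialization of an XML fragment.

    The fragment may be mixed content, so it is wrapped in a
    hypothetical <fragment> element to make it a parseable document.
    """
    return canonicalize(f"<fragment>{fragment}</fragment>")


def equivalent(a: str, b: str) -> bool:
    """Treat two fragments as equal if their canonical forms match."""
    return canonical_form(a) == canonical_form(b)


# Attribute order, quoting style, and whitespace inside tags do not matter.
print(equivalent('<b class="x" id="y">Oslo</b>',
                 "<b  id='y'   class='x' >Oslo</b>"))
# Different element names do matter.
print(equivalent("<b>Oslo</b>", "<i>Oslo</i>"))
```

In this sketch whitespace in character data is left as-is, so
"<b>Oslo</b>" and "<b> Oslo </b>" would not compare equal unless the
text is normalized separately.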

What does worry me about <baseNameString> is two things:

  1) our rationale for allowing XML in <resourceData> is that it's
     equivalent to <resourceRef>, but <baseNameString> really isn't,
     and topic names have no [locator] property,

  2) base names are crucial to all kinds of user interfaces, because
     they provide labels for the topics, and without those you don't
     really have much of a UI. We can have resources as names for
     topics (through variants), but having base names as strings
     ensures that there's always *something* that can be displayed as
     a mere string.

     If we allow markup here, that guarantee is lost. You may have
     to strip (or, even worse, render) XML markup to be able to label
     your topics.
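
To make the stripping step concrete, here is a small Python sketch of
what a UI might have to do to get a displayable label from a base
name that contains markup. The function name plain_label and the
<name> wrapper element are hypothetical, and collapsing runs of
whitespace is my own assumption about what a label should look like.

```python
import xml.etree.ElementTree as ET


def plain_label(name_content: str) -> str:
    """Strip markup from base name content, keeping only the text."""
    # Wrap in a hypothetical <name> element so mixed content parses.
    root = ET.fromstring(f"<name>{name_content}</name>")
    # itertext() yields every text node in document order.
    text = "".join(root.itertext())
    # Collapse whitespace runs so the label fits on one line.
    return " ".join(text.split())


print(plain_label("The <em>Ring</em> Cycle"))
```

Every application that shows topic labels would need something like
this, whereas a plain string name needs no processing at all.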

I'd be interested to hear what people think of this. Should we change
our minds and only do this for <resourceData>?

|   And it's why people sometimes excitedly "discover" that if they
|   turn their topic names into URIs, they can get the name-based
|   merging they want.

You are right about this, but I still don't think that implies that
the original TNC rule was right. I think allowing people to declare
how they want name-based merging to work is the right way to do this.
(Water under the bridge by now, but I wanted to say this even so.)
 
| So, if there's markup in a <baseNameString>, and name-based merging
| is switched on, on what basis will name matching be done?  

The equivalence rule for topic name items. We haven't defined it yet
in the presence of markup (will be part of the XML representation
proposal), but I think we'll have to base it on Canonical XML. (From
what I gathered from Dan Connolly, that seems to be what the RDF folks
will do, and for the same reason I propose it: lack of alternatives.)

| What about the nonsignificant whitespace in such markup, 

Ignored with Canonical XML.

| and what about the order of the attribute value specifications in
| the start tags?  

Ditto.

| Suddenly we have to have a whole bunch of complicated rules, or to
| invoke a parser-output standard like RAST, where a simple,
| application-neutral string match used to be sufficient.

True, unfortunately.

| I don't like it when things get more complex.  There's gotta be a
| damn good reason.  Jim says he has one, and I take him at his word,
| but I'd be happier if he would explain why <variantName> won't meet
| his needs, [...]

I'd very much like to hear this too. Jim?

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >