[sc34wg3] Editors' drafts of TMDM and XTM 1.1

Steve Pepper sc34wg3@isotopicmaps.org
Wed, 11 Jan 2006 05:26:20 +0100


I'd like to have a stab at explaining some of the background
for the proposed changes to XTM, since Murray hasn't been part
of the process for a while.

* Murray Altheim
|
| > The differences are:
|
| > The namespace URI has changed.
|
| Yes, absolutely.

The original motivation for changing the namespace URI was
the fact that SC34 does not control the domain topicmaps.org.
We do, however, control topicmaps.com, thanks to the kindness
of Empolis GmbH, who donated it to us 6 months ago.

A number of things followed once we decided to change the
namespace:

1) There was agreement that *in principle* a namespace should
   remain stable and therefore it is not good practice to include
   the version number of a specification in the namespace.

2) Changing the namespace breaks backwards compatibility, at
   least from the point of view of an application developer.
   Since backwards compatibility was the only reason for not
   fixing a number of inconsistencies that have become apparent
   over the years, it was decided to bite the bullet NOW,
   before Topic Maps has critical mass.

3) In addition to fixing inconsistencies, we now had the
   opportunity to align the terminology and structure of XTM
   with that of the TMDM. This led to further changes.

I believe that all of the changes in XTM can be explained
through reference to one of these considerations:


| > The version attribute has been added to the topicMap element.
|
| There was no 'version' attribute in XTM 1.0, as the XML namespace
| identifier worked for that. I'm not sure why a 'version' attribute
| is really necessary, as it's redundant with the XML namespace. If
| the two are in conflict, what happens? Not specified.

The alternative to having the version in the namespace URI,
which, as noted, is considered bad practice (ref. 1) above),
is to have a version attribute. There cannot be any conflict
with the namespace because the latter no longer contains the
version.


| > The parameters element has been replaced by scope.
|
| I'm not sure if people remember the reason we didn't use 'scope' in
| the first place, but I thought there were some pretty good reasons,
| i.e., that 'scope' isn't the proper term to describe parameters that
| alter variants -- it's an entirely different function and 'scope' is
| being overloaded here.

This was considered very carefully when defining the TMDM and
the committee decided (long ago) that parameters are simply a
further set of topics that contribute to the total scope of
a variant. This is now explicit in XTM, ref. principle 3).


| > The roleSpec element has been replaced by instanceOf.
|
| Huh? This is considered a simplification or are we deliberately
| abusing the terminology? We need a specification of the role in
| an Association, not what the member is an instance of.
|
| > The member element has been replaced by role.
|
| We have an Association that has no members, only roles? This
| sounds like Associations are now being limited to not modeling
| or containing actual Topics, only classes of Topics.

Both these comments reflect an understanding of 'role' about
which there never was any consensus and which has now been
abandoned. The following diagram will (hopefully) explain the
terminology:

              roletype 1     assoctype     roletype 2
                  |              |             |
                  |              |             |
   topic 1 ---- role 1 ---- association ---- role 2 ---- topic 2

e.g.

   tosca -------- X ------- composed-by ------ Y ------- puccini


X and Y can be thought of as "tosca qua composition by puccini"
and "puccini qua composer of tosca", respectively. They are the
*roles*. Each of these has a type (probably "work" and "composer",
respectively); these are the *role types*. There can be lots of
roles that have the same role type. The role type is thus a class
(or type) of role, and roles are instances of their respective
types.

The confusion that was cleared up by the committee was that some
people used the terminology described above, while others used
the term "role" for "role type" (and, in PMTM4, "casting" for
"role").

We have now settled on role and role type, which more or less
parallels name / name type, occurrence / occurrence type, and
association / association type. (It doesn't quite parallel
topic / topic type, but there is still a close enough similarity
to justify using the same terminology.)

Since typing is handled throughout XTM by <instanceOf>, it was
deemed more appropriate than <roleSpec>.
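
To make the terminology concrete, here is a minimal Python sketch of the
tosca/puccini diagram. It is not taken from the TMDM or from XTM; the class
and field names (Topic, Role, Association, type, player) are assumptions
chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Topic:
    name: str


@dataclass(frozen=True)
class Role:
    type: Topic    # the role type, e.g. "work" or "composer"
    player: Topic  # the topic playing this role


@dataclass
class Association:
    type: Topic                       # the association type
    roles: List[Role] = field(default_factory=list)


tosca, puccini = Topic("tosca"), Topic("puccini")
work, composer = Topic("work"), Topic("composer")

# X = "tosca qua composition by puccini", Y = "puccini qua composer of tosca"
x = Role(type=work, player=tosca)
y = Role(type=composer, player=puccini)
composed_by = Association(type=Topic("composed-by"), roles=[x, y])
```

In this sketch the role is the object that sits between the player and the
association, and its type is just another topic, which is why the same
typing mechanism can serve for roles as for everything else.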


| > A single topic reference is now required as the child of role.
|
| So we can no longer model Associations between groups of Topics,
| and have specifically decided to limit Associations to being
| bipartite (relations with only two members).

Not at all. There can be any number of role players in an
association, and there can be multiple roles of the same type.
However, there can only be a single role player per role. This
is a design decision taken in the TMDM that is now reflected in
the syntax. It doesn't affect the expressivity of XTM.
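
As a rough illustration of why nothing is lost, the following Python sketch
represents an association as a list of (role type, player) pairs, one player
per role. The association "libretto-by" and the helper players_by_role_type
are hypothetical names invented for this example.

```python
from collections import defaultdict


def players_by_role_type(roles):
    """Group the players of an association by the type of role they play."""
    grouped = defaultdict(list)
    for role_type, player in roles:
        grouped[role_type].append(player)
    return dict(grouped)


# Three roles, each with exactly one player; two roles share a type.
libretto_by = [
    ("work", "tosca"),
    ("librettist", "illica"),
    ("librettist", "giacosa"),
]

grouped = players_by_role_type(libretto_by)
```

A "group" of players in the old style simply becomes several roles of the
same type, so the association is neither limited to two members nor unable
to express what a multi-player member could.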


| > The baseName element has been replaced by topicName.
|
| Throwing away the entire baseName structure is a minor change?

It's not thrown away; it's just renamed. That *is* a minor
change. Perhaps it would be better to say that the GI
"baseName" has been renamed "topicName". (Actually, I would
like it to be renamed "name".)


| > The instanceOf element is now allowed inside topicName.
|
| Creating an entirely new inner structure within Topic names
| is a minor change?

This is a slightly larger change that was decided on in
December 2002 in Baltimore and has been reflected in both
the model and the syntax ever since. It does not cause a
*new inner structure*, it simply adds one property to topic
names: the type. This is entirely backward compatible and
makes the model as a whole more consistent.


| > The variantName and subjectIdentity elements have been removed.
|
| A simplification that removes a container.

Yes. Actually two, *superfluous* containers.


| > The variant element can no longer be nested.
|
| While some people might have found this odd, the hierarchy of
| variants did allow selection of a specific Topic name based on
| a specific set of accumulated variant parameters. If this
| feature (which is decidedly complicated) is being removed,
| that's hardly a 0.x version change. That removes an entire
| feature of a language.

Again this was a decision taken long ago in connection with
the TMDM, which stated that nested variants are simply
flattened, with scoping topics (né parameters) being inherited
as appropriate. The consequence of this is that the ability
to nest variants in XTM was merely syntactic sugar, and not
very sweet sugar at that (because it could not be reliably
reconstituted upon serialization).

The decision to remove nesting from XTM does not remove an
entire feature. If it ever *was* a feature (and the committee
decided it wasn't), then the feature was removed by TMDM at
least two years ago, and this can hardly be regarded as a
last-minute change.
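
The flattening can be sketched in a few lines of Python. This is only an
illustration of the idea, not the TMDM deserialization rules: the dictionary
keys (parameters, value, variants) and the function name flatten_variants
are assumptions, as are the sample parameter names.

```python
def flatten_variants(variants, inherited=frozenset()):
    """Flatten nested variants; each variant inherits its ancestors' scope."""
    flat = []
    for variant in variants:
        # Own parameters are added to whatever was inherited from the parent.
        scope = inherited | frozenset(variant.get("parameters", ()))
        if "value" in variant:  # a variant may exist only to carry parameters
            flat.append({"value": variant["value"], "scope": scope})
        flat.extend(flatten_variants(variant.get("variants", ()), scope))
    return flat


nested = [{
    "parameters": ["sort"],
    "value": "puccini giacomo",
    "variants": [{"parameters": ["ascii"], "value": "PUCCINI"}],
}]

flat = flatten_variants(nested)
```

Once the list is flat, nothing records which variant was nested inside
which, so a serializer has no reliable way to rebuild the original nesting.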

(However, my email, which you didn't have time to read, does
advocate bumping the version to 2.0, so I do agree with you that
the *sum* of the changes we are making is more than a 0.x
version change.)


| > The instanceOf element is now required inside occurrence,
| > association, and role.
|
| So forget authoring. I often store an XTM document prior to
| it having all the <instanceOf> elements in place, as I often
| populate the Topic Map with content prior to adding in all
| the class information. This would preclude my ability to store
| my XTM documents as valid XTM.
|
| This kind of thing should be described in a higher-level schema,
| not at the syntax level.

The thinking here is that XTM is for *interchange*, not
authoring. There is no reason why a more relaxed version of
the schema could not be used during the authoring process.
But at interchange time it is important to know if stuff is
missing. If a relaxed XTM is no good for you, you always
have the option of setting the type of your occurrences,
associations, and roles to "untyped occurrence", etc. Not a
big problem.
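
An authoring tool could apply such defaults just before export. The Python
sketch below shows one way this might look; the fallback identifiers
("untyped-occurrence" and so on), the dictionary layout, and the function
ensure_typed are all hypothetical and are not defined by TMDM or XTM.

```python
# Hypothetical fallback types, one per kind of construct that requires a type.
DEFAULT_TYPES = {
    "occurrence": "untyped-occurrence",
    "association": "untyped-association",
    "role": "untyped-role",
}


def ensure_typed(construct):
    """Return a copy of `construct` with a fallback type if none is set."""
    typed = dict(construct)
    if not typed.get("type"):
        typed["type"] = DEFAULT_TYPES[typed["kind"]]
    return typed


draft = [
    {"kind": "occurrence", "value": "http://example.org/tosca.html"},
    {"kind": "association", "type": "composed-by"},
]

# The working draft stays untouched; only the export copy gets the defaults.
ready = [ensure_typed(c) for c in draft]
```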


| > The mergeMap element no longer supports added scope.
|
| Ugh. So now when I merge in another XTM document I have no
| ability to un-merge it or determine where a Topic comes from?
|
| This is a minor change?  Uh -- no, not for anyone who actually
| uses this feature.

Merging is considered to be an act of authoring, and when you
author, you can do whatever you like. There is no reason why
Ceryle shouldn't allow a user to specify scoping topics to be
added when one topic map is merged with another. But having
this ability in the <mergeMap> element does not seem right. In
fact, many people think that the <mergeMap> element itself does
not seem right. Merging is likely to be far more dynamic (and
under the control of the *recipient* of the information) than
an element type in an interchange syntax would make possible.


| > The id attribute has been removed from all elements except topic,
| > and the reifies attribute has been added on some elements.
|
| While some people may not see the *need* for ID on all elements,
| it never hurt to have it. The presence of an ID doesn't mean
| that any element must be reifiable, it means that the ID can
| be used by things like XSLT stylesheets and other syntax-level
| processes to canonically identify an XML element within the
| document. Removing that ability likely means that processors
| would need to rely on things like XPath to do certain kinds of
| processes, and given that XTM is a bag and not a sequence, this
| would likely mean that some processes would no longer be
| possible.
|
| This did no harm and should be reconsidered. If there are some
| concerned about reification of those elements, make a list of
| those that can be reified and include it in the prose.

Actually, it did do harm, as became clear once the data model
was defined and the issue of deserializing XTM was considered.
A deserialization specification has to say something about every
construct found in the syntax, so if you have IDs everywhere, you
have to say what happens to them when you process the XTM document
and build the model. We found ourselves saying on (almost) every
element: "The id attribute is ignored during deserialization."
This seemed pointless. Those who want IDs on everything can simply
extend the DTD, but they will have to strip out superfluous IDs
when interchanging via XTM.
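
Stripping the extra IDs before interchange is a small job. Here is a minimal
sketch using Python's standard xml.etree.ElementTree; the function name
strip_superfluous_ids is an assumption, and the sample document omits the
namespace declaration for brevity.

```python
import xml.etree.ElementTree as ET


def strip_superfluous_ids(root):
    """Remove the id attribute from every element except <topic>."""
    removed = 0
    for element in root.iter():
        # Drop any "{namespace}" prefix that ElementTree puts on the tag.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name != "topic" and "id" in element.attrib:
            del element.attrib["id"]
            removed += 1
    return removed


doc = ET.fromstring(
    '<topicMap id="tm">'
    '<topic id="tosca">'
    '<baseName id="n1"><baseNameString>Tosca</baseNameString></baseName>'
    '</topic>'
    '</topicMap>'
)

removed = strip_superfluous_ids(doc)
```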


| > The itemIdentity, subjectLocator, and subjectIdentifier elements
| > have been added.
|
| No problem? Just a minor change?  You've made substantive changes
| to the whole way that identity and reification are being managed,
| and that's a point version change? No.

Again, no substantive changes, merely terminological changes.
Instead of <subjectIndicatorRef> we now say <subjectIdentifier>,
etc. Everything becomes more consistent, both internally and
with respect to the data model.

It really does seem as if you are far too hung up on syntax,
Murray, if I may say so. (No insult intended; just trying to
help you see why you are exaggerating the extent of the changes.)


| > The subjectIndicatorRef and resourceRef elements have been removed.
|
| I wasn't aware of the reasons why but I see that we've decided to
| completely revamp the whole subject identity handling. One of the
| advantages of XTM over RDF was the ability to characterize the
| relation between the reference and the referenced entity. I see
| that the ISO committee no longer sees this as important? That
| harmonization with RDF was the reason?  Hard to understand, or at
| least hard to understand this as a point version change. Very
| profound.

Actually, they haven't been removed, so this is a bit misleading.
<subjectIndicatorRef> has been renamed, as noted above.

<resourceRef> is still there for occurrences and variant names,
but is no longer used for specifying identity: that's where
<subjectLocator> has taken over. But the changes are all in the
naming, not in the underlying model of identity.

Having said that, <subjectIndicatorRef> and <resourceRef> (or
their renamed "replacements") *have* been removed from certain
content models, where they were permitted (along with <topicRef>)
in XTM 1.0. However, this does *not* make XTM 2.0 any less
expressive; it just makes it easier to understand for the user.


| > XTM no longer uses XLink and XML Base.
|
| Cripe! Really??????? Well, you've just lost the linking model, and
| the ability to state the canonical location of an XTM document. In
| the former case you'll need to restate in its entirety the alink
| model from XLink, otherwise you've left linking completely arbitrary.
| For xml:base there is no substitute, and at least for all of my own
| work this alone would keep me from using XTM 1.1. I need to be able
| to canonically specify the base address of my documents so that
| they are portable, otherwise I've got to include a subject identity
| URI statement for every single <topic>. Ugh.

I don't feel qualified to comment on this one, except to say that
you of all people know how much FUD XML Base has caused in the past.
I personally don't believe that being able to state the canonical
location of an XTM document is important, since XTM is only for
interchange anyway, but I'm willing to be convinced otherwise. If
it *does* turn out to be important, there's got to be a better
solution than XML Base.

As to XLink: No-one, but no-one, who has implemented ISO 13250 has
used an XLink engine. (Does Ceryle?) In fact, does anyone really
use XLink for anything? All we needed were simple HTML-type links
and SGML-style IDREFs. The original inclusion of XLink was more
of a political thing, in any case, as far as I recall (and much
good did it do us :-)


| > The mergeMap element must now come before all topic and association
| > elements.
|
| There should be no ordering requirement of XTM documents. It's not
| a sequence, it's a bag. If applications need to process <mergeMap>
| elements first, they should pull them from the graph and process
| them first.

It's not XTM that's a "bag", it's the TMDM. XTM can be a sequence
without disrupting this. And why not: it is about *serialization*,
after all, and that's pretty sequential!

Anyway, the argument was made that processing very large XTM
documents could be made *much* more efficient if <mergeMap>s were
handled up front. It makes absolutely no difference for the author
of the XTM document, so we decided to humour the implementors
for once :-)


| > The datatype attribute has been added to resourceData, which also
| > now supports embedded markup.
|
| Both very substantial changes, both in terms of semantics and in
| terms of processing requirements. And *please* don't pull a W3C
| on me and say that you can "just ignore the markup you don't
| understand." (please)
|
| I think you're making an enormous mistake formally tying XTM to
| XML Schema.
|
| [long rant elided]

You may or may not be right in what you are saying. However, this
has nothing to do with XTM; it was introduced in the TMDM in May
2003 at the London meeting. It would have been of interest to hear
your arguments at that meeting, or as comments on the subsequent
drafts of TMDM. You might even have convinced me, but now it's too
late: Part 2 is under FDIS ballot and cannot be changed, so XTM
has no option but to support datatypes.


| > It's very difficult to argue that these changes make the new
| > version into a different language.
|
| Oh? I would say that precisely. These are substantive changes that
| at *very least* warrant a 2.0, but really, there are a lot of pretty
| fundamental changes. You've not just changed names, you've eliminated
| a lot of specialized semantics in favour of reusing existing ones
| that don't have the same meaning (e.g., <roleSpec> is by no definition
| the same as <instanceOf>). You've also invented a lot of new features,
| such as the reification and itemIdentity features. The whole subject
| identity machinery has changed. That's pretty fundamental.

I hope you now realise that you have exaggerated the extent of
the changes. They are only substantive on the surface, to people
who are more hung up on GIs than what the syntax allows you to
express in terms of the model, or who have misunderstood the
terminology (as in the role/role type vs. casting/role issue).

The *only* things that have been "invented", in fact, are datatypes
and the concept of typed names, and these are natural extensions
of what was already there.

Otherwise, absolutely everything, without exception[*], is about
clarifying stuff that was unclear in the original XTM spec. (As
one of the original perpetrators of that lack of clarity, I feel
more than qualified to say this :-)

---

At this point, I feel overcome by an acute lack of insomnia and
choose to skip the rest of your posting. I've spent a lot of time
on this reply already, in the honest hope that it will help you
get back up to speed on developments since TopicMaps.Org. Your
fears truly are misplaced: Everyone in the committee stands behind
the new TMDM and the new XTM. Old conflicts are a thing of the past
in every sense. We are moving forward together, and it would be
nice if you were to join us. We are all on the same side, really,
you know :-)

Amen, and good night,

Steve

[*] This claim is off the top of my head at 4:30 AM, so there
may be minor exceptions :-)

--
Steve Pepper <pepper@ontopia.net>
Chief Strategy Officer, Ontopia
Convenor, ISO/IEC JTC 1/SC 34/WG 3
Editor, XTM (XML Topic Maps 1.0)