[sc34wg3] Review of N0393

25 Apr 2003 11:39:44 +0200

In preparation for the London meeting I have spent some time studying
N0393 in order to get a clearer picture of what it is and does. 

My first impressions are:

 - This is not topic maps as described in ISO 13250, but something
   new.

 - It is not at all clear what purpose this document serves, nor what
   benefits it provides to anyone.

 - This document, and the "technology" it describes, are both very far
   away from being ready to go.

Steve Pepper has already covered the first, so I will focus on the
last two points here.

As far as I can see, N0393 is a set of guidelines or best practices
that have been dressed up to look as if they were a technology. The
result is an immense amount of machinery that obscures the underlying
ideas, and it will quite likely be very difficult to persuade anyone
to conform to the advice of N0393, which is quite reasonable in
essence if not in form.

The reason I don't call N0393 a technology is that when you have
created a TMA you have written a document consisting of prose. So
N0393 is essentially a set of subjective guidelines for how best to
write such documents, together with a formalism whose value and
effectiveness are very much in doubt.

SRN advanced the idea on the mailing list that we could create an XML
vocabulary for writing such documents. That vocabulary would either
have to be rather like the XMLSpec DTD (the one used for the XML
Recommendation, and SAM), or an immense amount of work, much greater
than that needed to produce N0393, would have to be done to create the
machinery that would allow the expression of the rules governing a TMA
in machine-executable form.

So while N0393 might become a technology in the future, it is not one
today, and this makes me wonder what its use is. In its current form I
don't think it is very effective as a set of guidelines for writing
specifications. I know that the thing is not implementable as it
stands, and that it is very far away from being implementable. So
where is the value? What can it do that is useful?

Now, as to why the thing is not implementable, and why it is not a
technology, I will only scratch the surface here. There are many more
problems with the document than those I point to below, but these
should at least demonstrate that the problems are serious. It seems to
me that the flaws lie at the very heart of what N0393 tries to do, but
I hope I am wrong.

It has been suggested that the Reference Model and the Standard
Application Model should be merged into a single model, but it seems
clear to me that that is not feasible. The SAM provides strict
implementation rules with a thin layer of semantics on top and no
guidelines, while the RM provides guidelines with a thin layer of
semantics underneath and no implementation rules at all. (At least not
any that are workable.) The models do completely different things, and
in completely different ways, so if we are to have both it makes sense
to keep them separate.

Some initial points:

 - N0393 describes a particular model, but does not make it clear
   whether, or how, that model enables any form of interoperability.
   Section 6 defines constraints on "Topic Map Applications", but it
   is not clear what these are. N0393 says they are not software, but
   are they documents? Are they file(s) in some defined syntax? Are
   they merely abstract concepts?

 - It is not clear that the notion of the conformance of a Syntax
   Deserialization Definition is useful. What does it mean for an SDD
   to conform to this document? What value does it add for SDDs to
   conform? What is to be done about SDDs that do not conform?

 - In general, the conformance requirements of N0393 are very weak in
   the sense that they are for the most part subjective judgments
   formed by a human being, rather than something that is verifiable
   by software. It is not clear that this is appropriate.

 - N0393 defines an information modelling formalism consisting of
   topics, properties, and assertions, and then goes on to define a
   model in terms of those primitives. However, the underlying
   formalism is insufficiently defined, and the model built with it
   appears to be far more complex than it has any useful need to be.

Some formalities:

 - N0393 defines some of the same terms as the SAM. In some cases the
   definitions are nearly equivalent, whereas in others they are
   clearly in conflict. This needs to be resolved.

 - N0393 refers to itself as an 'International Standard', which it is
   not. According to ISO directives, the appropriate form of
   self-reference is 'Technical Specification'.

 - N0393 claims to be 'the information structure of all topic maps'
   yet its relationship to ISO 13250:2002 is not clearly defined.

 - The glossary should be reduced.

 - N0393 uses the terms "must", "may", and "should" without defining
   them. Do constraints expressed as "may" or "should" affect
   conformance? 

--- Introduction

It is claimed that N0393 "enables Topic Map Applications to be
expressed as topic maps", yet no vocabulary is defined for expressing
the components of a TMA as an instance of the N0393 model. How does
one express the "TM Application Name" in terms of N0393? The same
applies to the other parts of the model.

It is claimed that N0393 "enables the conformance of Topic Map
Applications to this International Standard [sic] to be verified", yet
it does not say *how* such verification is done. It would seem that
verification is done by reading a TMA definition document manually and
comparing it against the guidelines in N0393.

It is claimed that N0393 "enables determination of whether two topic
maps are identical". Where is the definition of the procedure that is
used to do this determination?

--- Glossary

As noted above there are several conflicts with the terminology of the
SAM. These should be worked out.

The concept of "treated as a set" is baffling. Why not just say "a
set"? Why require a particular ordering? And is that indeed what the
sentence "Apply the same comprehensively deterministic
order-normalization algorithm to both lists" is intended to convey?

The first paragraph of the definition is also quite surprising:

  "Regarded (as a list) in such a way as to ignore both the order of
  the items in the list, and any duplicates that appear in the list."

The subject is missing from this sentence, and to "regard a list as a
list in such a way as to ignore the order of the list" (paraphrase) is
a very strange thing to do.

I believe this whole concept is deeply flawed and that it should be
replaced by the standard concept of a set.
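To make the point concrete, here is a minimal Python sketch. The
`normalize` function is a hypothetical stand-in for the unspecified
"order-normalization algorithm" (drop duplicates, then sort); it is
not taken from N0393. Comparing two normalized lists gives the same
answer as plain set equality, except that the normalization also
requires the items to be sortable, which a set does not.

```python
def normalize(items):
    """Hypothetical order-normalization: remove duplicates, then sort.
    Sorting requires a total ordering on the items."""
    return sorted(set(items))

def equal_as_lists_treated_as_sets(a, b):
    # Apply the same normalization to both lists, then compare them.
    return normalize(a) == normalize(b)

def equal_as_sets(a, b):
    # Plain set equality: ignores order and duplicates directly,
    # and only needs the items to be hashable.
    return set(a) == set(b)

a = ["t3", "t1", "t2", "t1"]
b = ["t2", "t3", "t1"]
print(equal_as_lists_treated_as_sets(a, b))  # True
print(equal_as_sets(a, b))                   # True
```

With items that have no ordering at all, such as plain `object()`
instances, `equal_as_sets` still works while the sort inside
`normalize` raises `TypeError`. The ordering buys nothing and costs an
extra requirement.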

--- Subjects, topics, and properties

In general, this section is severely underspecified, and a number of
the design choices here seem suboptimal. Since this is the core of the
whole model it is clear that N0393 is a castle built on sand.

3.2: If topics are merged which have properties that can only have a
     single value, what happens if the values are not equal?

     How is equality of values determined?
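A minimal sketch of where the gap shows up, assuming (hypothetically)
that a topic is a Python dict from property names to values, that
multi-valued properties hold sets, and that equality is Python's `==`.
None of these choices come from N0393; the point is only that a merge
routine has to do *something* when single-valued properties disagree,
and without a rule this one can only give up.

```python
class MergeConflict(Exception):
    """Raised when two topics disagree on a single-valued property."""

def merge_topics(t1, t2, single_valued):
    """Hypothetical merge of two topics (dicts of property -> value).
    Multi-valued properties hold sets and are unioned. N0393 gives no
    rule for conflicting single-valued properties, so this refuses."""
    merged = {}
    for name in t1.keys() | t2.keys():
        if name not in t1:
            merged[name] = t2[name]
        elif name not in t2:
            merged[name] = t1[name]
        elif name in single_valued:
            # Equality is Python '==' here; N0393 does not say how
            # equality of values is determined.
            if t1[name] != t2[name]:
                raise MergeConflict(name)
            merged[name] = t1[name]
        else:
            merged[name] = t1[name] | t2[name]
    return merged

t1 = {"subject-identifiers": {"urn:x:a"}, "population": 500000}
t2 = {"subject-identifiers": {"urn:x:b"}, "population": 520000}
try:
    merge_topics(t1, t2, single_valued={"population"})
except MergeConflict as e:
    print("cannot merge: conflicting values for", e)
    # prints "cannot merge: conflicting values for population"
```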

3.2.1: Is it property classes or instances which have names?

3.2.2: what is the difference between an empty value and a null value?

3.2.2: *how* do the definitions of property classes specify the types
       of their instances? What are the allowed types? What is a list?
       What may lists contain? If the type of a property is simple,
       what does that mean? What are the different kinds of simple
       property types?

       May the value of a property be a topic? Or a list of topics? Or
       a list of lists of topics?

       One would assume that part of the definition of a property
       class whose value is a "list treated as a set" must be the
       ordering algorithm to be applied to order the contents of the
       list, yet this is not stated.

3.2.3: Why must a topic have at least one SIDP instance? RDF could not
       conform to this, for example, nor could all topic maps.

       Why can a topic not have more than one SIDP instance per TMA?
       The SAM defines three SIDPs, for example.

3.2.4: Does this mean that the end user cannot decide to apply merging
       rules beyond those defined by N0393? If so, why? This seems an
       unacceptable limitation on the freedom of end users. (6.1.6
       appears to repeat this.) It would also seem to directly
       contradict ISO 13250.

--- Relationships and assertions

How are assertions represented? Are they built from topics with
properties, or are they a separate structure? Can assertions influence
merging? 

4.1.4: Why can two castings not cast role players in the same role? If
       the recommended solution is to define sets, why does N0393 not
       provide a mechanism to express such sets? And what is a group?

Section 4.2.1 onwards is utterly impenetrable, and also defines very
complicated machinery. It seems obvious that this machinery needs to
be trimmed.

--- Situations and Property Values

What purpose does this section and the terms within it serve? It would
seem that it could be taken out entirely with no loss to the document
as a whole.

--- TM Application Definitions

What *is* a "TM Application Definition"? What is its purpose? What
form does it take?

What is the purpose of requiring "TM Application Names" to be unique?
Does a TMA conform to N0393 if its name is not unique but that was
unknown to its creators at the time of creation? Or is it the time of
publication that is critical? What are the criteria for establishing
whether or not the creators were aware of another TMA with the same
name?

Are there syntax rules for "TM Application Names"? Are there syntax
rules for the names of property classes?

What *is* the name of an assertion type, and how is it connected to
the assertion type? One presumes that "assertion type" means "the
topic that is the t-topic in an assertion".

6.2: Is '::' allowed within "TM Application Names" or the names of
     property classes/assertion types/etc?

     The requirement that a "TM Application Name" cannot contain a "."
     is in conflict with the recommendation in note 14 that URIs can
     be used as unique TMA names.

     Is the second paragraph of this section actually meaningful? If
     so, what does it mean to represent a relationship by means other
     than assertions? Would representing relationships through
     properties constitute a violation of this requirement?
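The conflict is easy to see with a hypothetical validator for the 6.2
rule as summarized above. The function name and the sample names are
illustrative only. The rule rejects most http URIs, since domain names
contain dots, though not every URI.

```python
def is_valid_tma_name(name: str) -> bool:
    """Hypothetical check of the 6.2 rule: no '.' allowed in a TMA name."""
    return "." not in name

# Candidate names following note 14's suggestion to use URIs.
candidates = [
    "http://www.example.org/tma/music",  # dots in the host name
    "urn:example:tma:music",             # happens to contain no dots
]

for name in candidates:
    print(name, is_valid_tma_name(name))
```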

6.3: If TMAs are included "by reference to their definitions", why
     must their names be unique? And what is an acceptable reference?

--- Requirements for Syntax Deserialization Definitions

What is the purpose of this section? Why can the TMAs themselves not
define what rules, if any, apply to their syntaxes? (Indeed, why
*should* the TMAs define any rules for their syntaxes rather than
leave it to the syntaxes to define the rules?) Are syntaxes always
specific to one or more TMAs, or may there be syntaxes for the model
defined by N0393 in general? If so, must their definitions be
conforming SDDs?

--- Requirements for Implementations

What is the purpose of this section? Is it acceptable for an
implementation to claim conformance to *all* TMAs? If not, why can the
definition of conforming implementations not be left to the TMAs?

How does one establish whether or not an implementation behaves "in
any way that would lead one to think" that it allows the reification
of something that it does not allow one to reify?

--- Fully Merged Topic Maps

Why is this section not part of the general definition of the model?

--- Conformance

Why is this section not simply a container for sections 6, 7, and 8?
At the very least one would expect that there should be references to
where the requirements for conformance are.

9.1: Why is this section included?

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >