[sc34wg3] a new name for the Reference Model

02 Jan 2003 21:52:07 -0600

Jim Mason asks:

    > Do the names we give to standards say anything
    > about what conforms to them?

    If the answer is "Yes", then I can say what I'd
    like to be said:

    (1) The RM governs the definitions of Topic Map(s?)
        Models whose specifications claim conformance
        to the ISO Topic Maps paradigm.  The RM imposes
        requirements on all TM Model definitions that
        claim conformance to it.

    (2) Topic Map(s?) Models, including but not limited
        to the Standard Model, govern the topic map
        documents and enabling software that claim
        conformance to them.  When a Topic Map(s?)
        Model inherits ("borrows", includes) another
        Topic Map(s?) Model, the inherited Model *also*
        governs the documents and software that claim
        conformance to the inheriting Model.

    How to say or imply all that in the names is a good
    question.  I think we're already much closer than
    we've ever been.

    > What conforms to what is still labelled the RM?

    Definitions of TM Models.

    > What does conformance to [the RM] mean?

    For the definition of any given TM Model, it means
    that the RM's shopping list of required
    definitions, and aspects of each definition, is
    fully satisfied.

    > Likewise for what's still called SAM? What does
    > conformance to one of those say about conformance
    > to what's currently out there as ISO/IEC 13250?

    OK, here's what I think.

    13250 is about two syntaxes for interchanging topic
    maps: XTM and HyTM.

    XTM and HyTM are both inside and outside the SAM,
    and that's why both the SAM and the RM are needed
    in order to fully and rigorously describe what's
    happening in 13250.  (Before anybody gets angry,
    hear me out.  I don't think this formulation
    threatens anybody or anything.)

    XTM and HyTM are inside the SAM because both
    syntaxes invoke SAM-defined semantics, including:

    * occurrences
    * names
    * instance of class
    * subject indicator
    * addressable subject
    * set
    * scope
    * etc.

    The SAM can define merging rules for all of the
    subjects that are yielded by the above semantics,
    but it can't define merging rules for subjects that
    may be yielded by user-defined association types,
    since we can't know what their semantics will be.

    Now, since both XTM and HyTM allow users to define
    their own association types, the question arises:

      What does it mean when a subject is specified by
      means of an association, and how does anyone
      other than ISO standards-makers define the
      merging rules for the subjects that may be
      conferred upon role players by associations?

    I bet some readers are saying, "What in the hell
    is Newcomb blathering about here?  Since when are
    subjects conferred upon role players by
    associations?"  Well, to make a long story short,
    everything boils down to relationships between
    subjects, and some relationships confer subjects on
    the topics that play certain roles.  

    For example, there is the kind of relationship that
    exists between a topic and its subject indicator.
    One of the roles in such a relationship is played
    by a subject indicator, which is always a piece of
    information.  The other role is played by a topic.
    The nature of this kind of relationship is such
    that the *existence* of the relationship causes the
    subject that is the meaning of the piece of
    information (that plays the subject indicator role
    type) to be conferred upon the topic (that plays
    the other role).  That's an example of how a
    relationship actually specifies the subject of a
    topic.

    Now, let's look at the XTM syntax.  XTM syntax is
    designed to make some relationships extremely easy
    and intuitive to specify.  In fact, the XTM syntax
    doesn't even make them look like relationships.
    For example, instances of the <subjectIdentity>
    element type establish the same kind of
    relationship I've been talking about, between
    topics and their subject indicators.  And the
    purpose of a <subjectIdentity> element is to confer
    a subject upon the <topic> that contains it.  It
    does this by specifying that there is a
    relationship between the topic and a subject
    indicator.
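
    To make the subject-indicator case concrete, here is
    a minimal sketch in Python.  It is not XTM, and it is
    not anything that 13250 or the SAM specifies.  The
    names Topic and merge_by_subject_indicator, and the
    example.org URLs, are assumptions made up for
    illustration.  The sketch treats two topics as having
    the same subject whenever they share a subject
    indicator, and groups them accordingly.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical stand-in for a <topic>; not an XTM data structure.
    topic_id: str
    # Addresses of the subject indicators related to this topic.
    subject_indicators: set = field(default_factory=set)


def merge_by_subject_indicator(topics):
    """Group topics that share at least one subject indicator.

    Returns a list of (topic_ids, subject_indicators) pairs.  Groups are
    kept pairwise disjoint in their indicators, so sharing is transitive.
    """
    merged = []
    for topic in topics:
        ids = {topic.topic_id}
        indicators = set(topic.subject_indicators)
        remaining = []
        for other_ids, other_indicators in merged:
            if indicators & other_indicators:
                # Same subject indicator, hence same subject: absorb the group.
                ids |= other_ids
                indicators |= other_indicators
            else:
                remaining.append((other_ids, other_indicators))
        remaining.append((ids, indicators))
        merged = remaining
    return merged


topics = [
    Topic("a", {"http://example.org/psi/x"}),
    Topic("b", {"http://example.org/psi/y"}),
    Topic("c", {"http://example.org/psi/x", "http://example.org/psi/y"}),
    Topic("d"),  # no subject indicator, so nothing to merge on
]
groups = merge_by_subject_indicator(topics)
```

    In this run, topics a, b, and c end up in one group,
    because c shares an indicator with each of the other
    two, and d stays alone.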

    If you're still with me, here, you're ready for the
    punch line: XTM *also* allows topic map authors to
    specify *arbitrary* kinds of relationships, by
    means of <association>s.  This is where the RM
    comes into 13250 (i.e., into HyTM and XTM).  13250
    not only has *inherent* kinds of relationships (the
    significance of each of which is described in the
    SAM), but also it allows and encourages
    *user-defined* relationship semantics.  In other
    words, HyTM/XTM has always expected that the SAM's
    relationship semantics would be extended by users
    of HyTM/XTM.

    What if some of those user-defined relationship
    types are supposed to confer subjects on some of
    their role players?  How do we tell when such
    subjects are the same, and therefore must be
    merged?  Neither 13250 nor the current SAM faces up
    to the possibility:

    * that a user could define an association type
      whose instances determine the subjects of one or
      more of their role players, and

    * that more than one topic may thus have conferred
      upon it the same subject, and

    * that therefore such topics need to be merged.

    We should decide what we want to do about this.  I
    think there are several choices, including:

    (1) We do nothing.  We don't say anything about it.
        We pretend the issue doesn't exist, and we face
        it at some later date.  (I think this is the
        worst possible choice.  It weakens both the SAM
        and our credibility.  It creates a situation in
        which weeds will thrive.)

    (2) We say that user-defined assertion types in
        XTM/HyTM *are not* allowed to have any
        semantics such that the subjects of any of
        their role players are specified by instances
        of such user-defined assertion types.  

        If we choose this option, whenever a topic map
        author wants topics to be merged, he must say
        so explicitly and redundantly, either with a
        <topicRef> or with two <subjectIndicatorRef>s
        to the same subject indicator.  Michel likes
        this idea.  It protects developers of XTM/HyTM
        processors from ever having to support
        user-extensible merging rules, and it may have
        other advantages of which I am not yet aware.
        I dislike it because I think it should be
        enough to say, in domain-specific terms (i.e.,
        via instances of user-defined association
        types), that topic A has subject S1, and topic
        B has subject S1, and expect that A will merge
        with B simply because they both have the same
        subject.  It shouldn't *also* be necessary
        to say explicitly:

          (i) that topic A also has subject indicator
              SI1, and topic B also has subject
              indicator SI1, or

         (ii) that topic A has the same subject as
              topic B.

        If the SAM imposes a requirement to supply such
        redundant information in each topic map, we
        will eventually have to answer the following
        embarrassing question, emanating from possibly
        irate users: "Why does XTM/HyTM allow
        user-defined association types at all, since
        they aren't allowed to mean anything in terms
        of subject recognition?"

    (3) We say that user-defined assertion types in
        XTM/HyTM *are* allowed to have semantics such
        that the subjects of their role players are
        specified by instances of such user-defined
        assertion types.  We require that, when such
        topic maps are interchanged, they must include
        the information necessary to allow such
        subjects to be merged automatically, in the
        normal course of topic map processing, whenever
        such subjects are identical.

        Personally, I strongly favor this third choice.
        It doesn't require us to delay the
        standardization of the SAM, even though we may
        wish to add stuff to the SAM, at some future
        date, that *standardizes the expression* of the
        additional TM Modeling information necessary to
        extend the merging rules of XTM/HyTM in support
        of domain-specific subjects.  It leaves
        XTM/HyTM's future indefinitely long and
        indefinitely bright -- as long and bright as
        the full breadth and depth of the TM paradigm
        itself.  It won't force XTM users to learn how
        to make TM Models; it only leaves open the
        possibility that they can use such knowledge if
        they want to, without first having to abandon
        XTM.
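
    As a rough illustration of what choice (3) asks a
    processor to do, here is a minimal Python sketch.
    Everything in it is an assumption of mine: the
    "capital-of" association type, the role names, the
    CONFERRING_RULES table, and the function names are
    hypothetical, and neither the RM nor the SAM
    prescribes this form.  The idea is only that the
    interchanged map carries, for each user-defined type,
    enough information to tell which role player has a
    subject conferred on it and what determines that
    subject.

```python
from collections import defaultdict

# Hypothetical declaration carried with the topic map: for each user-defined
# association type, the role whose player has its subject conferred by the
# association, and the roles whose players determine that subject.
CONFERRING_RULES = {
    "capital-of": {"conferred_role": "capital", "determined_by": ("country",)},
}


def conferred_subjects(associations, rules):
    """Map each conferred-subject key to the set of topics that receive it."""
    by_subject = defaultdict(set)
    for assoc in associations:
        rule = rules.get(assoc["type"])
        if rule is None:
            continue  # no subject-conferring semantics declared for this type
        key = (assoc["type"],) + tuple(
            assoc["roles"][role] for role in rule["determined_by"]
        )
        by_subject[key].add(assoc["roles"][rule["conferred_role"]])
    return by_subject


def topics_to_merge(associations, rules):
    """Return the groups of topics that were given the same subject."""
    groups = conferred_subjects(associations, rules).values()
    return [sorted(group) for group in groups if len(group) > 1]


associations = [
    {"type": "capital-of", "roles": {"capital": "topicA", "country": "france"}},
    {"type": "capital-of", "roles": {"capital": "topicB", "country": "france"}},
    {"type": "capital-of", "roles": {"capital": "topicC", "country": "italy"}},
    {"type": "friend-of", "roles": {"a": "topicA", "b": "topicC"}},
]
merge_groups = topics_to_merge(associations, CONFERRING_RULES)
```

    Here topicA and topicB are flagged for merging
    without any redundant <topicRef> or
    <subjectIndicatorRef>.  With an empty rules table,
    which is roughly the situation under choice (2),
    nothing is flagged.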

    In any case, we really need to face this issue.  

    > Does [the RM] specify or interpret?

    It specifies.  If it only interprets, then its
    constraints are optional, and we abandon the idea
    that "Topic Maps" means reliable, predictable,
    ontology-neutral knowledge aggregation.

    > And this leads me back to a discussion in
    > Baltimore about whether we need a multipart
    > standard or multiple standards. If we have a
    > multipart standard, I think it's easier to
    > justify the RM as the much-needed explanation of
    > what the current 13250 means, whether it turns
    > out to be standard-like or TR-like.

    I don't see the RM as fully answering the question,
    "What does 13250 mean?"  It provides an essential
    part of the answer, but it cannot provide the whole
    answer.  The same is true of the SAM.  I see the RM
    and SAM together as fully answering the question,
    "What does 13250 mean?".

    The RM and SAM are both deeply technical.  The
    answer to the question, "What does 13250 mean?" is
    not light reading.  If we pack both the SAM and
    the RM into 13250, along with HyTM and XTM, we'll
    have a big, heavy standard that nobody will read.
    (I hesitate to mention three examples of too-heavy
    standards: HyTime, STEP, and XML Schemas.  All
    three are marketing disasters, despite whatever
    technical virtues they may or may not have.  The
    primary problem with all of them is that they are
    TOO BIG.)

    We need Topic Maps to be *perceived* as light,
    easy, and intuitive.  The XTM DTD gives this
    impression.  Great!  

    So, if we go the multipart route, I want to be
    very, very certain that the default, leading ISO
    publication on Topic Maps is very short and sweet,
    has the XTM DTD in it, and damn little else.  It
    should be a "README" for Topic Maps.  

    I think we will defeat ourselves if we direct
    public attention toward "ISO 13250", and people
    look at it only to find that it's 100 pages of
    mostly unintelligible techno-gibberish.  Even if
    the first part of it is short, sweet, and
    easy to understand, 100 pages is a big turn-off.
    (I'm reminded of the guy who asks for a drink of
    water, and then receives his drink in the form of a
    blast from a firehose.)

    > If it's going to be a separate standard, then we
    > have to make it a standard and be clear about
    > what conforms to it. (If it's an abstract model
    > of data or data aggregation or something like
    > that and all that really conforms to it is 13250
    > itself, then it shouldn't be a separate standard
    > or even a separate document but rather a part of
    > a revised 13250.)

    Any TM Model can conform to the RM, not just the
    SAM, and not just XTM or HyTM.  13250 is not the
    only thing that will ever conform to the RM, or to
    the SAM, even.

    13250 is a standard for a pair of syntaxes.  Both
    of these syntaxes are suitable for interchanging
    Topic Maps that conform to the SAM, which is a TM
    Model set forth in a separate standard.  Both of
    these syntaxes inherently provide for the
    expression of semantics (user-defined <association>
    types) that are outside the scope of the SAM.  When
    such non-SAM relationship semantics are used, they
    must be defined in conformance with the RM, which
    is *another* separate standard.

    > Another way of asking the question about what
    > sorts of documents the RM and SAM are is to ask
    > who the audience is. Are we writing these things
    > for code writers or end users? That is to say,
    > are we specifying the actions of a TM engine and
    > the interchange formats for TM data, or are we
    > marketing TMs/trying to help people who want to
    > create TMs? At the moment, those two audiences
    > are sometimes almost the same (that is to say,
    > ourselves), but what about 5 years down the road?
    > We need the information in both the RM and the
    > SAM, but we need to understand how we need it.

    Here's my take on audiences:

    13250: It's for everybody, but it's especially
           appealing to XML/SGML people.  (More and
           more, that's everybody in any
           knowledge-intensive field.)

    SAM:   It's for knowledge managers and software
           developers: techies.

    RM:    It's for knowledge managers and software
           developers: techies who want to achieve
           subject location uniqueness for subjects
           that are specified by domain-specific
           relationship types.

-- 

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com

Coolheads Consulting
http://www.coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA