[sc34wg3] revised draft Reference Model document N0298

Steven R. Newcomb sc34wg3@isotopicmaps.org
16 Apr 2002 11:27:53 -0500


"Martin Bryan" <mtbryan@sgml.u-net.com> writes:

> > Do I understand you to be discussing a third kind of
> > thing, which you call "concept"?  Or is "concept" for
> > you the same thing as "subject"?

> ...Concepts should be "definitions that allow one to
> determine whether or not the labels of a topic can
> legitimately be applied to a particular object."

I'm not completely understanding you yet.  If a concept
is a definition, it is, in dRM and XTM terms, a
"subject indicator".  But it's a particular kind of
subject indicator: a good and useful subject indicator
(as opposed to a bad and useless subject indicator).
It's the kind that, in your words, "allows one to
determine whether or not the labels of a topic can
legitimately be applied to a particular object."

Now, if that's right, and if your patience with me has
not been completely exhausted, all you need to do is to
explain what you mean by "labels of a topic", "object",
and "legitimately", and maybe I'll understand you!

* Are the "labels of a topic" its names?  Are they the
  senses of other kinds of assertions in which it plays
  roles?

* Is an "object" an addressable subject?  A
  nonaddressable subject?  A node?  An arc?  An
  assertion?  An XML element?

> > A *relationship type* whose instances are relationships
> > between published things, on the one hand, and
> > publishers, on the other, is absolutely not the same
> > thing as the *role* played by publishers in such
> > relationships, even if we describe or name both of them
> > using the same words.

> I agree, but this was not the distinction I was
> making. 

> The distinction was between the internal
> "relationship type" that exists because there is an
> association between an ISO13250 topic element that
> identifies a particular edition of a particular
> publication and an ISO13250 topic element that
> identifies a particular publishing organization
> (e.g. the relationship between the ISO13250 standard
> itself and ISO and/or IEC) and that between the
> "role" of an information occurrence within a
> particular ISO13250 topic element and an externally
> identified organization that happened to have the
> same relationship as the association, but which
> cannot be expressed as such without creating a new
> ISO13250 topic element which, for some reason, the
> author of the map does not wish to do.

I'm having trouble parsing the above huge sentence.
Either there's an error in it, or there's some sort of
bug in my brain's English parser.

> The role played by both of these identifiers is
> identical, but the way the map is created differs
> because there is a fundamental difference between
> internal references and external ones within
> ISO13250.

I'm still lost.

*******************************************************

I begin to suspect that, throughout this confusing
conversation, you've been saying "role" and thinking
"occurrence role", while I've been seeing "role" and
thinking "role type".

We need some diagrams.  They're coming.

*******************************************************

> > You seem to be arguing that
> >
> >   the act of translating a topic map document into a
> >   graph of nodes in which there is exactly one node per
> >   subject
> >
> > should be nondeterministic.

> Not my argument at all. I am trying to ascertain
> whether, if it is necessary for systems to create
> more than one node in a dRM to represent an ISO13250
> topic element, it is permissible to create nodes of
> different types to represent the same ISO13250 topic
> element. Your rules seem to suggest to me that this
> is not permitted.

I think we'd better start with an example of a HyTM
<topic> and identify all the subjects in it.  Having
done that, we can decide what "types" they are.  There
is more than one taxonomy of topic types that can be
brought to bear, here, so it's not very helpful to say
that they are (or are not) of different types.

<person 
 HyTM=topic 
 id=abeLincoln 
 identity=abeLincolnSI>
  <topname>
    <basename scope="foo">Abraham Lincoln</basename>
  </topname>
  <biography
   scope=bar
   type=unauthorizedBiography>
    <idloc>abeBio</idloc>
  </biography>
</person>

The above is a HyTM <topic> element (HyTM=topic) whose
generic identifier indicates its topic type (person).
Its subject indicator is the element whose unique
identifier is "abeLincolnSI".  It has a basename
"Abraham Lincoln" in the scope of "foo".  It has an
occurrence which is the element whose unique identifier
is "abeBio", in the scope of "bar".  (Please take my
word for all this; some of the HyTime incantations that
make all these things true are not shown above.)
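
For anyone who wants to check that reading mechanically, here is a
minimal sketch in Python.  It is not a HyTime engine: it uses the
standard library's html.parser only because that parser tolerates
unquoted attribute values, and it hard-codes the element names of
this one example (basename, biography, idloc) instead of resolving
the architectural forms.  The TopicReader class and the keys of its
"facts" dictionary are my own hypothetical names.

```python
from html.parser import HTMLParser

SAMPLE = """<person
 HyTM=topic
 id=abeLincoln
 identity=abeLincolnSI>
  <topname>
    <basename scope="foo">Abraham Lincoln</basename>
  </topname>
  <biography
   scope=bar
   type=unauthorizedBiography>
    <idloc>abeBio</idloc>
  </biography>
</person>"""


class TopicReader(HTMLParser):
    """Collects the facts stated in the prose (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.facts = {}
        self._stack = []  # names of the currently open elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser lowercases attribute names
        self._stack.append(tag)
        if attrs.get("hytm") == "topic":
            # The generic identifier doubles as the topic type.
            self.facts["topic_type"] = tag
            self.facts["topic_id"] = attrs.get("id")
            self.facts["subject_indicator"] = attrs.get("identity")
        elif tag == "basename":
            self.facts["basename_scope"] = attrs.get("scope")
        elif tag == "biography":
            self.facts["occurrence_type"] = attrs.get("type")
            self.facts["occurrence_scope"] = attrs.get("scope")

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        if self._stack[-1] == "basename":
            self.facts["basename"] = text
        elif self._stack[-1] == "idloc":
            self.facts["occurrence"] = text


reader = TopicReader()
reader.feed(SAMPLE)
reader.close()
facts = reader.facts
```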

Here is a list of dRM-level subjects that are directly
demanded by the above HyTM <topic> element:

 #0. A specific human individual commonly known as
     "Abraham Lincoln".  This is the subject of the
     <topic> element.

 #1. The topic type of human individuals.

 #2. The name "person".

 #3. The assertion that #2 is a name of #1.

 #4. The addressable subject that is the subject
     indicator for the subject of the <topic> element.
     It is the piece of information referenced by the
     string, "abeLincolnSI".

 #5. The assertion that #4 is a subject indicator of
     #0.

 #6. The name "Abraham Lincoln".  (Note: this is *not*
     the string "Abraham Lincoln" as it appears in the
     above example; if it were, it would be an
     addressable subject.  Subject #6 is
     nonaddressable; it is the name "Abraham Lincoln"
     in the abstract sense; it is the name that is
     meant by *any* copy of that string, anywhere.  In
     this list of subjects, I'm not bothering to use
     the appearances of names in the example as subject
     indicators for those names.  There is no need to
     do that if they already have subject indicators
     located elsewhere, and I'm assuming that they do.)

 #7. The assertion that #6 is a name of #0.

 #8. The subject of the topic that is referenced by the
     string "foo".

 #9. The set of subjects that has #8 as its only
     member.

#10. The assertion that #8 is a member of #9.

#11. The assertion that #9 is the scope of #7.

#12. The occurrence type of unauthorized biography.
     (In dRM terms, this is an assertion type.)

#13. The name "biography".

#14. The assertion that #13 is a name of #12.

#15. The name "unauthorizedBiography".

#16. The assertion that #15 is a name of #12.

#17. The addressable subject that is the occurrence of
     an unauthorized biography.  It is the piece of
     information referenced by the string, "abeBio".

#18. The assertion that #17 is an unauthorized
     biography occurrence of #0.

#19. The subject of the topic that is referenced by the
     string "bar".

#20. The set of subjects that has #19 as its only
     member.

#21. The assertion that #19 is a member of #20.

#22. The assertion that #20 is the scope of #18.

#23. The assertion that #0 is an instance of #1.
     (Sorry, this one should have appeared earlier in
     this list.)
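
The assertions in the list can be written down as plain data, which
makes it easy to see that an assertion is itself a subject that can
play a role in another assertion.  The following Python fragment is
only a sketch under my own assumptions: the kind labels ("name-of",
"scope-of", and so on) are hypothetical shorthand, not dRM
vocabulary, each assertion is reduced to two role players, and #5 is
taken to relate the subject indicator #4 to #0.

```python
# Assertion number -> (hypothetical kind label, role player x, role player y),
# read as "x is a <kind> y".  All numbers refer to the list of subjects.
ASSERTIONS = {
    3:  ("name-of", 2, 1),
    5:  ("subject-indicator-of", 4, 0),  # assumes the indicator is #4
    7:  ("name-of", 6, 0),
    10: ("member-of", 8, 9),
    11: ("scope-of", 9, 7),
    14: ("name-of", 13, 12),
    16: ("name-of", 15, 12),
    18: ("occurrence-of", 17, 0),
    21: ("member-of", 19, 20),
    22: ("scope-of", 20, 18),
    23: ("instance-of", 0, 1),
}


def names_of(subject):
    """Return the numbers of the names asserted for a subject."""
    return sorted(
        x for kind, x, y in ASSERTIONS.values()
        if kind == "name-of" and y == subject
    )


def scoped_assertions():
    """Map each scoped assertion to the scope (a set of subjects) it has."""
    return {y: x for kind, x, y in ASSERTIONS.values() if kind == "scope-of"}
```

With this encoding, names_of(12) yields the two names of the
occurrence type, and scoped_assertions() shows that the scoping
assertions #11 and #22 point at other assertions (#7 and #18), not
at the topic's subject.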

*******************************************************

Now, there are several typologies (taxonomies) of topics
that we can bring to bear on the above list.  Here
are four of them:


(1) Addressable vs. nonaddressable subjects:

    (a) Addressable subjects: 4, 17

    (b) Nonaddressable subjects: 0, 1, 2, 3, 5, 6, 7,
                                 9, 10, 11, 12, 13, 14,
                                 15, 16, 18, 20, 21,
                                 22, 23

    (c) Can't tell from the example: 8, 19



(2) Node types in dRM terms:

    (a) A-nodes: 3, 5, 7, 10, 11, 14, 16, 18, 21, 22,
                 23

    (b) C-nodes: (I didn't bother to list these.  Since
                 all the assertion types used in the
                 example have two roles, there are two
                 of them per A-node)

    (c) P-nodes: 12 (I didn't list the ones for the
                    naming and scoping assertion types.)

    (d) R-nodes: (I didn't list these.)

    (e) none of the above: 0, 1, 2, 4, 6, 8, 9, 13, 15,
                           17, 19, 20



(3) User-defined/user-definable types:

    (a) #0 is an instance of #1.

    (b) #18 is an instance of #12.



(4) Things that we know about the types of things
    because of the roles they play in certain assertion
    types:

    (a) Names: 2, 6, 13, 15

    (b) Scopes: 9, 20

    (c) Occurrences: 17
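
Typologies (1) and (2) are each meant to sort all 24 subjects, #0
through #23, into exactly one bin.  A short Python check, offered
as a sketch only, confirms that the lists above do that; the
variable and function names are hypothetical, and the C-nodes and
R-nodes are left out because I did not number them.

```python
ALL_SUBJECTS = set(range(24))  # subjects #0 through #23

# Typology (1): addressability.
ADDRESSABLE = {4, 17}
NONADDRESSABLE = {0, 1, 2, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14,
                  15, 16, 18, 20, 21, 22, 23}
CANT_TELL = {8, 19}

# Typology (2): dRM node types, for the numbered subjects only.
A_NODES = {3, 5, 7, 10, 11, 14, 16, 18, 21, 22, 23}
P_NODES = {12}
NONE_OF_THE_ABOVE = {0, 1, 2, 4, 6, 8, 9, 13, 15, 17, 19, 20}


def is_partition(parts, universe):
    """True if the parts are pairwise disjoint and together cover universe."""
    seen = set()
    for part in parts:
        if seen & part:
            return False  # some subject is listed twice
        seen |= part
    return seen == universe


typology_1_ok = is_partition(
    [ADDRESSABLE, NONADDRESSABLE, CANT_TELL], ALL_SUBJECTS)
typology_2_ok = is_partition(
    [A_NODES, P_NODES, NONE_OF_THE_ABOVE], ALL_SUBJECTS)

# Every assertion (A-node) in the list is a nonaddressable subject.
assertions_nonaddressable = A_NODES <= NONADDRESSABLE
```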

*******************************************************

[snip]

> What I mean by "well-maintained topic maps" is maps
> that are not overloaded with unnecessary information
> that the user has to navigate through to get to the
> information he or she needs. As the map's author I
> should not have to create a navigable node for every
> company that ever submitted an IT standard to the
> world. I should be able to create a "relevant subset
> of standards producing bodies" from those nodes that
> are of direct relevance to my user community. Users
> should only see the set of nodes whose scope or role
> meets their requirements profile. This is what I am
> looking to achieve using dRM or SAM.

So am I.  One difference between us, evidently, is that
you want to eliminate information from the topic map,
whereas I want to hide all unwanted information at
rendition time.  The latter approach seems much more
powerful to me.  I don't like the idea of crippling the
map itself.  I prefer to keep rendition issues separate
from source maintenance issues.
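
To make the distinction concrete, here is a minimal sketch in
Python of hiding at rendition time.  Everything in it is
hypothetical (the Node class, the scope labels, the "ExampleCorp"
entry); it is not dRM or SAM machinery.  The point is only that the
view is computed from the map and the map is never edited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    scopes: frozenset  # hypothetical scope labels attached to the node


# The source map keeps everything, relevant to a given user or not.
TOPIC_MAP = (
    Node("ISO", frozenset({"standards-body"})),
    Node("IEC", frozenset({"standards-body"})),
    Node("ExampleCorp", frozenset({"submitter"})),
)


def render(topic_map, wanted_scopes):
    """Return the names of the nodes a user profile should see.

    The map is only read; unwanted nodes are skipped in the view,
    not deleted from the source.
    """
    wanted = frozenset(wanted_scopes)
    return [node.name for node in topic_map if node.scopes & wanted]


view = render(TOPIC_MAP, {"standards-body"})
```

A different user profile is then just a different call to render()
against the same, uncrippled map.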

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com

Coolheads Consulting
http://www.coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA