[sc34wg3] draft Reference Model

Steven R. Newcomb sc34wg3@isotopicmaps.org
07 Apr 2002 04:05:54 -0500


"H. Holger Rath" <holger.rath@empolis.com> writes:

> "Steven R. Newcomb" wrote:
> > ...snip...
> >
> > > Some comments:
> > >
> > > -     You use the term "arc" but you do not explain
> > >       (at least I have not found it) if the connection
> > >       between the two nodes (the arc) has a direction
> > >       (as an RDF triple has) or not.
> > 
> > No direction.  Or, rather, both directions at once.
> 
> Good.

I'd like to withdraw my remark, above: "Or, rather,
both directions at once."  I think it's much better to
say that the Reference Model's arcs are
"nondirectional".  They are purely declarative.  These
arcs do not provide traversal services, and it would be
misleading for us to imply that they do, by saying that
they are "bidirectional".
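One way to picture a purely declarative, nondirectional arc is as a record of which node sits at each of its named ends, such as the A and P ends of an AP arc, which are mentioned later in this note. This is a hypothetical sketch, not a data structure taken from the draft Reference Model. The class and method names are invented for illustration.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """A hypothetical, purely declarative arc.

    It records which node occupies each of its named ends and nothing
    else: there is no "source", no "target", and no traversal method.
    """
    arc_type: str      # e.g. "AP", connecting an assertion (A) and its pattern (P)
    ends: frozenset    # frozenset of (end_name, node_id) pairs; order is irrelevant

    @classmethod
    def make(cls, arc_type: str, ends: Mapping[str, str]) -> "Arc":
        return cls(arc_type, frozenset(ends.items()))

    def node_at(self, end_name: str) -> str:
        """Look up a node by the name of its end, not by position."""
        for name, node in self.ends:
            if name == end_name:
                return node
        raise KeyError(end_name)


# Listing the ends in a different order declares the very same arc.
a1 = Arc.make("AP", {"A": "assertion-1", "P": "pattern-x"})
a2 = Arc.make("AP", {"P": "pattern-x", "A": "assertion-1"})
print(a1 == a2)          # True
print(a1.node_at("P"))   # pattern-x
```

Because the ends are named rather than ordered, asking "which way does this arc point?" has no answer in the sketch, which is the point being made about "nondirectional".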

> > >       I assume it does not have a direction. If this
> > >       is the case I would name it "edge" and not "arc"
> > >       because (IMHO) "arc" implies a direction.
> > 
> > I'd sure like to know who we're going to confuse or
> > alienate by using the term "arc", versus who we're
> > going to confuse or alienate by using "edge".  I have
> > seldom seen the term "edge" used except by
> > mathematicians.  "Arc" is a word that is vaguely
> > understood by most people who have taken first-year
> > computer science courses, and by everyone who has a
> > passing familiarity with RDF.
> 
> And that's the reason why I am concerned. If people
> hear "arc" and they know RDF arcs they might think the
> TM RM arcs are directed as RDF arcs are.

We've done a wee bit of checking.  "Edge" is more
formal, and you're evidently correct that it does not
imply directionality.  In common usage, however,
"arcs", too, can be nondirectional.  

As a marketing principle, it's always better to make
people think they understand something we're trying to
sell them, even if they don't really understand it
immediately.  That's the main reason I think we should
continue to use "arc".  If people assume that the
Reference Model's arcs are directional, and later find
out that their assumption was incorrect, there's no
harm done, and they've learned something important
about Topic Maps.  If, on the other hand, people see
"edge", they may well find the Reference Model much
more forbidding than they otherwise would, and they
won't bother to learn about Topic Maps.

(BTW, we also found that it's common for the term
"arc" to be used in a way that implies that every arc
has hierarchical significance -- i.e., that the graph
is always a tree.  This isn't the case for RDF, of
course, and it won't be true for Topic Maps, either.)

> > > -     I don't find any information where in the RM
> > >       some literals (= value) are stored. The graphics
> > >       show some bubbles with names in them but it is
> > >       not explained in the text what this is and where
> > >       it goes.
> > 
> > The draft Reference Model regards literals as subject
> > constituters and subject indicators.  It does not treat
> > literals in any exceptional way.

> OK. But if I understand your merging rules (see
> below) correctly, this automatically forces merging of
> every topic that has the same 'literal'.

No.  But if you're confused about what I'm trying to
say about this, you can take comfort in the fact that
you're not alone.  It is difficult to explain this.  I
just tried to explain it in writing, in a draft of this
note, and Michel couldn't understand what I wrote,
even though he already understood the point!  Ugh!
Here is my second attempt.

Please understand that what I'm about to say has
absolutely nothing to do with the name-based merging
rule.  If, while reading what follows, you think that
I'm talking about the name-based merging rule, you are
not understanding what I'm trying to say.

In the following explanation, the reason why I'm
talking about names is precisely because names are the
most confusing case.  When you understand how names are
treated (and they are treated exactly like anything
else), everything else is easy.

The first thing I want to do is to introduce and
carefully distinguish between the following distinct
subjects (Subjects #1a, #1b, #2a, #2b, and #3):

Consider a string that appears in a certain
location, for example:

     <p id="p1">The Statue of Liberty, a huge statue
     of a sturdy-looking woman who is holding a torch
     high, located on Liberty Island in New York
     Harbor.</p>

Let's assume that the above example appears in a
specific document located at a specific Web address.

As we all know, there are two subjects for which the
paragraph whose unique identifier is "p1" can serve as
a binding point:

Subject #1a: The paragraph p1 itself can be a subject
             constituter (an "addressable subject").

Subject #1b: The paragraph p1 can be regarded as a
             subject indicator, probably for the topic
             whose subject is the Statue of Liberty (a
             "nonaddressable subject").

That was a warmup.  Now let's consider a string that is
just like the string in Subjects #1a and #1b, but this
string happens to be a name, e.g.
    
     <baseNameString id="bn1">Abraham Lincoln</baseNameString>

Again, let's say that the above example appears in a
specific document located at a specific Web address.

Again, as we all know, there are two subjects for which
the name string whose unique identifier is "bn1" can
serve as a binding point:

Subject #2a: The content of the <baseNameString> itself
             can be a subject constituter ("addressable
             subject").

Subject #2b: The content of the <baseNameString> can be
             a subject indicator for the name "Abraham
             Lincoln", considered abstractly and
             separately from any instance of the string
             "Abraham Lincoln", and separately from any
             character set or other encoding of the
             name "Abraham Lincoln."

Subject #3: The historical figure commonly known as
            "Abraham Lincoln."  This is *not* the same
            subject as Subject #2b.  Subject #2b is a
            name.  Subject #3 is a deceased human
            individual.


Now we can talk about merging literals.

Let's imagine that we have an XTM document in which
the following elements appear:

  <baseNameString id="foo">zorp</baseNameString>

  <baseNameString id="bar">zorp</baseNameString>

Elsewhere in this hypothetical XTM document we have
four topics, which we'll call Topic A, Topic B, Topic
C, and Topic D.  (This hypothetical XTM document
happens to be an extremely weird topic map.  The
subjects of Topics A, B, C, and D are aspects of the
same topic map.  Please bear with me; my weird example
has pedagogical purposes.)

  Topic A's subject is the *addressable subject* which
  is the string found in the content of the element
  whose unique identifier is "foo".

  Topic B's subject is the *addressable subject* which
  is the string found in the content of the element
  whose unique identifier is "bar".  

The subjects of Topic A and Topic B are both like
Subject #2a.  An addressable subject is always some
literal data *at some specific and unique location*,
and the locations of the subjects of topics A and B are
different.  Therefore, they are different subjects.
Topics A and B can *never* merge, even though their
data are identical.
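A minimal sketch of this point, assuming an addressable subject can be identified by where its data sit: a document address plus an element identifier. `Location`, the placeholder address, and the dictionary layout are all hypothetical.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical identity of an addressable subject: where the data live."""
    document: str      # the document's address (illustrative placeholder)
    element_id: str    # the unique identifier of the element within it


# The hypothetical XTM document from the example: two elements, same content.
DOC = "http://example.org/weird.xtm"   # assumed address, for illustration only
contents = {
    Location(DOC, "foo"): "zorp",
    Location(DOC, "bar"): "zorp",
}

topic_a_subject = Location(DOC, "foo")   # Topic A: the string *at* "foo"
topic_b_subject = Location(DOC, "bar")   # Topic B: the string *at* "bar"

# The data are identical...
print(contents[topic_a_subject] == contents[topic_b_subject])   # True
# ...but the subjects are not, because the locations differ.
print(topic_a_subject == topic_b_subject)                        # False
```

Nothing in the subject's identity looks at the content, so no amount of identical data can make A and B merge.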

  Topic C's subject is the *nonaddressable subject*
  which is *indicated by* the string found in the
  content of the element whose unique identifier is
  "foo".

  Topic D's subject is the *nonaddressable subject*
  which is *indicated by* the string found in the
  content of the element whose unique identifier is
  "bar".

The subjects of Topics C and D are both like Subject
#2b.  We know that "zorp" is a name because it's in the
content of a <baseNameString> element, and we know that
<baseNameString> elements contain names because we
happen to understand the standard XTM syntax of the
Standard Application.  (There are lots of other
contexts in which names appear, i.e., in which we know
that strings are names because we understand their
contexts.  In this example, I used <baseNameString> --
and a very weird topic map -- in order to provide a
context for the string "zorp" that we would all
understand as establishing that "zorp" is a name.)

In the Standard Application, topics C and D would
*always* merge, because the string found in the content
of each of the elements is known to be a name, and the
name that is being *indicated*, in both cases, is
"zorp".  The name "zorp" is an abstract subject that
may be indicated by any number of literals (addressable
subjects) that consist of the sequence of characters
z, o, r, and p.
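A minimal sketch of why C and D merge, assuming the subject of each topic is the abstract name that a `<baseNameString>` indicates, keyed only by its character sequence. The indication rule, the use of Unicode NFC normalization as one possible stand-in for "separately from any character set or other encoding", and the helper names are assumptions for illustration, not SAM or Reference Model definitions.

```python
import unicodedata
from collections import defaultdict

# Hypothetical input: element id -> (element type, content), as in the example.
elements = {
    "foo": ("baseNameString", "zorp"),
    "bar": ("baseNameString", "zorp"),
}


def indicated_subject(element_id: str) -> tuple:
    """Return a key for the nonaddressable subject an element indicates.

    Assumption for this sketch: the content of a <baseNameString> indicates
    an abstract name, identified only by its characters, not by where the
    characters happen to be stored.
    """
    kind, text = elements[element_id]
    if kind != "baseNameString":
        raise ValueError(f"no indication rule for <{kind}> in this sketch")
    return ("name", unicodedata.normalize("NFC", text))


# Topics C and D, each bound to its subject via a different indicator.
topics = {"C": indicated_subject("foo"), "D": indicated_subject("bar")}

# Merge topics whose subjects are the same.
merged = defaultdict(list)
for topic, subject in topics.items():
    merged[subject].append(topic)

print(dict(merged))   # {('name', 'zorp'): ['C', 'D']}
```

The subject key deliberately omits the location, which is exactly the difference from the addressable-subject case of Topics A and B.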

(Again, lest there be any confusion about this: The
above examples have nothing to do with the name-based
merging rule.  We're not merging topics based on their
names, here.  We're merging topics based on their
subjects, and those subjects happen to be names.  The
subject of both C and D is the name "zorp", and that
subject has two subject indicators (the contents of the
"foo" and "bar" elements).  We haven't discussed any
topics that are asserted to have the name "zorp"; if we
had, their subjects would be like Subject #3.  Those
Subject #3-like topics might merge under the SAM's
name-based merging rule, but that has nothing to do
with your question about the merging of literals.)

> I don't see the escape door where to put arbitrary
> values in SAM (e.g., the number 2002 - which could be
> this year or could be something completely
> different).

Here's one way to do that:

<common-era-year id="glorp">2002</common-era-year>

... and use the content of the above <common-era-year>
element as the subject indicator of a topic whose
subject is the year 2002.

In general,

(1) Put your literal in a context that makes its
    meaning clear, and

(2) Make it the subject indicator of the topic whose
    subject is that meaning.  
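The two steps above can be sketched as follows, assuming the element type is what supplies the context. `<common-era-year>` comes from the example; `<page-count>`, the `blat` identifier, and `subject_indicated_by` are invented here for contrast and are not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Two literals with identical content in different (hypothetical) contexts.
doc = ET.fromstring(
    "<literals>"
    '<common-era-year id="glorp">2002</common-era-year>'
    '<page-count id="blat">2002</page-count>'
    "</literals>"
)


def subject_indicated_by(elem: ET.Element) -> tuple:
    """Key for the subject a literal indicates: its context plus its content.

    Assumption: the element type is the context that makes the literal's
    meaning clear (step 1); the resulting key is what a topic would use
    as its subject indicator (step 2).
    """
    return (elem.tag, (elem.text or "").strip())


by_id = {e.get("id"): e for e in doc}
year = subject_indicated_by(by_id["glorp"])
count = subject_indicated_by(by_id["blat"])

print(year)            # ('common-era-year', '2002')
print(year == count)   # False: same characters, different subjects
```

The same four characters end up indicating two different subjects, so nothing merges merely because the literals match.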

Am I speaking to your remark?  I'm not sure what you
mean when you say "escape door for arbitrary values".

> Background of my thinking: where to put the values of
> resourceData occurrences in the SAM without
> triggering name-based-merging?

I'm confused.  I don't see how name-based merging
could ever be triggered by <resourceData> elements.

[snip]

> Yes, of course. What I meant was to copy the writing
> style of Annex F (two maps before merging and one map
> after), because it makes everything crystal clear.

Ah.  Good idea.  In the official Reference Model, then,
I guess we should provide before-and-after graphs that
illustrate the operation of each of the merging rules.

> > All assertions have roles, and specific roles are the
> > most basic features of any assertion pattern. It's
> > good and necessary to provide a standard way to
> > determine, from the perspective of an assertion pattern
> > topic, what roles it provides.  
> 
> I agree that it is good and necessary.
> 
> > The Reference Model
> > could provide an "assertionPattern-role" assertion
> > pattern for this purpose.  
> 
> I - vehemently - disagree that this has to be provided 
> by the RM. 

I don't understand why you don't want to provide a
convention that would allow topic map authors to know
where to put such information, and topic map users to
know where to look for patterning information.  You
need to explain what harm it does, and why the benefits
are not important or necessary.

For some of us, including me, merging is what topic
maps are all about.  Merged topic maps should be
useful, even if their sources are diverse.  Having an
Application-neutral way to find patterning information
will help to make merged topic maps useful.

> > But, having done that much,
> > it would be negligent to fail to provide an optional
> > way to specify and to discover what the role player
> > constraints are, too.  That's why, instead of providing
> > an "assertionPattern-role" assertion pattern, the draft
> > Reference Model provides an
> > "assertionPattern-role-rolePlayerConstraints" assertion
> > pattern.
> 
> This shouldn't be in the RM either.

Again, I don't understand why you believe this.

> > The assertionPattern-role-rolePlayerConstraints
> > assertion pattern 
> 
> I am confused here. The document says that it is a built-in
> assertion *type*???

Yes, one of only two of them.  (There used to be more of
them, but we promoted the scoping facility up to the
SAM level, and now there are only two left.)

> > needs to be in the Reference Model in
> > order to standardize the connection between an
> > assertion pattern, its role topics, and its
> > rolePlayerConstraint topics.  

> May[be] I misunderstood the purpose of the
> assertionPattern-role-rolePlayerConstraints? Let me
> ask two simple questions which hopefully make the two
> possibilities visible:

> (1) Is the purpose of the
> assertionPattern-role-rolePlayerConstraints to define
> *how* assertion pattern, its roles, and role player
> [constraint]s *are connected* in the model?

Yes.  It's only there to provide a standard way to
connect them.

> (2) Is the purpose of the
> assertionPattern-role-rolePlayerConstraints to define
> *what* roles and role players *are allowed* to be
> connected with assertions of a certain type?

No.  That's in the realm of validation, and the
Reference Model says nothing about validation.

Let's say we have just received a topic map (in some
hypothetical graph form that conforms to the Reference
Model), but we don't know anything about the
Application that defined its ontology.  All we know
about it is what the Reference Model tells us.  The
Reference Model tells us that any node that serves as
the P end of one or more AP arcs is an assertion
pattern, so we know what the assertion patterns are.
Now we look at an assertion pattern, because we want to
begin to understand this particular part of the
ontology.  (Remember: the whole purpose of topic maps
is to let you find exactly what you want, ideally
without having to look at anything else.)  Because the
assertionPattern-role-rolePlayerConstraints assertion
type is built into the Reference Model, we know that we
want to look at the set of
assertionPattern-role-rolePlayerConstraints assertions
in which the assertion pattern that we're interested in
plays the "assertionPattern" role.  Each of those
assertions takes us to one of the roles declared for
that assertion type, and, for each role, to the set of
constraints governing role players of that role.
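The discovery walk above can be sketched roughly like this. It is a simplification under stated assumptions: each assertion is reduced to its pattern (the node at the P end of its AP arc) and a role-to-player mapping, and the built-in assertions are assumed to have AP arcs pointing at the built-in type itself. Topic names such as `employment` and `employer-rules` are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Name of the built-in assertion type discussed in this note.
BUILT_IN = "assertionPattern-role-rolePlayerConstraints"


@dataclass
class Assertion:
    """Simplified, hypothetical view of one assertion.

    `pattern` is the node at the P end of the assertion's AP arc (None if
    the assertion has no AP arc).  `casting` maps each role topic to the
    topic that plays it; the lower-level arcs that connect roles and
    players are collapsed into this mapping for brevity.
    """
    pattern: Optional[str]
    casting: Dict[str, str] = field(default_factory=dict)


# A tiny topic map from an Application we know nothing about.
assertions = [
    Assertion("employment", {"employer": "acme", "employee": "jo"}),
    Assertion(BUILT_IN, {"assertionPattern": "employment",
                         "role": "employer",
                         "rolePlayerConstraints": "employer-rules"}),
    Assertion(BUILT_IN, {"assertionPattern": "employment",
                         "role": "employee"}),  # constraints role not played
]

# Step 1: any node at the P end of an AP arc is an assertion pattern.
patterns = {a.pattern for a in assertions if a.pattern is not None}


# Step 2: for one pattern, collect the built-in assertions in which it plays
# the "assertionPattern" role.  Each yields one role and, optionally, the
# topic whose subject is that role's player constraints.  Nothing here
# validates anything: a missing constraints topic simply shows up as None.
def roles_of(pattern: str) -> Dict[str, Optional[str]]:
    return {
        a.casting["role"]: a.casting.get("rolePlayerConstraints")
        for a in assertions
        if a.pattern == BUILT_IN
        and a.casting.get("assertionPattern") == pattern
        and "role" in a.casting
    }


print(sorted(patterns))         # ['assertionPattern-role-rolePlayerConstraints', 'employment']
print(roles_of("employment"))   # {'employer': 'employer-rules', 'employee': None}
```

In this sketch the built-in type itself turns up as a pattern in step 1, because its own instances have AP arcs pointing at it; that is one way to see it living inside the model rather than outside it.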

Now, we don't necessarily understand the role player
constraints, because we don't necessarily understand
the ontology.  We don't know whether the role must be
played, or can be played, or by how many players, etc.
However, we can see all the assertions that are made
about the topic whose subject is the role player
constraints.  And we can see their patterns, and their
roles, and so on, so we have a way to understand this
topic map to whatever extent we really want to
understand it.

> If the intention is (1) my comment is:
> assertionPattern-role-rolePlayerConstraints is a
> possible approach but when RM is defined in a formal
> way (with use of some mathematics e.g., Z notation) I
> assume the formalism will provide better features to
> describe this.

The draft Reference Model has very little to do with
logic.  It's really about semantic addressing.  The
draft Reference Model is nothing more or less than a
neutral envelope for both logical and illogical
ontologies.  It doesn't even have a class-instance
assertion type.

> If the intention is (2) my comment is: This is TMCL
> (!!!) and it does not belong in the RM.

Agreed.

> If the RM really needs to constrain some of its
> built-in assertion types (and it seems that there
> will be only two predefined ones) then this should
> also be done by the formalism and not by a construct
> which is part of the model. It should be *outside*
> the model.

Here I must disagree with you.  If things are outside
the model, then the following two statements cannot
both be true:

(1) In a topic map, anything can be a subject.  Even
    assertion types.  Even assertion types that are
    defined in the RM.

(2) In a topic map, everything that is known about a
    subject is directly attached to that subject.  Even
    the type of an assertion.  And even information
    about the type of an assertion.

> If it is in the model it would unnecessarily confuse
> people (e.g., a TM user may raise the question "What is
> this assertionPattern-role-rolePlayerConstraints doing
> and how does it relate to TMCL?").

That's for the designers of TMCL to decide.  The draft
Reference Model is a tabula rasa for them, and the ways
and extents to which they choose to use the
rolePlayerConstraints role are not predefined for them.

We must not allow the Reference Model to have the
effect of censoring all topic maps, forevermore, in a
way that will make it difficult or impossible for topic
maps to be usefully self-describing.  From a
knowledge-interchange perspective, I can see no
advantage in such censorship, but I see plenty of
disadvantages.  What would we say to customers who
observe, "Topic maps can take me directly to anything I
want to know, unless I want to know what are the roles 
of an assertion type." ???

Let me put this another way.  Without the
assertionPattern-role-rolePlayerConstraints assertion
type, we greatly diminish the usefulness of merging
topic maps that conform to multiple diverse ontologies
(Applications).  With it, we guarantee that assertions
made in "foreign" Applications are interpretable,
albeit with some effort, even after they've been
mixed with assertions made in foreign ontologies.

> Having said (written) this, another question comes
> to mind: Your diagrams only show the assertion
> pattern connected to the assertion by an AP arc. But
> where is the assertion type? I understood that type
> and pattern are two different things. They are
> related because the pattern describes (constrains)
> what an instance of such a type should look like, but
> they are not the same. Am I right?

The subject of the topic that appears at the P end of
an AP arc is the assertion type.  That subject is
impacted by playing the "assertionPattern" role in any
"assertionPattern-role-rolePlayerConstraints"
assertion.  By playing that role in an assertion of
that type, it becomes an "assertion pattern", as well
as an assertion type.

As far as the Reference Model is concerned, there is no
requirement that every assertion must have an AP arc
(i.e., that it must have a pattern).  If there is an AP
arc, the only constraints on the topic at the P end are
that it can't also serve as the A, C, or R ends of any
other arcs.  There is no requirement that it play the
"assertionPattern" role in any
"assertionPattern-role-rolePlayerConstraints"
assertion.
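A sketch of how a processor might check the single P-end constraint described above, assuming arcs are given as (arc type, {end name: node}) pairs. The representation and the function name are hypothetical; what the check leaves out matters as much as what it includes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple

# Hypothetical arc representation: (arc type, {end name: node}).
ArcT = Tuple[str, Mapping[str, str]]
FORBIDDEN_FOR_P = {"A", "C", "R"}


def p_end_violations(arcs: Iterable[ArcT]) -> Dict[str, List[str]]:
    """Return each P-end node that also occupies an A, C, or R end.

    This checks only the one constraint described above.  Deliberately
    absent: any rule that every assertion must have an AP arc, or that a
    P-end node must play the "assertionPattern" role anywhere.
    """
    occupied: Dict[str, Set[str]] = defaultdict(set)  # node -> end names used
    for _arc_type, ends in arcs:
        for end_name, node in ends.items():
            occupied[node].add(end_name)

    return {
        node: sorted(names & FORBIDDEN_FOR_P)
        for node, names in occupied.items()
        if "P" in names and names & FORBIDDEN_FOR_P
    }


ok = [("AP", {"A": "assertion-1", "P": "employment"})]
bad = ok + [("AP", {"A": "employment", "P": "other-pattern"})]
print(p_end_violations(ok))    # {}
print(p_end_violations(bad))   # {'employment': ['A']}
```

A topic map with no AP arcs at all passes this check trivially, consistent with the statement that the Reference Model doesn't define validation.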

The Reference Model doesn't define validation.

> If I am right, then I would also argue that we don't
> need the concept of the assertion pattern in the RM -
> just the concept of the assertion type. For the same
> reasons I have already given, and as the implication
> of "if we don't have constraints in the RM, then we
> don't need patterns".

Driving Application-specific validation processes is
only one of the things that assertion patterns can do.
They also help users discover the meanings and
expectations surrounding assertion types.  This can
provide guidance when using "foreign" topic maps, and
it can also be invaluable when adding assertions to
existing topic maps.  

> > Requiring that instances of this assertion pattern 
> 
> I assume you mean instances of 'this assertion type'?
> 
> > be used to establish the
> > roles and role player constraints of assertion patterns
> > does not constrain validation in any way.  

> Whenever I read/hear/think about constraining or
> constraints I think about validation. That's what
> constraints are about.  That's their purpose and the
> reason for their existence (besides guided editing,
> but that's a kind of interactive validation).

Constraints are not just about saying "no" to the user
when the user has done something invalid.  Constraints
are also very informative about the intended meaning of
any instance of an assertion type.

> > It doesn't
> > constrain the nature of constraints, or how constraints
> > are expressed, or how they are combined.  

> Do you suggest that TMCL has to be built on top of
> assertionPattern-role-rolePlayerConstraints?

I hope that TMCL will be provided with a rigorous,
predictable processing model that will show how the
constraints become complexes of assertions that fully
participate in the topic maps that they govern.  If
that's what you mean by "built on top of
assertionPattern-role-rolePlayerConstraints", then my
answer to your question is, "Yes."  But I wouldn't put
it that way, because the
assertionPattern-role-rolePlayerConstraints assertion
pattern is not a foundation, really.  It's just an
Application-neutral way to allow authors to communicate
a topic map's ontological information to their users.

Logically speaking, there are just a few choices here:

(1) We can simply not provide a standard pathway from
    an assertion pattern to the roles that are its
    aspects.  This seems to be what you're proposing.

(2) We can provide such a standard pathway, and we can
    associate semantics with it (such as, "No assertion
    can have any roles that are not provided in the
    standard way by its assertion pattern.").  This is
    *not* what the draft Reference Model proposes, and
    I'm sure we both want to avoid this.

(3) We can do what the draft Reference Model proposes:
    We can provide a standard pathway from an assertion
    pattern to the roles that are its aspects, even
    though, at the Reference Model level, no
    constraints are being imposed by it, because the
    Reference Model doesn't know anything about
    constraints or validation.  This gives authors a
    way to inform users about the roles of patterns,
    and users a standard place to look for the roles.

> If yes: I would say that this does not fit with our
> figure showing how all TM standards are related,
> because TMCL is built on top of SAM and not on top of
> RM. And SAM will not (at least that's my
> understanding) define any constraining mechanisms.

Then there is a misunderstanding between us about our
Roadmap.  Here's my understanding of it:

* As at least one of its expressions, the Standard
  Application will be expressed in terms of the
  Reference Model.

* The Standard Application will include some number of
  constraint semantics, including semantics for
  combining constraints.  These, too, will be
  expressible in terms of the Reference Model.  (This
  point seems to be different from your understanding.)

* The Standard Application will have a syntax for the
  expression of constraints, called TMCL.  This
  language will provide a convenient way to express
  combinations of SAM-defined constraints on topic maps
  and on the constructs that they contain.  Since the
  mapping from SAM to RM will be known, the mapping
  from TMCL to RM will be knowable, too.

You evidently actively oppose the idea that assertion
constraint information, or at least TMCL information,
will be available from inside the topic maps that it
governs.  I really don't understand why you wish to
forbid topic maps from being capable of describing
themselves.  (In fact, I can hardly believe that this
is what you really intend.)

> > Any number of
> > instances of any Application-defined assertion types
> > can be involved.
> 
> I don't understand this. What do you want to say?

I'm trying to say that the SAM should include an
ontology for constraints.  There will be several, and
perhaps many, assertion types involved in such an
ontology.

> > Requiring the use of the
> > assertionPattern-role-rolePlayerConstraints assertion
> > pattern *does*, however, suggest that the combination
> > of constraints that is applicable to players of a
> > specific role _should be a topic_ (or, at least, that
> > it *can* be a topic).  
> 
> When reading this I have the feeling that you
> use assertionPattern-role-rolePlayerConstraints as described
> in my question (1) (see above).

Yes.  As you put it, "...the purpose of the
assertionPattern-role-rolePlayerConstraints [is] to
define *how* assertion pattern, its roles, and role
player [constraint]s *are connected* in the model."

In other words, the purpose is to allow authors to let
users who don't understand the assertion find
information about it.

> > However, even in this respect,
> > the draft Reference Model doesn't constrain validation;
> > it doesn't *require* it to be a topic; the
> > rolePlayerConstraints role is not required to be
> > played, so, in any given Application, there may be no
> > such topics.  

> This is hard to argue as long as we don't have a
> formal model or at least more text describing what
> the RM wants to require and what it doesn't.

You and I absolutely agree that the Reference Model
can't constrain the ways that Applications may choose
to constrain role players.

> > The draft Reference Model simply provides
> > a convenient, sensible, and standard place for such a
> > topic 
> 
> What is "such a topic"? The pattern/type, the role, the
> role player, or ???

I'm talking about the topic that plays the
rolePlayerConstraints role in an instance of a
assertionPattern-role-rolePlayerConstraints assertion
type.

> > to be found, with what we believe to be the
> > minimum overhead while preserving the maximum
> > flexibility for Applications.

> I assume we can easily figure out what should be in
> the RM and what should not once we have the
> requirements, something Lars Marius and others have
> been asking for for months. Once again we took the
> second step before the first. We are arguing about
> the model without having agreed on the requirements,
> which is always a strange situation.

I'm sorry for the inefficiency of the process we're
engaged in, but I must disagree with you when you say
we skipped over the requirements analysis phase.  (The
Document Register doesn't agree with you about that,
either.)

Proposed models are always based on the requirements
that are known, and, once a model has been proposed, it
usually elicits previously unknown requirements
information.  It's an iterative process because nobody
is smart enough to know all the requirements, much less
to understand the implications of meeting all of them.

It would be most helpful if you would state the
requirements you feel are not being met by the current
draft of the Reference Model.  If you disagree with the
requirements that the draft Reference Model is designed
to meet, then you must explain why those requirements
are unimportant or undesirable, and you must also
explain the exact nature of the conflict between those
requirements and the requirements that you're
proposing.  So far, all I really understand is that you
don't want a universal mechanism that allows patterning
information to be found in any topic map, regardless of
Application.  I don't know *why* you want to prohibit
the existence of such a mechanism, or what harm it
might cause, or how it might conflict with other
requirements, or what those conflicting requirements
are.

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com

Coolheads Consulting
http://www.coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA