[sc34wg3] New SAM draft

Lars Marius Garshol sc34wg3@isotopicmaps.org
20 Feb 2003 21:23:33 +0100


(Extra, triple, apologies for this one. These comments are very much
appreciated. They are all high-quality, and deserved better treatment
than this. I'm sorry.)

For anyone reading this: Martin is commenting on SC34 N0356, not the
latest SAM draft.

* Martin Bryan
| 
| As I shan't be able to get to the part of the WG3 meeting scheduled
| to discuss the SAM I thought I would submit comments by e-mail and hope
| that someone could bring them up at the appropriate point in the
| discussions.

I'm afraid I didn't see this until after the meetings, but I have gone
through your comments now. Replies given below.
 
| Abstract
| Rather than saying "will supersede ISO 13250" I suggest we state
| that it "will augment ISO 13250".

This text will not be part of the standard, but I agree it is still
important. We'll be revisiting the roadmap soon, and then we'll see
what to say here. (In a sense you certainly are right.)
 
| 1 Introduction, para 3
| "Clearly documented" is judgemental: "documented" is sufficient,
| unless you want to make it clear that such documentation is publicly
| available, in which case "publicly documented" would be better.

Good point. Corrected.
 
| 1 Issue (scope-extension)
| The use of the word scope in this context is ambiguous. I take it that we
| are not referring to topic map scopes but the scope of the standard itself.

Correct.

| ISO 13250 is certainly not "restricted to only defining the issues
| related to the interchange of topic maps". It defines architectural
| forms to which topic map architectures must conform, irrespective of
| whether or not the architectures are interchangeable.

What the SAM document says is the view held by Steve Newcomb, and I
see that the text of ISO 13250 does not support his opinion. So I take
it that this means that you think that the scope always was broader
than mere interchange. I would like to hear what SRN thinks of this.
 
| 3.1 1. [notation]
| The statement "If not, the two first characters of the string must
| be x-" is invalid for the HyTime locator notation and for any other
| ISO defined locator, which must conform to the ISO rules for naming
| notations. I suggest you change "If not" to "For user-defined
| notations".

I could, but I would rather list the HyTime notation(s) here
explicitly.  I'm basically waiting for the HyTM work to proceed so
that we can add the necessary HyTime notations to this list. Any
takers?
 
| 3.2 and 3.3
| A better explanation is needed of why source locators can be sets.
| Can such sets only be created using merging rules? (If so, say so
| specifically.) If not, make it clear how two source locators can be
| assigned to a single subject. (Remember that ISO 13250 does not
| formally define source locators.)

This has been brought up repeatedly, so it does seem that this should
be explained. I've added a para at the end of 3.2 about this now.

| 3.3 Para immediately preceding SAM Constraint heading
| What happens if the only difference between two entries is the
| source locator chosen to reify the topic map item? Surely this isn't
| a conformance issue!

This equality rule is here so that syntax specifications can define
what conformance to the syntax specification means more easily.  This
allows them to say that an implementation is conformant if for every
instance of the syntax they can create a SAM instance X, then
serialize that to the syntax, read it back in to a SAM instance Y, and
have those two instances be equal according to this rule.
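
As a rough illustration of that round-trip test (a sketch only: the
JSON "syntax", the function names, and the set-of-sets stand-in for a
SAM instance are all assumptions made here, not part of the SAM):

```python
import json

# Hypothetical stand-in for a SAM instance: a set of topics, where each
# topic is a frozenset of subject identifier strings. The real SAM model
# has many more item types and properties.

def serialize(instance):
    """Write the instance to a made-up JSON syntax (sorted for stability)."""
    return json.dumps(sorted(sorted(topic) for topic in instance))

def deserialize(text):
    """Read the made-up JSON syntax back into an instance."""
    return {frozenset(topic) for topic in json.loads(text)}

def round_trips(instance):
    """True if X -> syntax -> Y yields a Y that is equal to X."""
    return deserialize(serialize(instance)) == instance

x = {
    frozenset({"http://example.org/psi/norway"}),
    frozenset({"http://example.org/psi/oslo", "http://example.org/psi/oslo-city"}),
}
print(round_trips(x))
```

The point of the equality rule is that `round_trips` needs some
definition of `==` on instances; here Python's set equality plays that
role.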

Does that clarify things?

| 3.4 Issue (subject-vs-resource)
| A subject and a RFC 2396 resource are definitely not the same
| thing. A subject is a categorization of a resource. It serves as
| metadata with respect to it. Acts Ch12 v5 is a perfectly good
| subject. It has millions of potential resources (at least 10 in the
| house I am currently in!) For each of these potential resources the
| subject can serve as metadata "for one small verse from a large
| collection of clearly identified verses" (not that all bibles do
| clearly identify all verses!)

I think you've misunderstood the definition of resource in RFC 2396.
RFC 2396 makes it very clear that a resource can be *anything*, so
long as it has identity. So Martin Bryan the person, the cup next to
my laptop at the time of writing, and the country of Norway are all
resources according to RFC 2396.

| 3.4 4th para
| "Every topic represents one, and only one, subject" is incorrect.
| Suppose I create a supertype node that brings together the subjects
| of "astronomy" and "quantum theory". There is no known name for this
| subject so I create a new topic and name it "Astronomy & Quantum
| Theory". Is this topic really about only one subject? (I know what
| you are trying to get at, but the wording is inaccurate!)

I've tried to fix this in the latest draft:
  <URL: http://www.isotopicmaps.org/sam/sam-model/#d0e532 >

Do you think the new wording works better?

| When you say "it is not clear that they represent different
| subjects" you are introducing a double negative. It would be better
| to be positive and state that "it is clear that they represent the
| same subject".

I agree. That whole thing was a bit messed up. I hope the new draft
does better, but if not, please let me know.
 
| 3.4.1 Subject address
| The definition of the term subject address is inadequate to make it
| clear where and when subject addresses can be used. To me "a locator
| that refers to the information source that is the subject of a
| topic" is a statement that could refer to any occurrence identified
| by the topic. I do not think this is how the term is used within the
| SAM, but why it should not be used in this way is unclear. A clearer
| distinction between subject identifier and subject address is
| required.

Hmmmm. Doesn't this make it clear that the subject address points to
the *subject* of the topic, as opposed to a relevant information
resource, which is all that occurrences do? So that if topic X has
<URL: http://www.google.com > as its subject address, the subject of
topic X *is* Google's home page?

The subject identifier, on the other hand, points to a subject
indicator, and a subject indicator is an information resource that
describes what the subject of the topic is.

Do you still feel that this is not clear, and that the text is not
making this obvious to readers? If so, I'll add in the examples to see
if that helps, and you can review that.
 
| 3.4.1 Issue (term-subject-indicator-def)
| Acts Ch11 V5 is a subject indicator that does not refer to "a
| specific" information resource, but could still exist as a locator
| of the form urn:purl:bible:JamesI:Acts:Ch11:V5 or some equivalent
| identifier.

I agree that this is possible, the question is just what we do with
the spec to make it handle such situations. A URI may potentially even
identify things that are not information resources in any sense of the
word, such as a person.
 
| 3.4.1 Issue (term-subject-address-def)
| Re "Does it represent that storage location" the answer must be
| no. What if the address is reassigned to different hardware
| containing a copy of the referenced resource? What if the address is
| notational, as in the above example? What if there are multiple
| copies of a particular resource (whether notational or not, as per
| retrieval from caches rather than the original resource)? This must
| be left an open issue for applications to determine.

I agree. This is also how we resolved the issue, in the end.
 
| 3.4.1 Issue (topic-naming-constraint)
| For backward compatibility with version 1 of the standard the
| constraint should be retained, though a means of deliberately
| overriding it can be provided as an extension

A resolution to this has now been agreed by the committee. The TNC
will not apply by default, but those who want it can apply it to those
names they would like it to apply to.
 
| 3.4.3 Scope
| When rewriting be aware of the conflict between the word "all" in
| the first para and the word "each" in the Issue statement. Each is
| better as scope means "in one or more of these contexts".

Better than 'any'? That could be. I guess that's one subtlety of the
English language that was lost on Marc and me.
 
| 3.4.3 Issue (scope-unconstrained-rep)
| Unconstrained scopes could be represented by an empty set

They could, but whether they should be or not depends on the
resolution of term-scope-def. If we go with "a characteristic applies
when ALL the subjects in its scope apply" then it must be the empty
set, but if we go with "a characteristic applies when ANY of the
subjects in its scope apply" then it must be the universal set.

If we leave the interpretation of scope open we can choose, and then I
think I would choose the empty set.
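
The two readings can be compared with a small sketch (the function
names and themes below are invented for illustration; "context" stands
for the set of subjects that currently apply):

```python
def applies_all(scope, context):
    """ALL reading: applies when every subject in the scope applies."""
    return all(theme in context for theme in scope)

def applies_any(scope, context):
    """ANY reading: applies when at least one subject in the scope applies."""
    return any(theme in context for theme in scope)

# Hypothetical themes, for illustration only.
universe = {"english", "norwegian", "biology"}
context = {"norwegian"}

# Under ALL, the empty scope always applies, so it can mean "unconstrained".
print(applies_all(set(), context))
# Under ANY, the empty scope never applies ...
print(applies_any(set(), context))
# ... so "unconstrained" would have to be the set of all subjects.
print(applies_any(universe, context))
```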
 
| 3.4.2 and 3.4.3
| These are suddenly invisible :-)

Huh? What?
 
| 3.4.4 2nd para
| The phrase "or to associate a topic name with certain topics"
| suggests that the same name can be assigned to different topics. I
| would not want to suggest this as a good idea to anyone, or to
| suggest that there is a work around to it. 

I agree. This wording has now been changed.

| (This work around sounds suspiciously like RDF reification, which I
| suspect you will find adds too much of an overhead to everything to
| be viable.)

RDF reification is generally agreed to be a kluge.
 
| 3.4.4 3rd para
| Do we really want "if an information item has a source locator item
| that is equal to one of the items in the [subject identifiers]
| property of a topic, that topic item reifies the information item"
| to apply if the source locator is associated with an occurrence
| item? This is what the sentence states at present. I suggest you may
| want to qualify this statement so that "information item" becomes
| "topic item".

Yep, this is precisely what we want. The purpose is to be able to
create a topic that unambiguously represents an occurrence, so that
names, other occurrences, and associations can be assigned to it.
Basically, so that you can speak about the occurrence.
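
A minimal sketch of that rule, assuming items are plain dictionaries
and locators are strings (the function name, dictionary keys, and URLs
are made up for this example):

```python
def find_reifier(item_source_locators, topics):
    """Return the first topic whose [subject identifiers] contain one of
    the item's [source locators], or None if no topic reifies the item."""
    for topic in topics:
        if topic["subject_identifiers"] & item_source_locators:
            return topic
    return None

# Hypothetical data: an occurrence item with one source locator.
occurrence_locators = {"http://example.org/map.xtm#occ1"}
topics = [
    {"name": "Norway",
     "subject_identifiers": {"http://example.org/psi/norway"}},
    {"name": "The occurrence itself",
     "subject_identifiers": {"http://example.org/map.xtm#occ1"}},
]

reifier = find_reifier(occurrence_locators, topics)
print(reifier["name"])
```

Once the reifying topic is found, names, occurrences, and associations
attached to it are statements about the occurrence.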

| 3.4.5
| The names in the UML do not match those in the list for [topic
| names] and [subject identifiers].

I know. We need to update the UML, but the text was finished so late
that there was no time to do that. We will do better over Christmas.

(LMG adding rather later: we have done better...)
 
| 3.4.5 Issue (prop-subj-address-values)
| You need to give a clear reason as to why sets are not applicable in
| this single instance of a locator property.

Indeed we do, and the resolution Graham and I are leaning towards is
to make this property a set.
 
| 3.4.5 Issue (prop-subj-address-scope)
| Anything defined as a topic is a valid theme for a scope. This is
| one of the reasons why I oppose making built-in subjects into
| topics.

I understand, and agree with, the first part, but you lost me on the
second.
 
| 3.4.5 Issue (strings-as-subjects)
| It should be possible to create topics that represent strings. I
| might want to create a topic that refers to all occurrences of the
| string XML (I would use an XML query to identify the occurrences of
| this topic!)

I tend to agree. We'll try to work in support for this.
 
| 3.4.5 SAM Constraint: Source locator and subject identifier namespace
| The reason for this constraint is not obvious. Must it be flagged as
| a fatal error? The word namespace occurs in the title. These two
| names are in different namespaces. Why cannot applications simply
| assign them to different namespaces to resolve the possible
| conflict?

Good question. This is currently being discussed as part of the evaluation
of the resolution to issue merge-srcloc-vs-subjid. Steve Pepper seems
to think that these namespaces should be considered distinct. If you
have an opinion on this, please do feel free to post to this thread:
<URL: http://isotopicmaps.org/pipermail/sc34wg3/2003-January/000853.html >
 
| 3.5
| The UML model, and the overall model, are not conformant with the
| fact that ISO 13250 allows multiple sets of base names to be
| associated with multiple display names, and multiple sort
| names. Under this model data would need to be replicated for each of
| the base names associated with a set of display/sort variant names.

That is correct. (This is issue variant-in-basename.) The resolution
agreed on by the committee is that this is an improvement on the model
implied by the HyTM syntax.
 
| 3.5 Issue (prop-value)
| One vote for the use of [label] rather than [value] for the strings
| that make up names

Graham argued that that would be redundant, as the property already
has "name" as part of its name. It looks like we will resolve it in
favour of [value], although I am sympathetic to your point of view.
 
| 3.5 Issue (names-with-types) and (names-as-subject)
| Base names should be allowed to have types (that's why multiple
| basenames are allowed in 13250: you can assign both Acronym and
| FullName as the types of name of a subject). 

This is what was agreed in the meeting.

| But you should be wary of merging based only on labels because the
| same label can have different roles. What you need to base merging
| on is the combination of type, label and a match of one or more
| scopes (i.e. do not require an exact match of scope sets, but
| require that the shared label has at least one of the scopes
| assigned to the name it is being merged with.)

Personally, I agree with this, but no decision was taken in the
meeting regarding the role of scope in TNC merging in the new
standard. It will be an issue, which I hope we will be able to resolve
in the way you suggest.
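
The suggested rule could be sketched roughly as follows. This is only
an illustration of Martin's proposal, not an agreed rule; the class,
field names, and example themes are assumptions, and treating two
empty scopes as non-overlapping is an arbitrary choice made here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Name:
    type: str         # e.g. "Acronym" or "FullName"
    label: str        # the name string itself
    scope: frozenset  # themes in which the name is valid

def names_match(a, b):
    """Proposed test: same type, same label, and at least one shared theme
    (an exact match of the two scope sets is not required)."""
    return a.type == b.type and a.label == b.label and bool(a.scope & b.scope)

# Hypothetical names, for illustration only.
n1 = Name("Acronym", "XML", frozenset({"computing", "markup"}))
n2 = Name("Acronym", "XML", frozenset({"markup"}))
n3 = Name("FullName", "XML", frozenset({"markup"}))

print(names_match(n1, n2))
print(names_match(n1, n3))
```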
 
| 3.7
| ISO 13250 does not require that all occurrence types be topics: this
| is a constraint only imposed (unreasonably in my view) by XTM.

I know. We have another discussion going on this, I think.
 
| 3.9
| ISO 13250 does not require all roles to be topics either. (One
| reason for this is that it prevents situations where roles, and
| occurrence types, get used to create scopes for topics.)

What you say is true, but the thinking behind the text was that the
HyTM deserialization specification would create topics from elements
where the types were only specified as strings (as opposed to by
reference to topics). As far as I know you are the only person who
disapproves of this solution.

If you feel this is important it would be good if you could describe
what the use case for this is. That is, what are these alternative
mechanisms good for, why do we need them? If you can explain that we
can see if there is some way to meet this requirement.
 
| 4.1 [NOTE: this section is now 4.2]
| The third item in the list is misleading. The subject address
| locator should not be OR'd with the other entries, as it is
| something that is an AND constraint on the other 3 options.

This is incorrect, I'm afraid. If two topic items have the same
subject address they must be merged, regardless of what other property
values the two items may have.
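
In sketch form, with topics as plain dictionaries (the keys, function
name, and data are invented here, and the other merging rules are
deliberately left out):

```python
def merge_by_subject_address(topics):
    """Merge topics sharing a subject address, whatever else they contain.
    Topics without a subject address (None) are passed through unmerged."""
    by_address = {}
    result = []
    for topic in topics:
        address = topic["subject_address"]
        if address is not None and address in by_address:
            # Same address: fold this topic's properties into the earlier one.
            target = by_address[address]
            target["names"] |= topic["names"]
            target["subject_identifiers"] |= topic["subject_identifiers"]
        else:
            copy = {
                "subject_address": address,
                "names": set(topic["names"]),
                "subject_identifiers": set(topic["subject_identifiers"]),
            }
            result.append(copy)
            if address is not None:
                by_address[address] = copy
    return result

# Hypothetical data: two topics with nothing in common but the address.
topics = [
    {"subject_address": "http://www.google.com", "names": {"Google"},
     "subject_identifiers": set()},
    {"subject_address": "http://www.google.com", "names": {"Search page"},
     "subject_identifiers": {"http://example.org/psi/google-home"}},
    {"subject_address": None, "names": {"Norway"},
     "subject_identifiers": set()},
]

merged = merge_by_subject_address(topics)
print(len(merged))
```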

| While item 6 in the list is misleading, in that there is no
| constraint stated in 3.4.1 about the sharing of subject addresses
| and the statement does not require them to have matching addresses,
| the para after the list makes the third item impossible unless one
| of the other three is in force.

I think it may be the clumsiness of the text that gives this
impression. The authors have resolved to do away with this constraint
in any case, which means we won't have to clean this up; we can just
remove it instead. :)
 
| 4.2
| Basenames should not be merged if they have different types (see
| comments above re 3.5)

Absolutely, but this text was written before base names were allowed
to have types. It will be updated.
 
| 4.3
| It is bad practice to overwrite any existing map with a merged
| map. Merged maps should always create new maps, just as new items
| do. You may then remove the source maps, but you should never change
| the sources.

Sure. This just specifies a possible procedure to follow, and doesn't
say which one implementors and users actually are to follow. (This
particular way of formulating it was chosen in this case because it
made the XTM specification easier to write (and read).)
 
| Don't you need to apply 4.1 and 4.2 after any such merging?

Certainly, and all the other merging clauses as well, but the standard
already says (in the definition of set) that this must be done
whenever duplicates are created through modification. 

We will revisit this text, however, to make it more declarative. I
think doing so will resolve both of these issues.
 
| 5.3 2nd/last para
| Change "as a label" to "one of the possible labels" (switching
| between variants is an application dependent decision, which is why
| multiple display names, of different types, are permitted in 13250
| so that you can choose the type of display that suits your working
| environment).

Applied.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >