[sc34wg3] Best practices for representing occurrences, or why occurrences are not associations?

Dmitry sc34wg3@isotopicmaps.org
Sun, 23 Feb 2003 15:20:27 -0500


As a Topic Maps technology "user", I would really appreciate it if participants of
this forum could share some ideas about current "best practices" for
occurrence representation. I am sure this item has been discussed many times
before, but because of the recent RM/SAM discussions I think it would be
interesting to share the latest ideas.

I think that in many usage scenarios (the ones covered by the 80/20 rule) information
resources have to be explicitly represented in a topic map as first-class
entities (as reified information resources). We would like to describe
different properties of these resources (Publishing Date, Author(s),
Version, Format, etc.). We would also like to define some classes of
information resources (e.g. News, Company External News, Company Internal
News, Departmental News). And, of course, we would like to connect
(reified) resources with domain topics.
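
To make the requirement concrete, here is a minimal sketch of an information resource represented as a first-class topic in XTM 1.0 syntax. The topic ids, the example.org URL and the date value are placeholders invented for illustration only.

```xml
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Hypothetical class of information resources -->
  <topic id="company-internal-news">
    <baseName><baseNameString>Company Internal News</baseNameString></baseName>
  </topic>
  <!-- Hypothetical property type -->
  <topic id="publishing-date">
    <baseName><baseNameString>Publishing Date</baseNameString></baseName>
  </topic>
  <!-- The resource itself: <resourceRef> in <subjectIdentity> says that
       the resource at this (placeholder) URL IS the subject of the topic -->
  <topic id="news-item-1">
    <instanceOf><topicRef xlink:href="#company-internal-news"/></instanceOf>
    <subjectIdentity>
      <resourceRef xlink:href="http://example.org/news/item-1.html"/>
    </subjectIdentity>
    <!-- A property of the resource, stated once on the resource topic -->
    <occurrence>
      <instanceOf><topicRef xlink:href="#publishing-date"/></instanceOf>
      <resourceData>2003-02-23</resourceData>
    </occurrence>
  </topic>
</topicMap>
```

What is still missing in this sketch is the link from the resource topic to the domain topics it is about.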

I found two main recommendations:
- Reify the occurrence as a topic and make assertions about the reified occurrence.
- Reify the information resource as a topic and create associations between
  the "domain" topic and the information resource topic.

The first approach works well if we need to represent additional information
about the occurrence itself (such as its "strength", for example). But I am not
sure it works well for representing information such as "Publishing Date" or
"Author(s)", because we typically have a many-to-many relationship between
"domain" objects and information resources. In fact, XTM 1.0 has a sample in
section 3.9.1 which demonstrates this approach.
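
A minimal sketch of the first approach, assuming the usual XTM 1.0 reification convention (a topic whose subject indicator points at the id of the reified element). The ids "verdi-article-occ", "verdi-article-occ-reified" and "strength", and the value "high", are made up for illustration.

```xml
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="verdi">
    <!-- The occurrence gets an id so that it can be reified -->
    <occurrence id="verdi-article-occ">
      <instanceOf><topicRef xlink:href="opera-template.xtmp#article"/></instanceOf>
      <resourceRef xlink:href="http://opera.stanford.edu/opera/Verdi"/>
    </occurrence>
  </topic>
  <!-- Hypothetical property type -->
  <topic id="strength">
    <baseName><baseNameString>Strength</baseNameString></baseName>
  </topic>
  <!-- Topic reifying the occurrence (not the resource behind it) -->
  <topic id="verdi-article-occ-reified">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="#verdi-article-occ"/>
    </subjectIdentity>
    <!-- A statement about the topic-to-resource relationship -->
    <occurrence>
      <instanceOf><topicRef xlink:href="#strength"/></instanceOf>
      <resourceData>high</resourceData>
    </occurrence>
  </topic>
</topicMap>
```

A "Publishing Date" attached this way would belong to one topic-to-resource pairing only, so it would have to be repeated for every topic that has the same resource as an occurrence.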

Personally, I like the second approach more, but in this case we are kind of
"losing" the occurrence concept. We in fact introduce a new association (with no
standard for it?) which plays the role of an advanced occurrence (between a topic
and a reified information resource). Because it is not a standard association,
software tools cannot really use it in a compatible way.

One of the latest examples of this second approach I found is in "The XML
Papers: Lessons on Applying Topic Maps" by Steve and Lars. "Topic maps in
content management" by Lars also explores this option in detail.

In section 3.1 of "The XML Papers" the authors describe the benefits of reifying
conference papers as topics, rather than simply as information resources
connected to other topics via occurrences.

I am totally with the authors regarding these benefits. But I think it is also
a good example of how the concept of occurrences "disappears" with this
approach. As you know, Omnigator has a special window for occurrences. This
window lets you quickly see all resources available about a specific topic. In
the case of "let's introduce associations instead of occurrences", all document
references are mixed in with the other associations. I am not saying that it is
bad. I am just saying: if good practice is to reify resources as topics and
use a special "occurrence" kind of association (such as "Mentioned in"), why
do we need regular occurrences?

As a "naïve user" I would like to discuss several ideas which, I think, can
clarify this issue.

So, let's:
1. Define that "occurrence" in XTM is a shortcut (in the same way as
<instanceOf>) for an association of type "occurrence", which is a subtype of the most
general association type, "assertion".
2. Define that <resourceRef> in an occurrence element is a shortcut for
a resource reified as a topic, with subject address equal to the URI from
<resourceRef>.
3. Define that the "type of occurrence" in an "occurrence" element is a shortcut for
the <instanceOf> of the reified topic.
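
The subtype relationship from item 1 could itself be written down in XTM, roughly as sketched here. This is only an illustration: "assertion" is a hypothetical topic, the core.xtm URIs are meant to be the XTM 1.0 core published subject indicators and should be checked against the specification, and treating "occurrence" as an association type is the proposal, not something XTM 1.0 defines.

```xml
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Hypothetical most general association type -->
  <topic id="assertion">
    <baseName><baseNameString>assertion</baseNameString></baseName>
  </topic>
  <!-- "occurrence" used as an association type (the proposal) -->
  <topic id="occurrence">
    <subjectIdentity>
      <subjectIndicatorRef
          xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#occurrence"/>
    </subjectIdentity>
  </topic>
  <!-- "occurrence" is a subclass of "assertion" -->
  <association>
    <instanceOf>
      <subjectIndicatorRef
          xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#superclass-subclass"/>
    </instanceOf>
    <member>
      <roleSpec>
        <subjectIndicatorRef
            xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#superclass"/>
      </roleSpec>
      <topicRef xlink:href="#assertion"/>
    </member>
    <member>
      <roleSpec>
        <subjectIndicatorRef
            xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#subclass"/>
      </roleSpec>
      <topicRef xlink:href="#occurrence"/>
    </member>
  </association>
</topicMap>
```

An association type such as "Mentioned in" could be declared a subclass of "occurrence" in the same way, which would give a tool enough information to list it together with regular occurrences.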

So I suggest that if we have something like this in XTM:

<topic id="verdi">
    <instanceOf>
        <topicRef xlink:href="opera-template.xtmp#composer"/>
    </instanceOf>
    <baseName>
        <baseNameString>Verdi, Giuseppe</baseNameString>
    </baseName>
    <occurrence>
        <instanceOf>
            <topicRef xlink:href="opera-template.xtmp#article"/>
        </instanceOf>
        <scope>
            <topicRef xlink:href="opera-template.xtmp#snl"/>
            <topicRef xlink:href="opera-template.xtmp#offline"/>
        </scope>
        <resourceRef xlink:href="http://opera.stanford.edu/opera/Verdi"/>
    </occurrence>
</topic>

it is a shortcut for these descriptions:

<topic id="verdi">
    <instanceOf>
        <topicRef xlink:href="opera-template.xtmp#composer"/>
    </instanceOf>
    <baseName>
        <baseNameString>Verdi, Giuseppe</baseNameString>
    </baseName>
</topic>

<topic id="ArticleAboutVerdi2000232323">
    <instanceOf>
        <topicRef xlink:href="opera-template.xtmp#article"/>
    </instanceOf>
    <subjectIdentity>
        <resourceRef xlink:href="http://opera.stanford.edu/opera/Verdi"/>
    </subjectIdentity>
</topic>

<association>
    <instanceOf>
        <topicRef xlink:href="occurrence"/>
    </instanceOf>
    <scope>
        <topicRef xlink:href="opera-template.xtmp#snl"/>
        <topicRef xlink:href="opera-template.xtmp#offline"/>
    </scope>
    <member>
        <roleSpec>
            <topicRef xlink:href="referencedTopic"/>
        </roleSpec>
        <topicRef xlink:href="#verdi"/>
    </member>
    <member>
        <roleSpec>
            <topicRef xlink:href="informationResource"/>
        </roleSpec>
        <topicRef xlink:href="#ArticleAboutVerdi2000232323"/>
    </member>
</association>

Because it is a "standard" shortcut, software tools can take advantage of
this standard and show this association in the "occurrence" window. Actually,
more important is that users can keep using the standard "XTM 1.0" syntax as usual and
get all the benefits of resources reified as topics when needed.

An interesting "side effect" of this is that SAM will be closer to RM. I mean
we still have the "XTM 1.0" shortcuts, but they are "explained" in terms of RM as
a special kind of assertion.

We can do the same "trick" with the resourceData element. Let's extend the
"subjectIdentity" element and allow "resourceData" inside it.
We can then reify "resourceData" the same way as we did for the occurrence.
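
A rough sketch of what such an expansion might look like for an occurrence that carries inline data. Note that <resourceData> inside <subjectIdentity> is the extension proposed here and is not valid XTM 1.0; the ids "verdi-description" and "description" and the data string are placeholders, and the "occurrence", "referencedTopic" and "informationResource" topics are assumed to be defined elsewhere.

```xml
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Expanded form of an occurrence of type "description" on topic "verdi"
       whose content is <resourceData>Italian opera composer</resourceData> -->
  <topic id="verdi-description">
    <instanceOf><topicRef xlink:href="#description"/></instanceOf>
    <subjectIdentity>
      <!-- Proposed extension: the data itself is the subject -->
      <resourceData>Italian opera composer</resourceData>
    </subjectIdentity>
  </topic>
  <association>
    <instanceOf><topicRef xlink:href="#occurrence"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#referencedTopic"/></roleSpec>
      <topicRef xlink:href="#verdi"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#informationResource"/></roleSpec>
      <topicRef xlink:href="#verdi-description"/>
    </member>
  </association>
</topicMap>
```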

Another interesting "side effect" of reified "resourceData" is that we can now use
it in associations.

Example (ordered list of authors for some document):

<association>
    <instanceOf>
        <topicRef xlink:href="authors"/>
    </instanceOf>
    <member>
        <roleSpec>
            <topicRef xlink:href="resource"/>
        </roleSpec>
        <topicRef xlink:href="#ReportAboutInvestmentOpportunitiesFor2003"/>
    </member>
    <member>
        <roleSpec>
            <topicRef xlink:href="author"/>
        </roleSpec>
        <topicRef xlink:href="#John3000"/>
    </member>
    <member>
        <roleSpec>
            <topicRef xlink:href="seqnum"/>
        </roleSpec>
        <resourceData>2</resourceData>
    </member>
</association>

<association>
    <instanceOf>
        <topicRef xlink:href="authors"/>
    </instanceOf>
    <member>
        <roleSpec>
            <topicRef xlink:href="resource"/>
        </roleSpec>
        <topicRef xlink:href="#ReportAboutInvestmentOpportunitiesFor2003"/>
    </member>
    <member>
        <roleSpec>
            <topicRef xlink:href="author"/>
        </roleSpec>
        <topicRef xlink:href="#Terry5003"/>
    </member>
    <member>
        <roleSpec>
            <topicRef xlink:href="seqnum"/>
        </roleSpec>
        <resourceData>1</resourceData>
    </member>
</association>

Again, <resourceData> in this case is a shortcut for a reified data resource.
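
Spelled out, the "seqnum" member of the first association might expand to something like the following sketch. The id "seqnum-2" is invented for illustration, <resourceData> inside <subjectIdentity> is the proposed extension (not valid XTM 1.0), and the other two members are left out for brevity.

```xml
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Reified data: the value "2" becomes the subject of a topic -->
  <topic id="seqnum-2">
    <subjectIdentity>
      <resourceData>2</resourceData>
    </subjectIdentity>
  </topic>
  <association>
    <instanceOf><topicRef xlink:href="authors"/></instanceOf>
    <!-- "resource" and "author" members as in the example above -->
    <member>
      <roleSpec><topicRef xlink:href="seqnum"/></roleSpec>
      <!-- The member now points at the reified data topic -->
      <topicRef xlink:href="#seqnum-2"/>
    </member>
  </association>
</topicMap>
```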

As a result we have, in fact, three types of topics:
- Reified Resources
- Reified Data
- Regular Topics

Occurrence is a subtype of "assertion", represented either as an association or using
the "occurrence" shortcut. Shortcuts allow us to keep using the existing "XTM 1.0" syntax. We
have a more flexible model for representing occurrences. SAM is more
"explained" in terms of RM than it is now.

That's what I have so far.

Dmitry