[sc34wg3] Identifying and comparing subjects (and a possible extension to \tau)

Ann Wrightson sc34wg3@isotopicmaps.org
Sun, 01 Aug 2004 13:42:21 +0100


Small rant and suggested extension to \tau model
================================================

Identifying and comparing subjects via proxies is not in general
tractable/computable. So, what we are looking for is a reasonable practical
strategy. That the TMDM/TAO way works in practice for a growing number of
real situations is important (and is BTW a v. legitimate basis for LMG's
position). 

Subjects are neither atomic nor stable. A classic example is colours:
although a dictionary would equate "grey" with "llwyd", and "red" with
"coch" (Welsh), there are things I would say were "llwyd" or "coch" in Welsh
that I would say were brown in English. Moreover, a younger Welsh-speaker
would be likely to make more use of "brown" as a loan-word from English than
I do, though I would still expect their boundary "brown/coch" to be
different from the red/brown boundary of an English speaker. This example is
particularly clear and simple; however, this kind of thing happens all over
(witness our own group discussions!).

To combat this, you might put in place a more effective/stable way of
identifying/comparing subjects, by using proxies designed for the job, such
as a term set derived from a controlled vocabulary. This works v. well for
everyone using that term set. However, the picture changes once you have
more than one term set (eg medical and social-work). In this situation, the
two goals of mapping between the term sets and preserving the information of
each source actually conflict. That is, if you set up a correspondence
between two different term sets and then use that correspondence to merge
two topic maps based on the respective term sets, you generally lose
information in real terms, even though there is an argument that the
resulting topic map "contains" everything the source topic maps contained.
The severity of this problem increases in proportion to the precision of the
term sets used, because it is precisely that precision, valued by the
information users, that is lost.
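
To make the information-loss point concrete, here is a minimal sketch in
plain Python (nothing to do with TMDM or \tau internals; the term sets and
items are invented purely for illustration). A correspondence coarsens the
medical terms onto a single social-work term, and merging through that
correspondence keeps all the items but silently discards the finer medical
distinction:

# Two "maps", each keyed by its own (invented) controlled vocabulary.
medical = {
    "myocardial-infarction": {"patient-123"},
    "angina": {"patient-456"},
}
social_work = {
    "heart-problem": {"client-789"},
}

# Correspondence from the finer (medical) terms to the coarser one.
correspondence = {
    "myocardial-infarction": "heart-problem",
    "angina": "heart-problem",
}

def merge_via(correspondence, fine, coarse):
    """Destructive merge: rewrite the fine map into the coarse
    vocabulary, then take the union."""
    merged = {term: set(items) for term, items in coarse.items()}
    for term, items in fine.items():
        target = correspondence.get(term, term)
        merged.setdefault(target, set()).update(items)
    return merged

merged = merge_via(correspondence, medical, social_work)
# merged == {"heart-problem": {"patient-123", "patient-456", "client-789"}}
# Every item from both sources is "contained" in the result, but the
# distinction between infarction and angina cannot be recovered from it.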

This has been (since 2000 ;-) my main argument against merging always being
an a+b=c kind of operation that loses the fine structure of a & b. I believe
that there should also be a different operation that both preserves the
original maps and defines a view-as-if-merged-according-to-this-mapping. 
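
As a sketch of what I mean (again just illustrative Python, not anybody's
API; it reuses the invented data from the sketch above): the source maps
stay untouched, and the merged view is computed through the mapping only
when it is queried:

def merged_view(fine, coarse, correspondence):
    """Return a query function that answers as if the two maps had been
    merged via the correspondence, without modifying either source."""
    def items_for(coarse_term):
        result = set(coarse.get(coarse_term, ()))
        for fine_term, target in correspondence.items():
            if target == coarse_term:
                result |= fine.get(fine_term, set())
        return result
    return items_for

# With the data from the previous sketch:
#   view = merged_view(medical, social_work, correspondence)
#   view("heart-problem") gives the same three items as the destructive
#   merge, while medical["angina"] etc. remain intact and consultable.

The difference is that the mapping is an input to a query, not a one-way
transformation of the data.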

I'm realizing as I write that this suggests a different formal account of
merging - call it merge2. For example, in \tau, suppose the two maps being
merged have different sets of names (curly N1 and N2) in the universe
(curly I). Instead of merge2 being a set union, it would be a function, and
the desired property of "preserving information" from the source maps to the
target map would become a morphism property of that function.
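
Very roughly, and only as one possible shape for this (the notation below
is mine, not taken from the \tau paper; read curly N1, N2, I as
\mathcal{N}_1, \mathcal{N}_2, \mathcal{I}):

\[
  \mathrm{merge2} : \mathcal{T}(\mathcal{N}_1) \times \mathcal{T}(\mathcal{N}_2)
    \times \Phi \;\longrightarrow\; \mathcal{T}(\mathcal{N}_1 \cup \mathcal{N}_2)
\]

where \mathcal{T}(\mathcal{N}) is the class of maps over names
\mathcal{N} \subseteq \mathcal{I}, and \Phi \subseteq \mathcal{N}_1 \times
\mathcal{N}_2 is the correspondence. "Preserving information" would then be
stated as the existence of morphisms

\[
  \iota_k : t_k \longrightarrow \mathrm{merge2}(t_1, t_2, \Phi),
  \qquad k = 1, 2,
\]

i.e. structure-preserving embeddings of each source map into the result,
rather than defining the result outright as the set union t_1 \cup t_2. The
point is only that information preservation becomes a property of morphisms
into the result, not a particular construction of the target map.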


Ann W.