[sc34wg3] Scope, again

Martin Bryan sc34wg3@isotopicmaps.org
Fri, 29 Nov 2002 13:59:06 -0000


Marc asked the following:

> | or b) identifying that a name is
> | used by more than one language, reside?
>
> I don't really get the point. This responsibility would surely rest with the
> Topic Map author, being the person with domain knowledge, wouldn't it?

How can it? I cannot know how many languages use Paris or Parijs as a name
for the capital of France. I know that the French and English use Paris, and
the Dutch use Parijs (as presumably do Flemish-speaking Belgians), but what
do Afrikaans, Tamil or Japanese speakers use? The topic map author can only
record the facts he knows. The purpose of having merge rules is to allow
us to build up knowledge from multiple sources, which explicitly recognizes
that a single Topic Map author is not the person with all the domain knowledge.
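A minimal sketch of that point, assuming a hypothetical representation in which each author's topic map records the names of one topic as (name, language) pairs keyed by a made-up subject identifier. Merging simply accumulates what each author knows; a language nobody recorded is unknown, not a negative fact. Real ISO 13250 merge rules are richer than this set union.

```python
from collections import defaultdict

# Hypothetical subject identifier for "the Capital of France".
CAPITAL_OF_FRANCE = "http://example.org/psi/capital-of-france"

# Each author records only the names they know, as (name, language) pairs.
author_a = {CAPITAL_OF_FRANCE: {("Paris", "fr"), ("Paris", "en")}}
author_b = {CAPITAL_OF_FRANCE: {("Parijs", "nl"), ("Paris", "en")}}


def merge_names(*maps):
    """Union the name sets of topics that share a subject identifier."""
    merged = defaultdict(set)
    for topic_map in maps:
        for subject, names in topic_map.items():
            merged[subject] |= names
    return dict(merged)


merged = merge_names(author_a, author_b)
languages = {lang for _, lang in merged[CAPITAL_OF_FRANCE]}
# Tamil ("ta") is absent only because neither author recorded it; that says
# nothing about what Tamil speakers call the city.
```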

> | [NB: The Paris case is a good example of what happens in practice. If a
> | specific language does not have its own version of a place name then the one
> | applied by the natives is adopted. So the fact that there is no language
> | entry for Tamil in the topic map name list for the Paris topic does not mean
> | that Tamil does not recognize Paris, only that it uses the default name. It
> | is for this reason that I am very much against treating languages as scopes,
> | which restrict the rules you can apply to the selection of names.]
>
> Many folks use scope for languages, and so do I. But I am willing to learn. In
> a previous posting
> (http://www.isotopicmaps.org/pipermail/sc34wg3/2002-July/000447.html) you
> argued:
>
> | When 13250 was being defined things like language and dates were
> | seen as being likely to be defined as facets rather than scopes as
> | they are typically externally defined.
>
> So what would be the preferred way to handle language in XTM?

XTM, as an application of XML, should have kept to the well-defined method
of defining language using the xml:lang attribute, which is obviously a
facet in HyTM terms. (Hence my comment about the list of valid values being
externally defined.)
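A minimal sketch of what attribute-based language marking could look like when processed with Python's standard ElementTree. The element names here are made up for illustration and are not XTM 1.0 syntax; the point is only that the language codes are plain attribute values whose vocabulary is defined outside the topic map, rather than topics the author has to declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (not XTM 1.0 syntax): one xml:lang per name.
DOC = """<topic id="capital-of-france">
  <name xml:lang="fr">Paris</name>
  <name xml:lang="en">Paris</name>
  <name xml:lang="nl">Parijs</name>
</topic>"""

# ElementTree expands the predeclared xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

root = ET.fromstring(DOC)
# Collect (name, language) pairs; the language values are externally defined
# tags, not topics declared in the map.
names = [(el.text, el.get(XML_LANG)) for el in root.iter("name")]
```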

> On a sidenote, ISO 13250:1999 says: "NOTE 40: The topic referenced via the
> type attribute can have many names in scopes designed for many different user
> contexts, including many different natural languages ..." See also Notes 42,
> 46 and 51.

Note also the words immediately after the informative note you quoted: "and
delivery platforms"! This note says that "EN Sun" and "EN IBM" are
different. I have always considered this an incorrect use of name scoping, as
is the use of dates, where the need for time axes to position overlapping
dates makes the case even clearer. The fundamental problem is when these
things need to be applied, and how they are applied. I prefer that things
used to subset Topic Maps to provide user-defined views should be facets,
which do not need to have formally defined values. When you start to munge
them all together as scopes, which must have formally defined values, you
start to run into problems over who defines the topics and where. For
example, what if I choose to name a topic using EN-US, EN-GB and EN-cockney
as the values of my scopes, and you choose to use just EN? The scopes cannot
be merged because they have different values, whereas the rules for xml:lang
clearly spell out the relationships between the names. Using facets, I can
overcome the difficulty.
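A minimal sketch of the contrast, assuming XPath 1.0 lang()-style matching for xml:lang (a tag matches a wanted language if it is equal to it, or is a sublanguage of it, ignoring case) and, as a deliberate simplification, treating a scope as a set of topic identifiers that only match when the sets are identical. The function names are hypothetical.

```python
def lang_matches(tag: str, wanted: str) -> bool:
    """xml:lang-style test: tag equals wanted, or is a sublanguage of it
    (wanted followed by '-'), ignoring case."""
    tag, wanted = tag.lower(), wanted.lower()
    return tag == wanted or tag.startswith(wanted + "-")


def scope_matches(scope: frozenset, wanted: frozenset) -> bool:
    """Simplified scope comparison: only identical sets of topics match."""
    return scope == wanted


my_tags = ["EN-US", "EN-GB", "EN-cockney"]  # the names I recorded
your_tag = "EN"                              # the name you recorded

by_lang = [lang_matches(t, your_tag) for t in my_tags]
by_scope = [scope_matches(frozenset({t}), frozenset({your_tag})) for t in my_tags]
# Under xml:lang rules all three of my names relate to yours; under the
# set-equality scope model none of them do.
```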

> XTM says (http://www.topicmaps.org/xtm/1.0/#elt-baseName): "Natural language
> discrimination between base names may be specified by a child <scope>
> element."
>
> This seems to suggest scope is the way to handle natural language.

Not for an XML expert, only for a TM one! What XTM did was force its users
to treat language as a scope, rather than continue to offer the choice that
HyTM provides.

> I guess I should have said:
> '"Parijs" _is_ the name of Paris'. This assertion is valid in Dutch but not in
> most (all?) other languages.

I would prefer the more accurate '"Parijs" is the name Dutch speakers apply
to the Capital of France', as this disambiguates their treatment of Paris,
France from that of Paris, US (wherever it is!). This highlights the problem
caused by trying to apply the language to the name of the topic. The topic
is really about "the Capital of France". It has Paris as one of its names,
and Parijs as another. The latter is the one that Dutch speakers prefer to
use when talking or writing to other Dutch speakers, but it is not what they
use when talking or writing to those they know do not speak Dutch (or
Flemish!). By reducing assertions to simple links between the name and the
scoping language, we end up restricting the assertion to something that is
not really meaningful.
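A minimal sketch of name selection along these lines, assuming a hypothetical table of the names used by each language community for the topic "the Capital of France", with French treated as the native (default) name. The choice is driven by the audience's language, and a missing entry falls back to the native name rather than meaning "no name".

```python
# Hypothetical names for the topic "the Capital of France", keyed by the
# language community that uses them.
NAMES = {"fr": "Paris", "en": "Paris", "nl": "Parijs"}
NATIVE = "fr"  # assumption: the natives' name is the default


def name_for(audience_lang: str, names=NAMES, native=NATIVE) -> str:
    """Pick the name the audience's language uses; if that language has no
    entry, fall back to the native name."""
    return names.get(audience_lang, names[native])
```

So a Dutch speaker writing to other Dutch speakers would use `name_for("nl")`, giving Parijs, while the same speaker writing to an English audience would use `name_for("en")`, giving Paris; a language with no recorded entry, such as Tamil, falls back to Paris.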

(Note that the "of Paris" and "Paris has" statements in all messages
exchanged recently are incorrect. The statements should really have been "of
the topic with identifier X used to identify the Capital of France, known to
the French as Paris" and "The topic representing the Capital of France,
known to the French as Paris, has". We really are making a rod for our own
backs by introducing sloppy shorthands of the kind used in these messages.
As another aside, how should we warn people that Paris[FR] is not pronounced
the same as Paris[EN], even though it is spelt the same?)

Martin