[sc34wg3] Illustrating SIDPs
Patrick Durusau
sc34wg3@isotopicmaps.org
Mon, 10 May 2004 07:58:04 -0400
Robert,
I think yet another confusion is about to bite the dust! Read on:
Robert Barta wrote:
> On Sat, May 08, 2004 at 01:15:48PM -0400, Patrick Durusau wrote:
>
>>Dmitry wrote:
>>
>>>On May 4, 2004, at 2:30 PM, Patrick Durusau wrote:
>>>
>>>
>>>>SIDPs (and, for that matter, OPs) can be arbitrarily complex.
>>>>
>>>
>>>That is what I cannot find in current TMRM. I see that SIDP can be
>>>defined as "combination of properties" which I cannot call "arbitrarily
>>>complex".
>>>
>>
>>Why did you decide that "combination of properties" does not equal
>>"arbitrarily complex?"
>>
>>Would you prefer "arbitrary combination of properties?"
>>
>>Curious because that "combination of properties" is understood (by me at
>>any rate) to mean "arbitrarily complex."
>
>
> Patrick,
>
> Obviously we reach here an area where - without more formalisation - confusion
> is imminent.
>
> If I hear "arbitrary combination of properties" then I would think that there
> are properties p1, ..., pn and the combination is defined as function f operating
> on these properties, returning a new, combined 'property':
>
> p_new = f (p1, ..., pn)
>
> In your earlier example f would simply concatenate two existing properties
> of a topic:
>
>
>>> * Property name: "personID"
>>>
>>> * Value type: complex:
>>> "name" : string
>>> "spouse" : topic
>>>
>>> * SIDP or OP?: SIDP
>>
>
> So the equivalence is induced by the function values which are built
> from - more primitive values. This is all good and well, but...
>
>
>>Dmitry said:
>>
>>>TMCL on the other hand can express any equivalence function. I just
>>>think that TMCL is more general and powerful approach for defining
>>>equivalence classes. "Combination of properties" is a subset of possible
>>>equivalence functions.
>>
>
> ...with functions over topic properties (and the primitive identity on
> symbols (= strings)) you cannot handle application-specific constraints
> as TMCL would allow you to:
>
> What about an identity which is induced by the fact that two topics
> should be regarded the same if both are "involved in the same kind and
> number of criminal activities"? (Not saying that TMCL should be able
> to capture this!)
>
But the TMRM says that it is precisely that sort of identity that should
be able to be captured.

The difference from your suggestions based on TMCL is that the TMRM
would allow the topic map proper, separate and apart from TMCL, to
define such identities. That is to say, identity is defined in the
topic map and not by some additional mechanism.
> TMRM is 'property' oriented. That is nice and easy for many cases, but this
> cannot be regarded as 'arbitrarily complex'.
>
Here is the source of confusion I alluded to above. When you say
"property," it appears that you are excluding "functions over topic
properties". Yes?

The TMRM is using 'property' in a sense that includes "functions over
topic properties", including a topic's participation in various parts of
assertions in the topic map.

Formalism would help, but only if we also have agreed-upon terms to
describe what appears in the formalism.
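
To make the broader sense of 'property' concrete, here is a minimal
sketch in Python. It is not TMRM syntax (the TMRM defines none), and
every name in it (Assertion, criminal_profile, equivalence_classes) is
hypothetical. It only illustrates a 'property' computed from a topic's
participation in assertions, then used as the basis for identity, as in
the "same kind and number of criminal activities" example:

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Hypothetical stand-in for an assertion in a topic map."""
    kind: str       # e.g. "burglary"
    players: tuple  # ids of the topics participating in the assertion


def criminal_profile(topic_id, assertions):
    """A 'property' in the broad sense: a function over the topic's
    participation in assertions (kind of activity and how many of each)."""
    counts = Counter(a.kind for a in assertions if topic_id in a.players)
    return frozenset(counts.items())


def equivalence_classes(topic_ids, assertions, identity_property):
    """Group topics whose identity property has the same value."""
    classes = defaultdict(list)
    for t in topic_ids:
        classes[identity_property(t, assertions)].append(t)
    return list(classes.values())


# Illustrative data only.
assertions = [
    Assertion("burglary", ("t1",)),
    Assertion("fraud", ("t1", "t3")),
    Assertion("burglary", ("t2",)),
    Assertion("fraud", ("t2",)),
]
classes = equivalence_classes(["t1", "t2", "t3"], assertions, criminal_profile)
```

Here t1 and t2 fall into the same class because the computed property
has the same value for both; t3 does not. Whether such a function lives
in the topic map or in a separate constraint language is exactly the
'where' question under discussion.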
> Now, ....
>
>
>>There is no other hand. The TMRM and TMCL are addressing completely
>>different levels of the topic map paradigm.
>>
>>The TMRM is not defining a syntax but the rules for a disclosure
>>statement, on which a syntax would be based.
>>
>>In other words, the TMRM allows disclosure of the basis for identity
>>that must underlie any equivalence or other functions.
>
>
> ... of course you say that. But to me this sounds like saying "the nice
> and easy things we can capture with properties and functions on
> properties - while we do not say how it is done" and "the other stuff
> is done with TMCL".
>
No, to some degree it is a question of 'where' it is done. Why repair
the weakness of a topic map to declare rules for identity by adding
TMCL? Why not simply have a model for topic maps that avoids the
weakness altogether?

Certainly, some people may wish to have topic maps that require the use
of TMCL in order to have the identity rules they require. I don't see
the rationale for forcing that choice on everyone who wants to use topic
maps.
> From a standard engineering point of view one might argue that there
> is so much overlap in the task to define an identity concept, that it
> is worth exploring the redundancy.
>
> Now, the fact that TMCL is obviously going to be well-founded and uses
> a more formal, explicit way to impose structural constraints on maps
> (and identity is just that) might raise the question how TMRM fits
> into the picture.
>
I read "more formal, explicit way" to say you are designing a
syntactical solution. Sure, but you have to have some model upon which
that syntax is being developed. There is always redundancy between a
model and a syntax based upon it but that does not mean that the model
is a second class citizen or not needed in some way.

The other problem, as I noted above, is why not allow a topic map to
define the very rules of identity that you want to put into TMCL? Isn't
that an implementation decision?

Other than following one notion of how topic maps ought to be
implemented, what is the advantage in that approach?

NOTE: We should have a standard for implementing topic maps a la XTM.
But, that standard should also have a reference model that allows the
construction of topic maps and topic maps software that do not rely upon
XTM. XTM and its predecessor, HyTM, are, after all, only interchange
syntaxes that represent a way to exchange topic maps. You can process
topic maps using those syntaxes but that does not mean they define what
it means to be a topic map. (And yes, topic maps based on a reference
model would have topics, associations, occurrences, etc., but also the
robust identity rules that you want to place in TMCL.)
>
>>Mixing levels again. The TMRM, using SIDPs, enables the disclosure of
>>identity that must be present for any such rules to make sense. And yes,
>>you can then disclose the equivalence (or any other rule) that you like.
>
>
> I think what Dmitry is after (and I would support that) is that
> identity at the most fundamental level is simply done by the identity
> of symbols, so strings, and the derived ones on URIs. From that you
> would derive identity based on source location, etc.
>
> Yes, you are right that we are mixing levels, i.e. using TMCL now for
> parts where TMRM has put in a claim. Maybe it is TMCL which has to be
> at two levels:
>
> - level one as a language to define
>
> - what actually properties are in terms of associations. So, for
> instance, a property 'email'
>
> $t.email <=> $t -> entity \ has-email-address / email
>
> (saying that any topic which is involved in a 'has-email-address' association
> with the proper roles can be regarded to have an 'email' property)
>
> - what derived properties are:
>
> $t.age <=> now - $t.born
>
> - what derived identity can be:
>
> ident ($a, $b) <=> $a.email eq $b.email
>
> - what identity can also be:
>
> ident ($a, $b) <=> ...
>
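
For readers who want Robert's level-one notation in executable form,
here is a rough Python sketch. It is neither TMCL nor TMRM syntax. The
association store, the sample data, and the function names are all
assumptions made for illustration, and reading "now - $t.born" as an
age in whole years is likewise an assumption:

```python
from datetime import date

# Hypothetical association store: (association type, {role: player}).
associations = [
    ("has-email-address", {"entity": "a", "email": "pat@example.org"}),
    ("has-email-address", {"entity": "b", "email": "pat@example.org"}),
    ("has-email-address", {"entity": "c", "email": "rho@example.org"}),
]

# Hypothetical basic property: date of birth per topic.
born = {"a": date(1960, 5, 1)}


def email(topic):
    # $t.email <=> $t -> entity \ has-email-address / email
    # A topic has an 'email' property if it plays the 'entity' role in a
    # 'has-email-address' association; the value is the 'email' player.
    for assoc_type, roles in associations:
        if assoc_type == "has-email-address" and roles.get("entity") == topic:
            return roles.get("email")
    return None


def age(topic, today):
    # $t.age <=> now - $t.born  (interpreted here as whole years)
    b = born[topic]
    before_birthday = (today.month, today.day) < (b.month, b.day)
    return today.year - b.year - (1 if before_birthday else 0)


def ident(a, b):
    # ident ($a, $b) <=> $a.email eq $b.email
    # In the end this is a comparison of two strings.
    ea, eb = email(a), email(b)
    return ea is not None and ea == eb
```

With this data, ident("a", "b") holds and ident("a", "c") does not. The
open-ended last case, ident ($a, $b) <=> ..., is deliberately left out
since the original leaves it unspecified.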
Rather than saying that you are using TMCL where the TMRM has put in a
claim, I would say that TMCL is addressing a weakness in the current
model that the TMRM does not have.

Yes, the TMRM deliberately lacks a syntax (at the urging of the WG as I
recall) and nothing in it compels someone to construct the identity
rules entirely in a topic map. You could build something like TMCL to
handle some parts (or all I suppose) of the identity question.

Stepping aside from the TMRM for a moment, recall that we discussed in
Amsterdam the need to have a "reference model" as the common basis for
TMCL/TMQL, etc., and have a workshop set for Montreal. What I would
suggest is that a "reference model" that provides the framework for
however one wishes to allocate the resolution of the identity issue is
the goal of that exercise.

Certainly, anyone can use TMCL to enforce identity rules if they like,
but I have yet to hear an argument that such rules could not properly be
part of a topic map or topic maps software.
> - level two captures the other things covered in the TMCL use cases.
>
> [ Merging is handled in general as a function M which maps map1, map2 into maps.
> This could be TMQL, btw. ]
>
>
>>But wait:
>>
>>P1 (carol (wife of patrick), bambi (deer in Walt Disney movie), clarence
>>(patrick's dog)), and
>>
>>P2 (carol (as to sing), bambi (a stripper), clarence (ghost in "It's a
>>Wonderful Life," a movie)).
>
> ...
>
>>Answer: Well, TMCL presumes a doctrine of identity (that should be
>>defined elsewhere) that allows it to make meaningful comparison of the
>>members of each ordered set.
>
>
> Yes, and this doctrine can be that of string equivalence.
>
>
>>The notion that identity and doctrines of identity can be
>>assumed/presumed is quite surprising to me .....
>
>
> That is not what I (or also probably Dmitry) say: My conjecture is
> that we can found ALL concepts eventually on the identity of
> strings.
>
Well, ultimately in any software process all you can do is compare strings.

I think we are very close on the notion of what goes into determining
identity but fairly far apart on where that should be happening. My
preference is to have a reference model that enables that question to be
resolved as fits a particular situation.

Having a syntax/formalism is helpful but only just, since as proposed
(TMCL), it presumes a weakness that is not inherent in the notion of
topic maps.

Why not abstract out the formalism of TMCL that is not based on
remedying that weakness and propose it to the 'reference model'
workshop? I still think one needs a reference model in addition to
defining standard implementations and that would be a step in that
direction.

Hope you are having a great day!

Patrick
--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Topic Maps: Human, not artificial, intelligence at work!