[sc34wg3] Illustrating SIDPs

Mon, 10 May 2004 07:58:04 -0400

Robert,

I think yet another confusion is about to bite the dust! Read on:

Robert Barta wrote:
> On Sat, May 08, 2004 at 01:15:48PM -0400, Patrick Durusau wrote:
> 
>>Dmitry wrote:
>>
>>>On May 4, 2004, at 2:30 PM, Patrick Durusau wrote:
>>>
>>>
>>>>SIDPs (and, for that matter, OPs) can be arbitrarily complex.
>>>>
>>>
>>>That is what I cannot find in current TMRM. I see that SIDP can be 
>>>defined as "combination of properties"  which I cannot call "arbitrarily 
>>>complex".
>>>
>>
>>Why did you decide that "combination of properties" does not equal 
>>"arbitrarily complex?"
>>
>>Would you prefer "arbitrary combination of properties?"
>>
>>Curious because that "combination of properties" is understood (by me at 
>>any rate) to mean "arbitrarily complex."
> 
> 
> Patrick,
> 
> Obviously we reach here an area where - without more formalisation - confusion
> is imminent.
> 
> If I hear "arbitrary combination of properties" then I would think that there
> are properties p1, ..., pn and the combination is defined as function f operating
> on these properties, returning a new, combined 'property':
> 
>    p_new = f (p1, ..., pn)
> 
> In your earlier example f would simply concatenate two existing properties
> of a topic:
> 
> 
>>>    * Property name:              "personID"
>>>
>>>    * Value type:                 complex:
>>>                                     "name"   : string
>>>                                     "spouse" : topic
>>>
>>>    * SIDP or OP?:                SIDP
>>
> 
> So the equivalence is induced by the function values which are built
> from - more primitive values. This is all good and well, but...
> 
> 
>>Dmitry said:
>>
>>>TMCL on the other hand can express any equivalence function. I just 
>>>think that TMCL is more general and powerful approach for defining 
>>>equivalence classes. "Combination of properties" is a subset of possible 
>>>equivalence functions.
>>
> 
> ...with functions over topic properties (and the primitive identity on
> symbols = strings) you cannot handle application-specific constraints
> as TMCL would allow you to:
> 
> What about an identity which is induced by the fact that two topics
> should be regarded the same if both are "involved in the same kind and
> number of criminal activities"? (Not saying that TMCL should be able
> to capture this!)
> 

But the TMRM says that it is precisely that sort of identity that should 
be able to be captured.

The difference from your TMCL-based suggestions is that the TMRM 
would allow the topic map proper, separate and apart from TMCL, to 
define such identities. That is to say, identity is defined in the 
topic map and not in some additional mechanism.

> TMRM is 'property' oriented. That is nice and easy for many cases, but this
> cannot be regarded as 'arbitrarily complex'.
> 

Here is the source of confusion I alluded to above. When you say 
"property," it appears that you are excluding "functions over topic 
properties". Yes?

The TMRM is using 'property' in a sense that includes "functions over 
topic properties", including a topic's participation in various parts of 
assertions in the topic map.
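
A minimal sketch of that reading, in Python, assuming a toy representation 
invented here for illustration (the Topic class, its `assertions` list, and 
all function names are hypothetical and come from neither the TMRM nor 
TMCL). It treats the composite "personID" from the earlier example and the 
"same kind and number of criminal activities" case as the same sort of 
thing: a value computed from a topic and compared for equality.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical toy topic: plain properties plus the assertions it takes part in.
    name: str
    spouse: str = ""
    # Each entry is (assertion_type, role_played); illustrative layout only.
    assertions: list = field(default_factory=list)


def person_id(t):
    # Composite SIDP value: the "name" and "spouse" components taken together.
    return (t.name, t.spouse)


def activity_profile(t):
    # A "property" computed from participation in assertions:
    # the kind and number of criminal activities the topic is involved in.
    counts = Counter(kind for kind, role in t.assertions if role == "perpetrator")
    return frozenset(counts.items())


def same_subject(a, b, sidp):
    # Two topics count as the same subject when the chosen SIDP values match.
    return sidp(a) == sidp(b)


a = Topic("Al", "Mae", [("robbery", "perpetrator"), ("fraud", "perpetrator"),
                        ("robbery", "perpetrator")])
b = Topic("Bo", "Sue", [("fraud", "perpetrator"), ("robbery", "perpetrator"),
                        ("robbery", "perpetrator")])

print(same_subject(a, b, person_id))         # False
print(same_subject(a, b, activity_profile))  # True
```

Whether such a rule is evaluated by the topic map software or by a separate 
constraint language is exactly the 'where' question; the sketch takes no 
position on it.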

Formalism would help, but only if we also have agreed upon terms to 
describe what appears in the formalism.

> Now, ....
> 
> 
>>There is no other hand. The TMRM and TMCL are addressing completely 
>>different levels of the topic map paradigm.
>>
>>The TMRM is not defining a syntax but the rules for a disclosure 
>>statement, on which a syntax would be based.
>>
>>In other words, the TMRM allows disclosure of the basis for identity 
>>that must underlie any equivalence or other functions.
> 
> 
> ... of course you say that. But to me this sounds like saying "the nice
> and easy things we can capture with properties and functions on
> properties - while we do not say how it is done" and "the other stuff
> is done with TMCL".
> 

No, to some degree it is a question of 'where' it is done. Why repair 
the weakness of a topic map to declare rules for identity by adding 
TMCL? Why not simply have a model for topic maps that avoids the 
weakness altogether?

Certainly, some people may wish to have topic maps that require the use 
of TMCL in order to have the identity rules they require. I don't see 
the rationale for forcing that choice on everyone who wants to use topic 
maps.

> From a standard engineering point of view one might argue that there
> is so much overlap in the task to define an identity concept, that it
> is worth exploring the redundancy.
> 
> Now, the fact that TMCL is obviously going to be well-founded and uses
> a more formal, explicit way to impose structural constraints on maps
> (and identity is just that) might raise the question how TMRM fits
> into the picture.
> 

I read "more formal, explicit way" to say you are designing a 
syntactical solution. Sure, but you have to have some model upon which 
that syntax is being developed. There is always redundancy between a 
model and a syntax based upon it but that does not mean that the model 
is a second class citizen or not needed in some way.

The other problem, as I noted above, is why not allow a topic map to 
define the very rules of identity that you want to put into TMCL? Isn't 
that an implementation decision?

Other than following one notion of how topic maps ought to be 
implemented, what is the advantage in that approach?

NOTE: We should have a standard for implementing topic maps a la XTM. 
But, that standard should also have a reference model that allows the 
construction of topic maps and topic maps software that do not rely upon 
XTM. XTM and its predecessor, HyTM, are, after all, only interchange 
syntaxes that represent a way to exchange topic maps. You can process 
topic maps using those syntaxes but that does not mean they define what 
it means to be a topic map. (And yes, topic maps based on a reference 
model would have topics, associations, occurrences, etc., but also the 
robust identity rules that you want to place in TMCL.)

> 
>>Mixing levels again. The TMRM, using SIDPs, enables the disclosure of 
>>identity that must be present for any such rules to make sense. And yes, 
>>you can then disclose the equivalence (or any other rule) that you like.
> 
> 
> I think what Dmitry is after (and I would support that) is that
> identity at the most fundamental level is simply done by the identity
> of symbols, so strings, and the derived ones on URIs. From that you
> would derive identity based on source location, etc.
> 
> Yes, you are right that we are mixing levels, i.e. using TMCL now for
> parts where TMRM has put in a claim. Maybe it is TMCL which has to be
> at two levels:
> 
>   - level one as a language to define 
> 
>     - what actually properties are in terms of associations. So, for
>       instance, a property 'email'
> 
>       $t.email   <=>    $t -> entity \ has-email-address / email
> 
>       (saying that any topic which is involved in a 'has-email-address' association
>       with the proper roles can be regarded to have an 'email' property)
> 
>     - what derived properties are:
> 
>       $t.age           <=>   now - $t.born
> 
>     - what derived identity can be:
> 
>       ident ($a, $b)   <=>   $a.email eq $b.email
> 
>     - what identity can also be:
> 
>       ident ($a, $b)   <=>   ...
> 
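
The level-one rules quoted above can be read as ordinary functions. A rough 
Python sketch, under stated assumptions: the association store, the topic 
ids, the example.org addresses, and the birth date are all made up for 
illustration, and "eq" is taken to mean equality of the sets of email 
values.

```python
from datetime import date

# Hypothetical association store: (association_type, {role: player}).
associations = [
    ("has-email-address", {"entity": "t1", "email": "rho@example.org"}),
    ("has-email-address", {"entity": "t2", "email": "rho@example.org"}),
    ("has-email-address", {"entity": "t3", "email": "other@example.org"}),
]
born = {"t1": date(1970, 1, 1)}  # illustrative data only


def email(t):
    # $t.email <=> $t -> entity \ has-email-address / email
    return {roles["email"] for kind, roles in associations
            if kind == "has-email-address" and roles.get("entity") == t}


def age(t, today):
    # $t.age <=> now - $t.born (whole years; 'today' is passed in for repeatability)
    b = born[t]
    return today.year - b.year - ((today.month, today.day) < (b.month, b.day))


def ident(a, b):
    # ident($a, $b) <=> $a.email eq $b.email
    # Assumption: topics with no email at all are not identified with each other.
    return bool(email(a)) and email(a) == email(b)


print(ident("t1", "t2"))               # True
print(ident("t1", "t3"))               # False
print(age("t1", date(2004, 5, 10)))    # 34
```

Each rule bottoms out in comparing strings found in associations, which is 
why it can be hosted either in a constraint language or in the topic map 
software itself.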

Rather than saying that you are using TMCL where the TMRM has put in a 
claim I would say that TMCL is addressing a weakness in the current 
model that the TMRM does not have.

Yes, the TMRM deliberately lacks a syntax (at the urging of the WG as I 
recall) and nothing in it compels someone to construct the identity 
rules entirely in a topic map. You could build something like TMCL to 
handle some parts (or all I suppose) of the identity question.

Stepping aside from the TMRM for a moment, recall that we discussed in 
Amsterdam the need to have a "reference model" as the common basis for 
TMCL/TMQL, etc., and have a workshop set for Montreal. What I would 
suggest is that a "reference model" that provides the framework for 
however one wishes to allocate the resolution of the identity issue is 
the goal of that exercise.

Certainly, anyone can use TMCL to enforce identity rules if they like, 
but I have yet to hear an argument that such rules could not properly be 
part of a topic map or topic maps software.

>   - level two captures the other things covered in the TMCL use cases.
> 
> [ Merging is handled in general as a function M which maps map1, map2 into maps.
> This could be TMQL, btw. ]
> 
> 
>>But wait:
>>
>>P1 (carol (wife of patrick), bambi (deer in Walt Disney movie), clarence 
>>(patrick's dog)), and
>>
>>P2 (carol (as to sing), bambi (a stripper), clarence (ghost in "It's a 
>>Wonderful Life," a movie)).
> 
> ...
> 
>>Answer: Well, TMCL presumes a doctrine of identity (that should be 
>>defined elsewhere) that allows it to make meaningful comparison of the 
>>members of each ordered set.
> 
> 
> Yes, and this doctrine can be that of string equivalence.
> 
> 
>>The notion that identity and doctrines of identity can be
>>assumed/presumed is quite surprising to me .....
> 
> 
> That is not what I (or also probably Dmitry) say: My conjecture is
> that we can found ALL concepts eventually on the identity of
> strings.
> 

Well, ultimately in any software process all you can do is compare strings.
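
A small Python sketch of that point using the P1/P2 example quoted above. 
The tuple layout and the function name are assumptions made for 
illustration only. Both comparisons are plain string comparisons; what 
changes is whether the basis for identity has been disclosed alongside the 
name.

```python
# Names only, as the members of P1 and P2 appear on the surface.
p1 = ("carol", "bambi", "clarence")
p2 = ("carol", "bambi", "clarence")

# The same members with a disclosed basis for identity (hypothetical layout).
p1_disclosed = (
    ("carol", "wife of patrick"),
    ("bambi", "deer in Walt Disney movie"),
    ("clarence", "patrick's dog"),
)
p2_disclosed = (
    ("carol", "as to sing"),
    ("bambi", "a stripper"),
    ("clarence", 'ghost in "It\'s a Wonderful Life," a movie'),
)


def same_members(x, y):
    # Member-by-member comparison of two ordered sets; nothing but string equality.
    return len(x) == len(y) and all(a == b for a, b in zip(x, y))


print(same_members(p1, p2))                      # True
print(same_members(p1_disclosed, p2_disclosed))  # False
```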

I think we are very close on the notion of what goes into determining 
identity but fairly far apart on where that should be happening. My 
preference is to have a reference model that enables that question to be 
resolved as fits a particular situation.

Having a syntax/formalism is helpful but only just, since as proposed 
(TMCL), it presumes a weakness that is not inherent in the notion of 
topic maps.

Why not abstract out the formalism of TMCL that is not based on 
remedying that weakness and propose it to the 'reference model' 
workshop? I still think one needs a reference model in addition to 
defining standard implementations and that would be a step in that 
direction.

Hope you are having a great day!

Patrick

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model

Topic Maps: Human, not artificial, intelligence at work!