[sc34wg3] Illustrating SIDPs

Tue, 11 May 2004 08:35:13 -0400

Robert,

Robert Barta wrote:
> On Mon, May 10, 2004 at 07:58:04AM -0400, Patrick Durusau wrote:
> 
>>I think yet another confusion is about to bite the dust! Read on:
> 
> 
> Yes, we are getting closer! But the mails are getting longer. :-)
> 
> 
>>The difference with your suggestions based on TMCL is that the TMRM
>>would allow the topic map proper, separate and apart from TMCL, to 
>>define such identities. That is to say that identity is defined in the 
>>topic map and not some additional mechanism.
> 
> 
> Well, unfortunately, TMRM does NOT provide any means for expressing
> 'disclosure' (read 'identity determination and merging rules'). It
> only seems to say that all applications have to do it in some way. And
> it simply uses properties (and functions thereof) to express when
> topics should be regarded to be about the same subject.
> 

Well, hard to deny something that was intentional by design and 
announced at every step of the way. ;-) Guilty as charged.

Note that I am admitting only to not providing the "means" as we have 
worked very hard to provide the "rules" that would govern such "means."

> 
>>But the TMRM says that it is precisely that sort of identity that
>>should be able to be captured.
> 
> 
>>>TMRM is 'property' oriented. That is nice and easy for many cases, but this
>>>cannot be regarded as 'arbitrarily complex'.
>>>
>>
>>Here is the source of confusion I alluded to above. When you say 
>>"property," it appears that you are excluding "functions over topic 
>>properties". Yes?
> 
> 
>>The TMRM is using 'property' in the sense that includes "functions over 
>>topic properties", including a topic's participation in various parts of
>>assertions in the topic map.
> 
> 
> No, no, I understand that it also includes function over properties.
> What disturbs me from an architectural viewpoint is that it has a
> built-in sound barrier.
> 
> Putting topics into equivalence classes can be more general than just
> looking at the properties (or combinations thereof). Not that I would
> expect TMCL to cover things like these, but consider a class-inducing
> rule like:
> 
>  "all topics which are associated with the same number of topics (which are
>   themselves not instances of persons) having an even number of properties not
>   matching particular values .........are to be regarded the same"
> 
> Absurd maybe, but probably not expressible as function of topic
> properties.
> 
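
To make the class-inducing rule above concrete, here is a minimal Python sketch under one reading of it. The data model (a `Topic` with a set of types and a property dict, associations as plain pairs of topic ids) and all function names are hypothetical, not TMRM or TMCL constructs. The key for each topic is computed from its neighbours' types and properties rather than from its own, which is the point at issue: whether such a key still counts as a "function over topic properties" is exactly what this exchange is about.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical minimal topic: an id, a set of type names, a property dict.
    tid: str
    types: set = field(default_factory=set)
    props: dict = field(default_factory=dict)


def neighbours(tid, associations):
    """Ids of all topics that share an association (a pair of ids) with tid."""
    out = set()
    for a, b in associations:
        if a == tid:
            out.add(b)
        elif b == tid:
            out.add(a)
    return out


def rule_key(tid, topics, associations, excluded_values):
    """One reading of the rule: count the associated non-person topics whose
    number of property values outside excluded_values is even."""
    count = 0
    for n in neighbours(tid, associations):
        other = topics[n]
        if "person" in other.types:
            continue
        non_matching = sum(1 for v in other.props.values() if v not in excluded_values)
        if non_matching % 2 == 0:
            count += 1
    return count


def equivalence_classes(topics, associations, excluded_values):
    """Partition topic ids: topics with equal keys are 'regarded the same'."""
    classes = defaultdict(set)
    for tid in topics:
        classes[rule_key(tid, topics, associations, excluded_values)].add(tid)
    return list(classes.values())
```

Any other computable key could be substituted for `rule_key`; the partitioning step stays the same.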

OK, you have my curiosity up. Can you say a little more about the
significance of "(which are not themselves instances of persons)" in the 
context of your example?

I currently read what you have written as a merging rule, which is not 
bounded by a function of topic properties.

> --
> 
> Some people might reasonably ask why we allowed this to happen: a
> TMRM island (with non-formalized half-bridge) and a TM?L island with a
> completely separate, incompatible formalism.  I would not be able to
> argue that.
>

Not sure what you mean here. I don't think separate islands are going to 
be helpful either in development of the topic maps standard or to the 
user community.

> 
>>Formalism would help, but only if we also have agreed upon terms to 
>>describe what appears in the formalism.
> 
> 
> That's my point. The lack of a proper formalism is the source of
> confusion.  We as a TM community have - whatever the history might be -
> allowed ourselves for too long to work in fluff-space. It costs us enormous
> amounts of energy to explain ourselves to each other.  So our
> advantage, that most of us are pragmatists, may also turn against us.
> 
Just note that I disagree with "fluff-space." Whatever the history, we 
can go forward or chew over past positions. I prefer to find new and 
common understandings.

> 
>>>>The TMRM is not defining a syntax but the rules for a disclosure 
>>>>statement, on which a syntax would be based.
>>>>
>>>>In other words, the TMRM allows disclosure of the basis for identity 
>>>>that must underlie any equivalence or other functions.
>>>
>>>
>>>... of course you say that. But to me this sounds like saying "the nice
>>
> 
> This should have been:
> 
>  "Of course you CAN say that."
> 
> --
> 
> 
>>No, to some degree it is a question of 'where' it is done. Why repair
>>a topic map's inability to declare rules for identity by adding
>>TMCL? Why not simply have a model for topic maps that avoids the 
>>weakness altogether?
> 
> 
> This would mean that we leave identity (and connected merging rules) out
> of the TMRM? I would certainly buy that!
> 

Here we part company. This may be the question of formalism that we
are batting back and forth, but I think that to "leave identity (and
connected merging rules) out of the TMRM" is a serious mistake.

Perhaps we mean different things:

Do you mean that the expression of the rules for identity (and connected 
merging rules) should be left out of the TMRM? (Which you already charge 
to be the case.)

And that leaves the model and requirements for rules for identity (and
connected merging rules) in the TMRM? (see your next paragraph)

> TMRM would then capture all possible forms topic maps could
> potentially have, without any constraint. TMRM would describe the
> fundamental set of all possible models (in the mathematical sense,
> now).
> 
Have you looked at Neill Kipp's formalism for the TMRM?
http://www.jtc1sc34.org/repository/0441.htm

> TMCL (probably a more primitive version of it) could then remedy the
> lack of notation. TMCL itself would be directed to the end user
> (ontology engineer).
> 
> 
>>Certainly, some people may wish to have topic maps that require the use 
>>of TMCL in order to have the identity rules they require. I don't see 
>>the rationale for forcing that choice on everyone who wants to use topic 
>>maps.
> 
> 
> But according to your model you are forcing everyone into two
> completely incompatible boats. Worse, you force TM vendors into
> supporting a 'disclosure policy' AND supporting a TMCL implementation
> which basically could do the same. The only argument we could have in
> this area is that we say "property based identity and merging is so
> common, that implementors should bias their implementations". But
> supporting two different ways .... I would not implement that.
> 
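For comparison, the kind of property-based identity and merging described here as "so common" fits in a few lines of Python. This sketch assumes, hypothetically, that each topic is a dict from property names to sets of values and that two topics merge when they share any value of one chosen property (for instance a subject identifier). Because merging is transitive, a union-find structure is used. Neither the data layout nor the function name comes from the TMRM or TMCL drafts.

```python
from collections import defaultdict


def merge_by_property(topics, prop):
    """Group topic ids whose `prop` value sets overlap, transitively.

    topics: dict mapping topic id -> dict of property name -> set of values.
    Returns a list of merged groups (sets of topic ids).
    """
    parent = {tid: tid for tid in topics}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_owner = {}  # property value -> first topic seen carrying it
    for tid, props in topics.items():
        for value in props.get(prop, set()):
            if value in first_owner:
                union(first_owner[value], tid)
            else:
                first_owner[value] = tid

    groups = defaultdict(set)
    for tid in topics:
        groups[find(tid)].add(tid)
    return list(groups.values())
```

A topic map engine biased toward this case would essentially hard-wire `prop` to its identity property; the question in the thread is whether anything beyond this needs a second mechanism.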

NO! Not forcing people into incompatible boats, completely incompatible 
or otherwise.

Note, as you have said, the TMRM does not provide a method for 
expressing the rules that it has suggested govern all topic maps. TMCL 
is certainly one way to express those rules.

There is NO, repeat NO, separate implementation requirement. You would 
follow the TMRM model (as you allude to above as "all possible forms") 
at the implementation level in any way you choose.

Note that the TMRM was accused (how, it was never clear to me) of
operating on the level of syntax. Partially at my urging, the TMRM now
uses the language of "disclosure" and does not say how one makes that
disclosure. It is trying to provide the very model that you ask about,
and it has been incorrectly (in my opinion) said to lead to
incompatible implementations.

Do not look at the TMRM as a competing implementation strategy; view
it as a model that sets forth the requirements for the rules that you
posit for TMCL (well, not complete coverage from what you say). Quite
honestly, I can't see how anyone would view the TMRM as a competing
implementation strategy.

Yes, there has been a lot of loose talk over the years about
"implementations" of the TMRM. Simply not correct. One can
implement a topic maps application that follows the rules of the TMRM,
but that is an entirely different thing.

For one thing, the TMRM lacks any specific rules for either identity or 
merging. Simply not there. It specifies, albeit without a lot of clarity 
of expression, the rules for specifying such rules. So, how are you 
going to implement something that is only a set of requirements for 
rules that it does not contain?
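
One way to picture "rules for specifying rules" is as an interface that fixes what any disclosure must state (which properties its identity rule reads, and the rule itself) while containing no identity rule of its own. The Python sketch below is only an illustration under that assumption; `Disclosure`, `SharedIdentifier` and `same_subject_disclosed` are hypothetical names, not anything defined in the TMRM.

```python
from abc import ABC, abstractmethod


class Disclosure(ABC):
    """Hypothetical interface: what a disclosure must state, not which rule to use."""

    @property
    @abstractmethod
    def properties(self) -> frozenset:
        """Names of the properties the identity rule depends on."""

    @abstractmethod
    def same_subject(self, a: dict, b: dict) -> bool:
        """True if two topics (property name -> set of values) share a subject."""


class SharedIdentifier(Disclosure):
    """One concrete rule: topics sharing any subject identifier are the same."""

    properties = frozenset({"subject_identifiers"})

    def same_subject(self, a, b):
        return bool(a.get("subject_identifiers", set()) & b.get("subject_identifiers", set()))


def same_subject_disclosed(a: dict, b: dict, disclosure: Disclosure) -> bool:
    """Apply a rule to topics projected onto the properties it declares,
    so the rule cannot silently depend on anything it did not disclose."""
    def visible(t):
        return {k: v for k, v in t.items() if k in disclosure.properties}
    return disclosure.same_subject(visible(a), visible(b))
```

The abstract class itself cannot be instantiated, which mirrors the point that there is nothing in the requirements alone to "implement".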

> 
>>I read "more formal, explicit way" to say you are designing a
>>syntactical solution.
> 
> 
> No, TMCL will have a syntax (maybe several, because different
> communities are involved) AND it will have to have a semantics. The
> latter will define which TMCL statements for which concrete maps
> (models in the mathematical sense, not in the sense people use here)
> will be true or false (or undecidable if TMCL is too expressive).
> This semantics may be defined in terms of TMRM, but only if it is
> MUCH, MUCH simpler than it is now.
> 
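As a toy illustration of what such a semantics does, the sketch below maps a (statement, topic map) pair to True or False. It assumes a hypothetical cardinality statement ("every topic of type T has between lo and hi values of property P") and a topic map represented as a dict from topic ids to property sets. Real TMCL statements would be richer, and a sufficiently expressive language could not guarantee a decision in every case.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical representation: topic id -> property name -> set of values.
TopicMap = Dict[str, Dict[str, Set[str]]]


@dataclass(frozen=True)
class Cardinality:
    """Hypothetical statement: every topic of topic_type has lo..hi values of prop."""
    topic_type: str
    prop: str
    lo: int
    hi: int


def holds(stmt: Cardinality, tm: TopicMap) -> bool:
    """Semantics: True if the concrete map satisfies the statement, else False."""
    for props in tm.values():
        if stmt.topic_type in props.get("type", set()):
            count = len(props.get(stmt.prop, set()))
            if not (stmt.lo <= count <= stmt.hi):
                return False
    return True  # vacuously true when no topic has the type
```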

Don't think the TMRM is going to get any simpler.

With your assistance and that of Graham, Dmitry and others, I hope that 
it will be much more clearly expressed than we have managed to date.

Personally I suspect a lot of its perceived complexity is the result of 
its expression. Looking forward to putting that suspicion to the test.

> A similar case can be made for TMQL.
> 
> 
>>Sure, but you have to have some model upon which that syntax is
>>being developed. There is always redundancy between a model and a
>>syntax based upon it but that does not mean that the model is a
>>second class citizen or not needed in some way.
> 
> 
> Obviously, you are using a different concept of 'model'. As I said,
> formalization helps.
> 
Don't think so but am willing to listen.

> 
>>The other problem, as I noted above, is why not allow a topic map to 
>>define the very rules of identity that you want to put into TMCL?
> 
> 
> Just as a clarification: The plan is not to put the identity concept
> into TMCL, but to allow the application developer to express the
> identity with a TMCL statement.
> 
> 
>>Other than following one notion of how topic maps ought to be 
>>implemented, what is the advantage in that approach?
> 
> 
> If it succeeds, then we have ONE formalism to express what a 'TM application'
> is. And not 1 + 0.5 formalisms.
> 

Great, but as you note above, TMCL will have several syntaxes. Do you
have an objection to there being isomorphic mappings to other formalisms
for expressing what a "TM application" is?

> 
>>NOTE: We should have a standard for implementing topic maps a la XTM. 
>>But, that standard should also have a reference model that allows the 
>>construction of topic maps and topic maps software that do not rely upon 
>>XTM. XTM and its predecessor, HyTM, are, after all, only interchange
>>syntaxes that represent a way to exchange topic maps. You can process 
>>topic maps using those syntaxes but that does not mean they define what 
>>it means to be a topic map. (And yes, topic maps based on a reference 
>>model would have topics, associations, occurrences, etc., but also the 
>>robust identity rules that you want to place in TMCL.)
>>
>>
>>
>>>Yes, you are right that we are mixing levels, i.e. using TMCL now for
>>>parts where TMRM has put in a claim. Maybe it is TMCL which has to be
>>>at two levels:
>>>
>>> - level one as a language to define 
>>>
>>>   - what properties actually are in terms of associations. So, for
>>>     instance, a property 'email'
>>>
>>>     $t.email   <=>    $t -> entity \ has-email-address / email
>>>
>>>     (saying that any topic which is involved in a 'has-email-address' 
>>>     association
>>>     with the proper roles can be regarded to have an 'email' property)
>>>
>>>   - what derived properties are:
>>>
>>>     $t.age           <=>   now - $t.born
>>>
>>>   - what derived identity can be:
>>>
>>>     ident ($a, $b)   <=>   $a.email eq $b.email
>>>
>>>   - what identity can also be:
>>>
>>>     ident ($a, $b)   <=>   ...
>>>
>>
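A rough Python rendering of the level-one definitions quoted above may help pin down what each one says. It assumes a hypothetical association record with a type and a role-to-player map, that the email role is played directly by the address string, that `now - $t.born` means age in whole years, and that `$a.email eq $b.email` means both topics have the same non-empty set of addresses. None of these names is part of TMCL.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set


@dataclass
class Association:
    kind: str               # association type, e.g. "has-email-address"
    roles: Dict[str, str]   # role name -> player (topic id, or the address itself)


def email(t: str, assocs: List[Association]) -> Set[str]:
    # $t.email  <=>  $t -> entity \ has-email-address / email
    return {
        a.roles["email"]
        for a in assocs
        if a.kind == "has-email-address" and a.roles.get("entity") == t and "email" in a.roles
    }


def age(born: date, today: date) -> int:
    # $t.age  <=>  now - $t.born   (whole years; 'now' is passed in explicitly)
    years = today.year - born.year
    if (today.month, today.day) < (born.month, born.day):
        years -= 1
    return years


def ident(a: str, b: str, assocs: List[Association]) -> bool:
    # ident($a, $b)  <=>  $a.email eq $b.email   (assumed: non-empty and equal)
    ea, eb = email(a, assocs), email(b, assocs)
    return bool(ea) and ea == eb
```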
>>Rather than saying that you are using TMCL where the TMRM has put in a 
>>claim I would say that TMCL is addressing a weakness in the current 
>>model that the TMRM does not have.
> 
> 
> Mumble :-)
> 
Here as well. :-)

> 
>>Yes, the TMRM deliberately lacks a syntax (at the urging of the WG as I 
>>recall) and nothing in it compels someone to construct the identity
>>rules entirely in a topic map.
> 
> 
> This sounds like a rather cumbersome process: You would have to use a
> syntactic structure as a topic map to define application specific
> rules. No operators, no quantifiers, all the millions of men-years
> developing logic systems is simply ignored.
> 
I was not the one who urged (insisted?) that the TMRM not have a syntax. 
If you think that was incorrect, you need to take it up with WG3.

Not sure how you make the jump to "all the millions of men-years 
developing logic systems is simply ignored."

After all, you have said that a "model" (in your sense of the word,
since you seem to think it differs from how I use the word, though
you offer no proof to that effect) is something on which you would
base TMCL. How is that different from the TMRM?

If you don't start with a model (your sense), how do you decide what 
rules to write?
Starting with an unspecified model looks like a bad plan to me.

> 
>>You could build something like TMCL to 
>>handle some parts (or all I suppose) of the identity question.
> 
> 
> Yup.
> 
> 
>>Stepping aside from the TMRM for a moment, recall that we discussed in 
>>Amsterdam the need to have a "reference model" as the common basis for 
>>TMCL/TMQL, etc., and have a workshop set for Montreal. What I would 
>>suggest is that a "reference model" that provides the framework for 
>>however one wishes to allocate the resolution of the identity issue is 
>>the goal of that exercise.
> 
> 
> I personally would love to be there, but I can't.
> 
Why not? (serious question, well they all are but I don't want the "why 
not" to sound flippant.)

> 
>>Certainly, anyone can use TMCL to enforce identity rules if they like, 
>>but I have yet to hear an argument that such rules could not properly be 
>>part of a topic map or topic maps software.
> 
> 
> That is a killer argument. Any computable recipe can be put into a
> program of a Turing-equivalent language. But why, then, are there
> languages like XPath or OWL? They are convenient, designed for a
> particular job and not Turing equivalent.
> 

Sure, and I was not suggesting that we should not have convenient, etc.,
languages that are not as expressive as the model (your sense) permits.

I really think we are straining to disagree with each other by this point.

> And implementing something in a Java program is not what I would see
> as 'disclosure'. Java programs are information-sinks by definition :-)
> 
> 
>>Well, ultimately in any software process all you can do is compare strings.
> 
> 
> And in ANY formalism you can ultimately only compare symbols.
> 
> 
>>I think we are very close on the notion of what goes into determining 
>>identity but fairly far apart on where that should be happening. My 
>>preference is to have a reference model that enables that question to be 
>>resolved as fits a particular situation.
> 
> 
> I completely agree with the goal....
> 
> 
>>Having a syntax/formalism is helpful but only just, since as proposed 
>>(TMCL), it presumes a weakness that is not inherent in the notion of 
>>topic maps.
>>
>>Why not abstract out the formalism of TMCL that is not based on 
>>remedying that weakness and propose it to the 'reference model' 
>>workshop?
> 
> 
> ....but have no idea what this precisely would mean.
> 
Well, that is why we are having this discussion, correct? ;-)

I had the impression from your first post that you had the view (may 
have been a mistake on my part) that there was a formalism for TMCL that 
is separate from the actual rules you intend to express in TMCL. Was I 
simply mistaken?

> --
> 
> My suggestion is still (modulo some technicalities):
> 
>   (a) to factor out of TMRM the 'topic-mappish' way to represent information,
>     this provides us with all possible forms topic maps could take
> 
>   (b) to recapture the TMRM 'disclosure' requirements and formalize them into
>     a "low-level" TMCL variant
> 
>   (c) to build the (high-level) TMCL semantics by defining it either
>     directly based on (a), or in terms of (b), or via some other
>     abstraction mechanism.
> 
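Read literally, (a) to (c) describe a layering that can be sketched in Python: (a) a plain topic representation, (b) a low-level identity rule as a predicate over two topics, and (c) a high-level declaration whose meaning is given by compiling it into (b). The representation and the `identified_by` declaration (here: same type and at least one shared value of a property) are hypothetical stand-ins chosen for illustration, not proposed TMCL syntax.

```python
from typing import Callable, Dict, Set

# (a) hypothetical 'topic-mappish' form: property name -> set of values
Topic = Dict[str, Set[str]]

# (b) low-level identity rule: a predicate over two topics
IdentRule = Callable[[Topic, Topic], bool]


def identified_by(topic_type: str, prop: str) -> IdentRule:
    """(c) high-level declaration 'topics of topic_type are identified by prop',
    given its meaning by compiling it into a (b)-level rule."""
    def rule(a: Topic, b: Topic) -> bool:
        both_typed = topic_type in a.get("type", set()) and topic_type in b.get("type", set())
        return both_typed and bool(a.get(prop, set()) & b.get(prop, set()))
    return rule
```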

Sorry, have not seen this before or at least did not read it as you now 
present it.

Suspect we may, due to a lack of formalism (sorry, could not resist), 
disagree on what is contained by "topic-mappish" and "'disclosure' 
requirements" in your (a) and (b), but assume we can profitably discuss 
those as we reach them in specific proposals.

In other words, not sure how much progress we are going to make so long 
as we toss suggestions to each other (I am probably more guilty than 
anyone on the list) and don't start working on specific documents and 
proposals.

Hope you are having a great day!

Patrick

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model

Topic Maps: Human, not artificial, intelligence at work!