[sc34wg3] Added scope allows unmerging?

Fri Apr 21 11:53:53 EDT 2006

Personally, I believe that added scope does not allow unmerging. I'll
state the reasons why I think so, and leave it up to the people who
disagree with me to prove me wrong.

I'll start with a definition of "unmerge" so that we all know what we  
are talking about. A successful unmerge means reversing a merge  
operation so that the topic map is brought back to the state it was  
in before the merge. So if a.xtm includes b.xtm with added scope, the  
idea is that an unmerge allows you to reproduce the state of a.xtm as  
it would have been without the <mergeMap/> element.

(I know Kal has mentioned "safe Topic Maps aggregation", and that Jim  
has brought up the tracking of provenance, etc. However, I think  
Kal's safety comes from the ability to unmerge. I also think that  
unless you can do the unmerge, you don't really have full tracking of  
provenance. So it seems to me that all of these views can safely be  
included under the "unmerge" umbrella. Let me know if anyone disagrees.)

--- a) Topics can't be unmerged

That is, if a.xtm includes b.xtm with added scope, and b.xtm causes  
two topics in a.xtm to merge, then added scope will not be sufficient  
to unmerge b.xtm from a.xtm. If a.xtm contains *only* (in LTM :-)

   [a = "A"]
   [b = "B"]

and b.xtm contains *only* a topic 'c' that causes 'a' and 'b' to  
merge (using <topicRef/> elements, say), then the result after  
merging will be:

   [a = "A" = "B"
    /* plus an item identifier for the 'b' identifier*/
    /* plus whatever 'c' added */]

As far as I can tell added scope does not provide any way to go back  
to the original state, even though statements added by b.xtm *can* be  
removed. Other cases of a similar nature can be constructed.
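
To make the case above concrete, here is a minimal sketch in a toy model. None of these types are the OKS API; the Topic class, the "from-b" theme name, and the simplification that 'c' simply carries the identifiers of 'a' and 'b' (standing in for the <topicRef/> elements) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only; none of these types are the OKS API.
class UnmergeSketch {
    static final class Topic {
        final Set<String> ids = new TreeSet<>();
        // Each name is {value, theme}; theme "" means the unconstrained scope.
        final List<String[]> names = new ArrayList<>();

        Topic id(String id) { ids.add(id); return this; }

        Topic name(String value, String theme) {
            names.add(new String[] {value, theme});
            return this;
        }
    }

    // Copies each source topic into target, absorbing every target topic
    // that shares an identifier with it. Nothing records which identifiers
    // or names belonged to which original topic.
    static void mergeInto(List<Topic> target, List<Topic> source) {
        for (Topic incoming : source) {
            Topic merged = new Topic();
            merged.ids.addAll(incoming.ids);
            merged.names.addAll(incoming.names);
            for (Iterator<Topic> it = target.iterator(); it.hasNext();) {
                Topic existing = it.next();
                if (!Collections.disjoint(existing.ids, merged.ids)) {
                    merged.ids.addAll(existing.ids);
                    merged.names.addAll(existing.names);
                    it.remove();
                }
            }
            target.add(merged);
        }
    }

    // The "unmerge" that added scope makes possible: drop scoped statements.
    static void removeStatementsInScope(List<Topic> map, String theme) {
        for (Topic t : map) {
            t.names.removeIf(n -> n[1].equals(theme));
        }
    }

    // Builds a.xtm (topics 'a' and 'b'), merges in b.xtm (topic 'c' with
    // added theme "from-b"), removes that theme's statements, and returns
    // the number of topics left.
    static int topicsAfterUnmergeAttempt() {
        List<Topic> a = new ArrayList<>();
        a.add(new Topic().id("a").name("A", ""));
        a.add(new Topic().id("b").name("B", ""));
        List<Topic> b = new ArrayList<>();
        b.add(new Topic().id("c").id("a").id("b").name("C", "from-b"));
        mergeInto(a, b);
        removeStatementsInScope(a, "from-b");
        return a.size();
    }

    public static void main(String[] args) {
        System.out.println(topicsAfterUnmergeAttempt());
    }
}
```

In this model the name "C" does disappear again, but a.xtm started with two topics and ends with one: the names "A" and "B" now sit on the same topic, and nothing in the scope says how to split them.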

--- b) Added scope is only for file-based applications

The main use case for added scope is being able to handle updates to  
data maintained elsewhere. It's very common in TM applications to  
include data from external sources where you want to keep track of  
what it was that came from the external source, so that you can  
update your topic map accordingly.

The trouble is that added scope in XTM is only of use in this  
situation if your topic map is stored in a set of files, and  
reimported from the files on each change. There are many such  
applications (like OKS Samplers, my own photo application, etc), but  
none of them really qualify as real production applications, IMHO.

Real production applications use some kind of persistent,  
transactional store. And in this case added scope in XTM does not  
help at all. Even if you set up the original system by importing XTM  
with added scopes you can't really do the update simply by deleting  
everything in the added scope and reimporting, since this will leave  
empty topic stubs behind, change all your persistent identifiers, be  
very slow, etc.

--- c) If you want this, you can implement it yourself

Let's say that you somehow know that my a) will never occur, and you  
don't care that it's slow (maybe you don't have that much data), and  
you want to use added scope anyway, but we removed it from XTM 2.0.  
What would you have to do to use added scope in a persistent-store
scenario?

Well, using the OKS it would run like this:

   // first, delete the old crap
   TopicMapIF realtm = getRealRDBMSTopicMap();
   TopicIF addedtheme = getTopicForThisDataSource(dataSource);
   doTologQueryToRemoveAllStatementsInScope(realtm, addedtheme);
   doTologQueryToRemoveTopicStubs(realtm);

   // then reimport
   TopicMapIF newfragment = ImportExportUtils.getReader(filename).read();
   addTheme(newfragment, addedtheme);
   MergeUtils.mergeInto(realtm, newfragment);

   // okay, we're done
   realtm.getTransaction().commit();

I call out to three helper methods here. Two of them are just tolog
queries (i.e., small and simple), and the last would be about 20-25
lines of pretty straightforward Java code.
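
To give a rough idea of what that last helper involves, here is a sketch of addTheme against hypothetical stand-in types. Statement, Topic and TopicMap below are invented for illustration and are not the OKS interfaces (a real version would walk the corresponding OKS objects and take a topic as the theme rather than a string). The assumption is that added scope means adding the theme to the scope of every name, occurrence and association in the fragment.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in types; the real OKS interfaces differ.
class AddThemeSketch {
    // A scoped statement: a name, an occurrence, or an association.
    static final class Statement {
        final String value;
        final Set<String> scope = new LinkedHashSet<>();

        Statement(String value, String... themes) {
            this.value = value;
            for (String t : themes) {
                scope.add(t);
            }
        }
    }

    static final class Topic {
        final List<Statement> names = new ArrayList<>();
        final List<Statement> occurrences = new ArrayList<>();
    }

    static final class TopicMap {
        final List<Topic> topics = new ArrayList<>();
        final List<Statement> associations = new ArrayList<>();
    }

    // Adds the theme to every scoped statement in the fragment,
    // keeping whatever themes each scope already had.
    static void addTheme(TopicMap fragment, String theme) {
        for (Topic topic : fragment.topics) {
            for (Statement name : topic.names) {
                name.scope.add(theme);
            }
            for (Statement occ : topic.occurrences) {
                occ.scope.add(theme);
            }
        }
        for (Statement assoc : fragment.associations) {
            assoc.scope.add(theme);
        }
    }

    // A tiny fragment to try it on: one topic and one association.
    static TopicMap sampleFragment() {
        TopicMap tm = new TopicMap();
        Topic c = new Topic();
        c.names.add(new Statement("C", "en"));
        c.occurrences.add(new Statement("http://example.org/c"));
        tm.topics.add(c);
        tm.associations.add(new Statement("c-related-to-a"));
        return tm;
    }
}
```

Calling addTheme(sampleFragment(), "source-b") leaves the name scoped by both "en" and "source-b", and the occurrence and association scoped by "source-b" alone.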

--- d) Added scope isn't the right way

This is pretty much what Robert Barta wrote in:
   http://www.isotopicmaps.org/pipermail/sc34wg3/2006-January/003095.html

I really agree with his points here. I also have a paper of my own on  
this subject that provides a simpler (for the user), more flexible,  
more efficient, and more widely applicable solution to the update  
problem.

To summarize: this feature doesn't really work, it only covers a very  
restricted range of cases, there are better ways to do it, and it's  
really simple to implement yourself. Why can't we just drop it from  
the standard?

--
Lars Marius Garshol, Ontopian               http://www.ontopia.net
+47 98 21 55 50                             http://www.garshol.priv.no