[sc34wg3] Added scope allows unmerging?
Lars Marius Garshol
larsga at ontopia.net
Fri Apr 21 11:53:53 EDT 2006
Personally, I believe that this is not true. I'll state the reasons
why I think so, and leave it up to the people who disagree with me to
prove me wrong.
I'll start with a definition of "unmerge" so that we all know what we
are talking about. A successful unmerge means reversing a merge
operation so that the topic map is brought back to the state it was
in before the merge. So if a.xtm includes b.xtm with added scope, the
idea is that an unmerge allows you to reproduce the state of a.xtm as
it would have been without the <mergeMap/> element.
(I know Kal has mentioned "safe Topic Maps aggregation", and that Jim
has brought up the tracking of provenance, etc. However, I think
Kal's safety comes from the ability to unmerge. I also think that
unless you can do the unmerge, you don't really have full tracking of
provenance. So it seems to me that all of these views can safely be
included under the "unmerge" umbrella. Let me know if anyone disagrees.)
--- a) Topics can't be unmerged
That is, if a.xtm includes b.xtm with added scope, and b.xtm causes
two topics in a.xtm to merge, then added scope will not be sufficient
to unmerge b.xtm from a.xtm. If a.xtm contains *only* (in LTM :-)
[a = "A"]
[b = "B"]
and b.xtm contains *only* a topic 'c' that causes 'a' and 'b' to
merge (using <topicRef/> elements, say), then the result after
merging will be:
[a = "A" = "B"
/* plus an item identifier for the 'b' identifier*/
/* plus whatever 'c' added */]
As far as I can tell added scope does not provide any way to go back
to the original state, even though statements added by b.xtm *can* be
removed. Other cases of a similar nature can be constructed.
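To make the case above concrete, here is a minimal sketch on a toy model. It is not the OKS API or any real engine: the classes Name and Topic, the methods merge and removeScoped, and the theme name "from-b" are all assumptions made up for illustration. It only shows that once names are pooled into one topic, removing everything in the added scope takes out what 'c' contributed but says nothing about which of the remaining names belonged to 'a' and which to 'b'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class UnmergeSketch {
    /** Toy name statement: a string value plus the themes in its scope. */
    public static final class Name {
        public final String value;
        public final Set<String> scope;
        public Name(String value, Set<String> scope) {
            this.value = value;
            this.scope = scope;
        }
    }

    /** Toy topic: pooled identifiers and names, nothing else. */
    public static final class Topic {
        public final Set<String> ids = new TreeSet<>();
        public final List<Name> names = new ArrayList<>();
        public Topic(String id, String name, String... themes) {
            ids.add(id);
            names.add(new Name(name, Set.of(themes)));
        }
    }

    /** Merging pools identifiers and names; the source topic is discarded. */
    public static void merge(Topic target, Topic source) {
        target.ids.addAll(source.ids);
        target.names.addAll(source.names);
    }

    /** The only undo that added scope offers: drop statements carrying the theme. */
    public static void removeScoped(Topic topic, String theme) {
        topic.names.removeIf(n -> n.scope.contains(theme));
    }

    /** Replays the a.xtm / b.xtm example and then tries to undo it. */
    public static Topic mergeThenRemove() {
        Topic a = new Topic("a", "A");            // from a.xtm, unscoped
        Topic b = new Topic("b", "B");            // from a.xtm, unscoped
        Topic c = new Topic("c", "C", "from-b");  // from b.xtm, carrying the added theme
        merge(a, b);  // forced because 'c' refers to both 'a' and 'b'
        merge(a, c);
        removeScoped(a, "from-b");
        return a;
    }

    public static void main(String[] args) {
        Topic t = mergeThenRemove();
        System.out.println(t.ids + " with " + t.names.size() + " names");
    }
}
```

In this toy model the result is still a single topic holding both "A" and "B": the scoped name from 'c' is gone, but the identifiers (which carry no scope) stay pooled and nothing records the original split.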
--- b) Added scope is only for file-based applications
The main use case for added scope is being able to handle updates to
data maintained elsewhere. It's very common in TM applications to
include data from external sources where you want to keep track of
what it was that came from the external source, so that you can
update your topic map accordingly.
The trouble is that added scope in XTM is only of use in this
situation if your topic map is stored in a set of files, and
reimported from the files on each change. There are many such
applications (like OKS Samplers, my own photo application, etc.), but
none of them really qualify as real production applications, IMHO.
Real production applications use some kind of persistent,
transactional store. And in this case added scope in XTM does not
help at all. Even if you set up the original system by importing XTM
with added scopes you can't really do the update simply by deleting
everything in the added scope and reimporting, since this will leave
empty topic stubs behind, change all your persistent identifiers, be
very slow, etc.
--- c) If you want this, you can implement it yourself
Let's say that you somehow know that my a) will never occur, and you
don't care that it's slow (maybe you don't have that much data), and
you want to use added scope anyway, but we removed it from XTM 2.0.
What would you have to do to use added scope in a persistent-store
scenario?
Well, using the OKS it would run like this:
// first, delete the old crap
TopicMapIF realtm = getRealRDBMSTopicMap();
TopicIF addedtheme = getTopicForThisDataSource(dataSource);
doTologQueryToRemoveAllStatementsInScope(realtm, addedtheme);
doTologQueryToRemoveTopicStubs(realtm);
// then reimport
TopicMapIF newfragment = ImportExportUtils.getReader(filename).read();
addTheme(newfragment, addedtheme);
MergeUtils.mergeInto(realtm, newfragment);
// okay, we're done
realtm.getTransaction().commit();
I call out to three helper methods here. Two of them are just tolog
queries (i.e., small and simple), and the last, addTheme, would be
about 20-25 lines of pretty straightforward Java code.
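As a rough idea of what addTheme involves, here is a sketch over a toy model. The classes Statement, Topic and Fragment are hypothetical stand-ins, not OKS types, and the theme is a plain string rather than a topic; a real version would walk the engine's own topic map objects and would also have to cover variants, which are left out here for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AddThemeSketch {
    /** Hypothetical stand-in for any scoped statement (name, occurrence, association). */
    public static class Statement {
        public final String value;
        public final Set<String> scope = new HashSet<>();
        public Statement(String value, String... themes) {
            this.value = value;
            for (String theme : themes) {
                scope.add(theme);
            }
        }
    }

    /** Hypothetical topic holding its own names and occurrences. */
    public static class Topic {
        public final List<Statement> names = new ArrayList<>();
        public final List<Statement> occurrences = new ArrayList<>();
    }

    /** Hypothetical freshly imported fragment, not yet merged into the real map. */
    public static class Fragment {
        public final List<Topic> topics = new ArrayList<>();
        public final List<Statement> associations = new ArrayList<>();
    }

    /** Adds the theme to the scope of every statement, keeping any existing themes. */
    public static void addTheme(Fragment fragment, String theme) {
        for (Topic topic : fragment.topics) {
            for (Statement name : topic.names) {
                name.scope.add(theme);
            }
            for (Statement occurrence : topic.occurrences) {
                occurrence.scope.add(theme);
            }
        }
        for (Statement association : fragment.associations) {
            association.scope.add(theme);
        }
    }
}
```

Doing this on the fragment before merging, as in the outline above, means only the newly imported statements pick up the theme; statements already in the real topic map are untouched.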
--- d) Added scope isn't the right way
This is pretty much what Robert Barta wrote in:
http://www.isotopicmaps.org/pipermail/sc34wg3/2006-January/003095.html
I really agree with his points here. I also have a paper of my own on
this subject that provides a simpler (for the user), more flexible,
more efficient, and more widely applicable solution to the update
problem.
To summarize: this feature doesn't really work, it only covers a very
restricted range of cases, there are better ways to do it, and it's
really simple to implement yourself. Why can't we just drop it from
the standard?
--
Lars Marius Garshol, Ontopian http://www.ontopia.net
+47 98 21 55 50 http://www.garshol.priv.no