[sc34wg3] Almost arbitrary markup in resourceData

Freese, Eric D. (LNG-DAY) sc34wg3@isotopicmaps.org
Tue, 11 Nov 2003 10:09:24 -0500


OK, in earlier instances of this thread there was a mention made of user
requirements.  Let me, as a user, explain why LexisNexis and Reed Elsevier
see arbitrary markup added to a LIMITED number of places within XTM (i.e.
names and resourceData) as an improvement to the standard.

One application (presented at Extreme Markup last summer) includes
descriptive data about each of LexisNexis' 32000+ sources.  This data is
more than a sentence or phrase in length.  It is often several paragraphs
long and can include other markup such as lists, chemical, mathematical, or
linking markup.  Under the current standard it is impossible to add any
XML-like markup into the data to preserve the markup - even things as simple
as paragraph breaks.  We looked at making each paragraph a different piece
of resourceData, but there is NO guarantee in the standard that we would get
the pieces back in the same order.  Also, the paragraphs as a whole
represent ONE instance of resourceData, not many, and we didn't see it as
appropriate to have to split them out.  We looked at storing each of these
descriptions as separate documents, but the overhead of managing so many
extra documents presented possible performance/maintenance issues.  What we
ended up doing was converting the angle brackets in this non-XTM markup into
different characters that then had to be reconverted back to markup so it
could be processed for presentation.  In a business that depends on response
time, extra steps in processing are NOT a good thing.  As you know I am
fairly familiar with the topic map model.  When I have to come up with these
workarounds in order to make a fairly straightforward application work, it
makes my attempts to sell upper management on the usefulness of the topic
map model MUCH more difficult.
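
A minimal sketch of the kind of round trip described above, in Python. The
substitute characters and the function names are assumptions made for
illustration; the post does not say which characters were actually used.

```python
# Hypothetical stand-ins for "<" and ">"; chosen only for this sketch.
ENCODE = str.maketrans({"<": "\u00ab", ">": "\u00bb"})
DECODE = str.maketrans({"\u00ab": "<", "\u00bb": ">"})


def hide_markup(text: str) -> str:
    """Swap angle brackets out so the text can be stored as plain PCDATA."""
    return text.translate(ENCODE)


def restore_markup(text: str) -> str:
    """Swap the stand-ins back before the text goes to presentation."""
    return text.translate(DECODE)


description = "<p>First paragraph.</p><p>Second paragraph.</p>"
stored = hide_markup(description)          # what would go into resourceData
recovered = restore_markup(stored)         # extra step on every read
```

Both conversions run over every description, and the scheme breaks if the
data ever contains the stand-in characters themselves, which is part of why
it is only a workaround.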

As I see it, the content within resourceData and names is specifically for
human consumption, not a topic map processor.  [This could, I believe, be
said of any PCDATA within an XML topic map, but that is another discussion.
When the TCN is not in effect, doesn't all the cool stuff happen in the
attributes?]  If the standard were changed to allow well-formed XML within
these places, to make it more useable for humans, then I see that as a win.
Any markup outside the XTM namespace is not to be processed by an XTM
processor, just stored/passed through/not touched/whatever as part of the
data - end of requirement.  Do I expect XTM processors to know what to do
with XHTML?  No.  Do I expect XTM processors to know what to do with
arbitrary XML?  No.  Do I expect XTM processors to process XTM?  Yes.  XTM
is a tool in my toolbox, but I have others that also have specific purposes.
I would like to be able to send the non-XTM (but still XML) data to the
appropriate application and have something useful done with it.  I really
don't think that is muddying up the waters that much.  Also, if a tool DID
try to do something with the additional markup, I'd be highly skeptical.  An
API on the other hand might be a cool thing to differentiate products.  But
I will buy a topic map processor, first and foremost, on its topic map
functionality, not the bells and whistles.
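
As a rough illustration of "store it and pass it through", here is a sketch
using Python's standard xml.etree.ElementTree on the two resourceData forms
quoted later in this thread. The helper name inner_xml is hypothetical, and
the snippets carry no namespace declarations, as in the quoted examples.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(element):
    """Return an element's content (text plus child markup) as one string,
    without trying to interpret the child markup."""
    parts = [escape(element.text or "")]
    for child in element:
        # tostring() also emits the child's tail text, so nothing is dropped.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


inline = ET.fromstring(
    "<resourceData>XTM is <em>really</em> cool.</resourceData>")
cdata = ET.fromstring(
    "<resourceData><![CDATA[XTM is <em>really</em> cool.]]></resourceData>")

stored_inline = inner_xml(inline)   # markup kept as markup
stored_cdata = inner_xml(cdata)     # same characters, but only as text
```

In the first case the parser sees a child element that can be handed to
another tool; in the CDATA case it sees only a string. With real namespaced
content, ElementTree may rewrite prefixes when serializing, so a true
pass-through would need more care than this sketch takes.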

It has also been said that XTM is for interchange.  I agree.  However, the
additional markup in my data represents some of the semantic information of
my data - the same semantics I want to interchange.  If I have to strip this
semantic information out of my data for the sake of meeting some arbitrary
requirement of an interchange standard, it might be considered a limitation
that I don't necessarily need to live with.  Now this "interchange" standard
is telling me which of my semantic information I can and can't interchange.
So is it really a useful interchange standard?  One might wonder.  

I will look for a tool/standard that suits my needs the best.  It's a simple
business decision.  The question before the powers that be in defining the
topic map model is "Which requirements do you support and which ones do you
not?".  From there, the potential user community will decide if this is a
standard that fits their needs, or not.  Again, hopefully, it's a simple
business decision.

Eric Freese
LexisNexis

> -----Original Message-----
> From: sc34wg3-admin@isotopicmaps.org
> [mailto:sc34wg3-admin@isotopicmaps.org]On Behalf Of Murray Altheim
> Sent: Tuesday, November 11, 2003 6:29 AM
> To: sc34wg3@isotopicmaps.org
> Subject: Re: [sc34wg3] Almost arbitrary markup in resourceData
> 
> 
> Lars Marius Garshol wrote:
> [...]
> > What the standard will say very clearly is that XTM processors are
> > expected to take this additional markup and store it, without making
> > any attempt to interpret it in any way. It just goes into the data
> > model instance that is built, and is stored there. The only difference
> > between
> > 
> >   <resourceData>XTM is <em>really</em> cool.</resourceData>
> > 
> > and
> > 
> >   <resourceData><![CDATA[XTM is <em>really</em> cool.]]></resourceData>
> > 
> > will be that in the first case the processor will *know* that it's
> > dealing with XTM, and thus can support transformations of it at
> > publishing time, queries on it, and so on. 
> > 
> > That's it, really. It's meant to be a more convenient way to work with
> > XML content in topic maps, by not forcing people to put one-line XML
> > markup snippets into external files.
> 
> Well, if it's *any* arbitrary XML markup, it's still the same problem.
> And while this may sound obstinate, I'm not modifying my tools to
> handle arbitrary markup.
> 
> > | In my opinion, it's clearly not okay, and not cool. The first time
> > | somebody opens up that topic map and sees *nothing* it's not
> > | cool. And which version(s) of XHTML are you going to allow? There's
> > | about five right now.  There will be more, many more. Or can people
> > | put *anything* there?
> > 
> > The answer is indeed *anything*, but none of it will have any topic
> > map semantics.
> >  
> > | I don't buy for a minute the typical W3C argument that you can just
> > | ignore markup you don't understand.
> > 
> > XTM processors are not supposed to ignore this; they are supposed to
> > store it without modifying or interpreting it.
> 
> That is the strangest idea I've heard in a long time. What does
> "store" mean? How does this end up being used? Does "<em>really</em>"
> get stored, leaving "XTM is cool?", or does the proposal mean just
> pulling out the tags so that the original PCDATA is intact? A good
> example of a problem here would be if "really" was "not", where
> "storing" would mean a decided change of meaning.
> 
> > | That's a proprietary hole in XTM, where vendors will be able to
> > | "legally" put proprietary markup that other users will have to play
> > | catch-up to correctly process.
> > 
> > It was the users that pushed for this, not the vendors.
> 
> It doesn't matter who pushed for it. It's still a proprietary hole.
> 
> > | I think the whole thing smells. That's been my opinion of mixed
> > | namespace markup since about 1998, so I suppose there's no surprise.
> > 
> > Well, I agree with you, but we're not allowing mixed namespace markup in
> > the usual sense. We're allowing XML content in base names, variant
> > names, and occurrences. That's *not* the same thing.
> 
> Wha? I never even knew about this one. Sounds like I'll just be
> sticking with XTM 1.0. Are you guys sure you're not mucking up
> the works? What are the requirements on the new version? Is
> interchange one of them?
> 
> > | But this in the full knowledge that suddenly there's more than one
> > | XML interchange markup language for topic maps, which simply can't
> > | be a good thing for our community, our vendors, our potential
> > | customers.
> > 
> > We *really* don't want that, but it's not what we are creating,
> > either.
> 
> I don't follow. If you allow arbitrary markup, you're not even
> creating a markup language, you're creating a soup in which
> anything goes, and in which meaning is just as arbitrary as the
> markup (pretty much by definition).
> 
> Murray
> 
> ...........................................................................
> Murray Altheim                         http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK                    .

    Allegations of animal mistreatment against Yukos surfaced in an
    inspection of a farm belonging to a Yukos-affiliated company in
    the Siberian region of Yakutia, news reports said. Male and female
    rabbits were kept together and "couplings take place unsystematically,"
    the Interfax news agency said.
 
http://www.sfgate.com/cgi-bin/article.cgi?file=/news/archive/2003/11/06/international1705EST0747.DTL
