[sc34wg3] Almost arbitrary markup in resourceData

Wed, 19 Nov 2003 11:41:45 -0500

See Below.

Jim Mason

> Lars Marius:
>
> What does make me worried about <baseNameString> is two things:
>
>   1) our rationale for allowing XML in <resourceData> is that it's
>      equivalent to <resourceRef>, but <baseNameString> really isn't,
>      and topic names have no [locator] property,
>
>   2) base names are crucial to all kinds of user interfaces, because
>      they provide labels for the topics, and without those you don't
>      really have much of a UI. We can have resources as names for
>      topics (through variants), but having base names as strings
>      ensures that there's always *something* that can be displayed as
>      a mere string.
>
>      If we allow markup in here that goes out the door. You may have
>      to strip (or, even worse, render) XML markup to be able to label
>      your topics.
>
> I'd be interested to hear what people think of this. Should we change our
> minds and only do this for <resourceData>?

It was initially baseNameString that I was most interested in corrupting.
resourceData is of less importance to me. 

It may be laziness/ignorance on my part, but baseNameString is what I choose
to display to my users. I see that name (and indeed all names) primarily for
human consumption. Variants, resourceData, are of less interest to me
precisely because baseNameString is what drives my UI. It's true that I'm
working in an environment where I have a lot of control, that I never expect
to receive or transmit an arbitrary TM, so I don't need the fallback of
somewhere having a string that's guaranteed to be raw text.

As I've commented elsewhere in this thread, I don't believe in arbitrary
interchange. I expect there to be an at least implicit DTD for all my data.
So there's never really "almost arbitrary markup" for me, though the markup
may come as a surprise to the topic map engine.

I never believed in name-based merging because, as a linguist, I'm all too
aware of the variability and fragility of names. 

Yes, I need to render XML. For me, the primary problem is that there are
things I need to display that I can't display without additional markup. I
sometimes need to display more than one paragraph. I need subscripts and
superscripts. I need (Oh Horror!) the dreaded <emphasis> tag. I need things
that will require XSLT processing, such as generating labels like "Note:". I
don't want the topic map engine to mess with that stuff, just pass it
through to where the user agent can do whatever it takes to render the
stuff.
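
To make the two options in this exchange concrete (strip the markup to get a mere string, or pass it through untouched for the user agent to render), here is a minimal Python sketch. It is only an illustration: the function names and the sample name are hypothetical, and XTM as it stands does not allow markup inside <baseNameString>.

```python
import xml.etree.ElementTree as ET


def plain_label(base_name_xml: str) -> str:
    """Flatten a name that contains markup into a plain display string."""
    elem = ET.fromstring(base_name_xml)
    # itertext() yields every text node in document order, dropping the tags.
    text = "".join(elem.itertext())
    # Collapse runs of whitespace so the label fits on one line.
    return " ".join(text.split())


def inner_markup(base_name_xml: str) -> str:
    """Re-serialize the element's content so a user agent can render it."""
    elem = ET.fromstring(base_name_xml)
    parts = [elem.text or ""]
    for child in elem:
        # tostring() includes the child's tail text, so nothing is lost.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


# Hypothetical name with a subscript and an <emphasis> element.
name = "<baseNameString>H<sub>2</sub>O <emphasis>purified</emphasis></baseNameString>"

print(plain_label(name))   # H2O purified
print(inner_markup(name))  # H<sub>2</sub>O <emphasis>purified</emphasis>
```

The stripped form loses the subscript, which is exactly the information the markup was there to carry.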

Maybe I'm pushing topic maps too hard, but the projects I have in my shop
now generally involve creating portals to collections of information, and
the users want the information displayed in the portal to look like the
information it's the gateway to. My impression is that I'm not alone in
this, that Eric, for one, has similar requirements.

> Lars Marius (and Steve N.):
>
> | So, if there's markup in a <baseNameString>, and name-based merging is
> | switched on, on what basis will name matching be done?
>
> The equivalence rule for topic name items. We haven't defined it yet in the
> presence of markup (will be part of the XML representation proposal), but I
> think we'll have to base it on Canonical XML. (From what I gathered from Dan
> Connolly, that seems to be what the RDF folks will do, and for the same
> reason I propose it: lack of alternatives.)
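
As a rough illustration of what a Canonical XML based equivalence test could look like, the Python sketch below compares two names by their canonical serialization. The equivalence rule itself is not defined in this thread, so this is an assumption, not the proposal; `same_name` and the sample names are hypothetical. Note also that `xml.etree.ElementTree.canonicalize` (Python 3.8 and later) implements C14N 2.0, which may differ from the version of Canonical XML intended here.

```python
from xml.etree.ElementTree import canonicalize


def same_name(a: str, b: str) -> bool:
    """Treat two names as equal when their canonical XML forms match."""
    return canonicalize(a) == canonicalize(b)


# Same content; only attribute order, quoting and tag whitespace differ.
a = '<baseNameString>H<sub class="x" id="n1">2</sub>O</baseNameString>'
b = "<baseNameString>H<sub id='n1'   class='x'>2</sub>O</baseNameString>"
# Different element name, so a different name.
c = "<baseNameString>H<sup>2</sup>O</baseNameString>"

print(same_name(a, b))  # True
print(same_name(a, c))  # False
```

Canonicalization only removes differences in serialization. Two names that a reader would call the same but that use different markup would still compare as unequal.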

As I said, I never liked name-based merging. I'd much rather have merging
based on some formalized subject indicator. In one of the maps I'm currently
working on, name-based merging would be absolutely disastrous. I'm mapping
our products and their parts. Several of our products have parts called
"apple", but those parts, though named identically, are wildly different
things. (Yes, I know, I could qualify the apples, and indeed I do scope the
names according to the parent product. But my TMs are generated by scripts
from data that I don't control, and I've had to go back and generate scopes
for names, scopes that aren't in the source data, just to protect myself.)
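
The "apple" problem can be sketched in a few lines of Python. The topic ids, product names and the `merge_groups` helper are hypothetical, made up for illustration; the only rule assumed is the usual one that name-based merging joins topics that share a base name in the same scope.

```python
from collections import defaultdict

# Hypothetical generated data: (topic id, base name, scope).
names = [
    ("part-17", "apple", "product-A"),
    ("part-42", "apple", "product-B"),
    ("part-58", "hinge", "product-A"),
]


def merge_groups(names, use_scope):
    """Return the groups of topics that name-based merging would join."""
    groups = defaultdict(list)
    for topic_id, name, scope in names:
        # Without scopes, the bare string is the only merge key.
        key = (name, scope) if use_scope else name
        groups[key].append(topic_id)
    # Only groups with more than one topic represent an actual merge.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


print(merge_groups(names, use_scope=False))  # [['part-17', 'part-42']]
print(merge_groups(names, use_scope=True))   # []
```

Without the generated scopes the two unrelated "apple" parts collapse into one topic; with them, nothing merges.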

> LMG and SRN:
> | I don't like it when things get more complex.  There's gotta be a damn
> | good reason.  Jim says he has one, and I take him at his word, but I'd
> | be happier if he would explain why <variantName> won't meet his needs,
> | [...]
>
> I'd very much like to hear this too. Jim?

This is all dreadfully complex anyway. I hate making the topic map engine
have to do any more work than is necessary, but we can't assume that topic
maps live in splendid isolation from the data they're mapping. Real data is
messy. I'm spending most of my time now trying to unscramble other folks'
data to the point where I can reliably run XSLT scripts on it to generate
TMs that work. I'm about to increase the number of system parts in the TM I
mentioned above by about an order of magnitude. I'm getting this data from
multiple sources, some of them older than a number of members of SC34. It's
really messy. My other project, the one I've talked about at Extreme, is an
interface to a document-management system. When you start talking about
documents, things get really messy (after all, that's why most of us work in
SGML/XML and not in HTML). What more can I say? I can't talk about TMs just
out in TM land. The map is not the territory, but it can't be separated from
the territory, either. I'm a publisher, not an abstract topologist.
