[sc34wg3] Almost arbitrary markup in resourceData
Mason, James David (MXM)
sc34wg3@isotopicmaps.org
Wed, 19 Nov 2003 11:41:45 -0500
See Below.
Jim Mason

> Lars Marius:
>
> What does worry me about <baseNameString> is two things:
>
>   1) our rationale for allowing XML in <resourceData> is that it's
>      equivalent to <resourceRef>, but <baseNameString> really isn't,
>      and topic names have no [locator] property,
>
>   2) base names are crucial to all kinds of user interfaces, because
>      they provide labels for the topics, and without those you don't
>      really have much of a UI. We can have resources as names for
>      topics (through variants), but having base names as strings
>      ensures that there's always *something* that can be displayed as
>      a mere string.
>
>   If we allow markup in here, that goes out the door. You may have
>   to strip (or, even worse, render) XML markup to be able to label
>   your topics.
>
> I'd be interested to hear what people think of this. Should we change our
> minds and only do this for <resourceData>?
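
Lars Marius's worry is that a UI which needs a plain string would have to strip (or render) whatever markup ends up in <baseNameString>. A minimal sketch of the stripping route in Python, assuming the name content arrives as a well-formed XML fragment; the wrapper element and the function name `display_label` are hypothetical, not part of XTM:

```python
import xml.etree.ElementTree as ET


def display_label(base_name_content: str) -> str:
    """Return a plain-text label for a base name that may contain markup.

    Assumes the content is a well-formed XML fragment (text plus elements,
    not necessarily with a single root), so it is wrapped in a hypothetical
    <label> element before parsing.
    """
    root = ET.fromstring(f"<label>{base_name_content}</label>")
    # itertext() yields every text and tail segment in document order,
    # so the tags disappear but their character content is kept.
    text = "".join(root.itertext())
    # Normalize whitespace so source indentation doesn't leak into the label.
    return " ".join(text.split())
```

Stripping is lossy: `H<subscript>2</subscript>O` flattens to `H2O`, and two `<para>` elements run together into one string, which is exactly the kind of information Jim says below that he cannot afford to lose.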

It was initially baseNameString that I was most interested in corrupting.
resourceData is of less importance to me.

It may be laziness/ignorance on my part, but baseNameString is what I choose
to display to my users. I see that name (and indeed all names) as primarily for
human consumption. Variants and resourceData are of less interest to me
precisely because baseNameString is what drives my UI. It's true that I'm
working in an environment where I have a lot of control and never expect
to receive or transmit an arbitrary TM, so I don't need the fallback of
a string somewhere that's guaranteed to be raw text.

As I've commented elsewhere in this thread, I don't believe in arbitrary
interchange. I expect there to be an at least implicit DTD for all my data.
So there's never really "almost arbitrary markup" for me, though the markup
may come as a surprise to the topic map engine.

I never believed in name-based merging because, as a linguist, I'm all too
aware of the variability and fragility of names.

Yes, I need to render XML. For me, the primary problem is that there are
things I need to display that I can't display without additional markup. I
sometimes need to display more than one paragraph. I need subscripts and
superscripts. I need (Oh Horror!) the dreaded <emphasis> tag. I need things
that will require XSLT processing, such as generating labels like "Note:". I
don't want the topic map engine to mess with that stuff; it should just pass
it through to where the user agent can do whatever it takes to render it.
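
Jim wants the engine to pass the markup through untouched and leave rendering to the user agent. As a rough sketch of that user-agent step, assuming DocBook-style element names (`emphasis`, `subscript`, `superscript`, `para`, `note`) and an HTML target, here is the transformation using Python's ElementTree; in practice it would more likely be an XSLT stylesheet, and the tag mapping and function names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from DocBook-style element names to HTML elements.
TAG_MAP = {"emphasis": "em", "subscript": "sub", "superscript": "sup", "para": "p"}


def to_html(elem: ET.Element) -> ET.Element:
    """Build a new HTML element tree from a DocBook-like element tree."""
    if elem.tag == "note":
        # Generated content, as an XSLT template would add it: a "Note:" label.
        out = ET.Element("div", {"class": "note"})
        label = ET.SubElement(out, "b")
        label.text = "Note:"
        label.tail = " " + (elem.text or "")
    else:
        # Unknown elements fall back to a neutral <span>.
        out = ET.Element(TAG_MAP.get(elem.tag, "span"))
        out.text = elem.text
    for child in elem:
        converted = to_html(child)
        converted.tail = child.tail  # keep the text that follows the child
        out.append(converted)
    return out


def render(fragment: str) -> str:
    """Render a name fragment (text plus markup) as an HTML <div>."""
    html = to_html(ET.fromstring(f"<fragment>{fragment}</fragment>"))
    html.tag = "div"
    return ET.tostring(html, encoding="unicode")
```

For example, `render("H<subscript>2</subscript>O")` returns `<div>H<sub>2</sub>O</div>`, and a `<note>` picks up a generated bold "Note:" label that is not present in the source data.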

Maybe I'm pushing topic maps too hard, but the projects I have in my shop
now generally involve creating portals to collections of information, and
the users want the information displayed in the portal to look like the
information it's the gateway to. My impression is that I'm not alone in
this; Eric, for one, has similar requirements.

> Lars Marius (and Steve N.):
>
> | So, if there's markup in a <baseNameString>, and name-based merging is
> | switched on, on what basis will name matching be done?
>
> The equivalence rule for topic name items. We haven't defined it yet in the
> presence of markup (will be part of the XML representation proposal), but I
> think we'll have to base it on Canonical XML. (From what I gathered from Dan
> Connolly, that seems to be what the RDF folks will do, and for the same
> reason I propose it: lack of alternatives.)
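
Lars Marius proposes basing the (not yet defined) equivalence rule on Canonical XML. A sketch of what such a comparison might look like, assuming the name content is a well-formed fragment. Note that Python's `xml.etree.ElementTree.canonicalize` (3.8+) implements Canonical XML 2.0, not the 1.0 Recommendation current when this was written, and the `<name>` wrapper and function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def canonical_name(base_name_content: str) -> str:
    """Canonical form of a base name that may contain markup.

    The fragment is wrapped in a hypothetical <name> element because
    canonicalization needs a single root element.
    """
    # canonicalize() implements Canonical XML 2.0 and returns a str.
    return ET.canonicalize(f"<name>{base_name_content}</name>")


def names_equal(a: str, b: str) -> bool:
    """Treat two base names as equal if their canonical forms match."""
    return canonical_name(a) == canonical_name(b)
```

Under this rule, attribute order, quote style, empty-element syntax and character references stop mattering, while case and whitespace in the text still do, so `apple` and `Apple` remain distinct names.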

As I said, I never liked name-based merging. I'd much rather have merging
based on some formalized subject indicator. In one of the maps I'm currently
working on, name-based merging would be absolutely disastrous. I'm mapping
our products and their parts. Several of our products have parts called
"apple", but those parts, though named identically, are wildly different
things. (Yes, I know, I could qualify the apples, and indeed I do scope the
names according to the parent product. But my TMs are generated by scripts
from data that I don't control, and I've had to go back and generate scopes
for names, scopes that aren't in the source data, just to protect myself.)
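
Roughly, XTM 1.0's topic naming constraint merges topics that share a base name string in the same scope. A simplified sketch of why Jim's generated scopes protect him, assuming each name has a single-theme scope (the parent product) and using hypothetical part records and a hypothetical `merge_groups` helper:

```python
from collections import defaultdict

# Hypothetical part records: (part topic id, base name, parent product).
PARTS = [
    ("p1", "apple", "pump-A"),
    ("p2", "apple", "valve-B"),
    ("p3", "gasket", "pump-A"),
]


def merge_groups(parts, use_scope: bool):
    """Group part topics the way name-based merging would.

    Without scope, the merge key is the name string alone; with scope it
    is (name, parent product), standing in for the name's scope.
    """
    groups = defaultdict(list)
    for part_id, name, product in parts:
        key = (name, product) if use_scope else name
        groups[key].append(part_id)
    # Only keys shared by more than one topic cause a merge.
    return [ids for ids in groups.values() if len(ids) > 1]
```

Without scope, `merge_groups(PARTS, use_scope=False)` collapses the two unrelated "apple" parts into one group, `[["p1", "p2"]]`; keyed on name plus product, nothing merges.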

> LMG and SRN:
>
> | I don't like it when things get more complex. There's gotta be a damn
> | good reason. Jim says he has one, and I take him at his word, but I'd
> | be happier if he would explain why <variantName> won't meet his needs,
> | [...]
>
> I'd very much like to hear this too. Jim?

This is all dreadfully complex anyway. I hate making the topic map engine
do any more work than is necessary, but we can't assume that topic
maps live in splendid isolation from the data they're mapping. Real data is
messy. I'm spending most of my time now trying to unscramble other folks'
data to the point where I can reliably run XSLT scripts on it to generate
TMs that work. I'm about to increase the number of system parts in the TM I
mentioned above by about an order of magnitude. I'm getting this data from
multiple sources, some of them older than a number of members of SC34. It's
really messy. My other project, the one I've talked about at Extreme, is an
interface to a document-management system. When you start talking about
documents, things get really messy (after all, that's why most of us work in
SGML/XML and not in HTML). What more can I say? I can't talk about TMs just
out in TM land. The map is not the territory, but it can't be separated from
the territory, either. I'm a publisher, not an abstract topologist.