[sc34wg3] TMQL - Unicode as "native character set"?
Patrick Durusau
patrick at durusau.net
Wed Sep 2 15:18:08 EDT 2009
Greetings!
I am not sure what we mean in 3.1 Relationships to other standards,
where we say in #2:
> The native character set of TMQL shall be Unicode
The reference is to Unicode 3.0, but I assume we mean the current version
of Unicode. Yes?
Shouldn't we also specify an encoding to be supported? Like UTF-8?
For that matter, I am not really sure what "native" means. The default?
A default that can be overridden by specifying some other encoding? I
don't think we should limit the data recorded in topic maps strictly to
UTF-8, considering that data we wish to view as a topic map may be in
any number of "native" encodings.
As far as requirements go, my suggestion would be the XML character set
in UTF-8 as the default, NFC as the base normalization, and the ability
to declare other encodings, normalizations and collations.
That sets a baseline while still allowing applications to compete on
their support for other encodings, normalizations and collations.
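To make the proposed baseline concrete, here is a minimal Python sketch, assuming a hypothetical loader: bytes are decoded as UTF-8 and normalized to NFC unless the caller declares another encoding or normalization form. The names load_text, DEFAULT_ENCODING and DEFAULT_NORMALIZATION are illustrations only, not part of any TMQL draft.

```python
import unicodedata
from typing import Optional

# Hypothetical baseline from the proposal: UTF-8 encoding and NFC
# normalization apply unless something else is declared.
DEFAULT_ENCODING = "utf-8"
DEFAULT_NORMALIZATION = "NFC"


def load_text(data: bytes,
              encoding: Optional[str] = None,
              normalization: Optional[str] = None) -> str:
    """Decode bytes with the declared (or default) encoding, then normalize."""
    text = data.decode(encoding or DEFAULT_ENCODING)
    # unicodedata.normalize accepts "NFC", "NFD", "NFKC" and "NFKD".
    return unicodedata.normalize(normalization or DEFAULT_NORMALIZATION, text)


# "é" as one code point (U+00E9) versus "e" plus a combining acute (U+0301):
composed = load_text("caf\u00e9".encode("utf-8"))
decomposed = load_text("cafe\u0301".encode("utf-8"))
print(composed == decomposed)  # True: both normalize to the same NFC string

# A source declared as Latin-1 instead of the UTF-8 default:
print(load_text(b"caf\xe9", encoding="latin-1") == composed)  # True
```

The point of the design is that the default is fixed, so two conforming implementations compare "café" the same way, while declared alternatives remain open for applications that want them.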
Hope everyone is having a great day!
Patrick
PS: I would suggest that under 4 Requirements for the Language, 4.1
Functionality, where we say:
> TMQL shall support all natural languages equally. That is, TMQL shall
> be fully internationalized with respect to text representation, text
> ordering, etc.
We should drop that as a requirement, or at least define what we mean
in some meaningful way: for example, as I suggested above, a defined
level of Unicode support and a required normalization, plus a way to
identify other normalizations and collations (a good use for PSIs).
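As a rough illustration of identifying collations by PSI, here is a minimal Python sketch in which each collation is looked up by a subject identifier. The example.org URIs, the COLLATIONS registry, and the functions collate and alphabet_collation are all hypothetical; alphabet_collation simply stands in for a script whose traditional letter order differs from its code-point order.

```python
from typing import Callable, Dict, List

# Hypothetical PSIs: example.org placeholders, not published identifiers.
CODEPOINT_PSI = "http://example.org/psi/collation/codepoint"
CASELESS_PSI = "http://example.org/psi/collation/caseless"


def alphabet_collation(alphabet: str) -> Callable[[str], tuple]:
    """Build a sort key that orders characters by their position in
    `alphabet`; unlisted characters sort after listed ones, by code point."""
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(s: str) -> tuple:
        return tuple((rank.get(ch, len(alphabet)), ord(ch)) for ch in s)

    return key


# Registry of collations, each identified by a PSI.
COLLATIONS: Dict[str, Callable[[str], object]] = {
    CODEPOINT_PSI: lambda s: s,   # plain code-point order
    CASELESS_PSI: str.casefold,   # ignore case differences
}


def collate(items: List[str], psi: str) -> List[str]:
    """Sort items using the collation identified by `psi`."""
    if psi not in COLLATIONS:
        raise ValueError(f"unsupported collation: {psi}")
    return sorted(items, key=COLLATIONS[psi])


# An application registers its own collation under its own PSI.
CUSTOM_PSI = "http://example.org/psi/collation/custom-bac"
COLLATIONS[CUSTOM_PSI] = alphabet_collation("bac")

print(collate(["a", "B"], CODEPOINT_PSI))    # ['B', 'a']
print(collate(["a", "B"], CASELESS_PSI))     # ['a', 'B']
print(collate(["a", "b", "c"], CUSTOM_PSI))  # ['b', 'a', 'c']
```

A query language would then only need to require the baseline entries; anything else is an application naming a collation it supports, which fits the idea of enabling others rather than defining every language up front.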
Before anyone protests in favor of internationalization, remember that
Unicode now includes Sumerian (listed as Sumero-Akkadian), and while I
would welcome TMQL providing the ability to query strings written in its
base-60 number system, I really don't want to see TMQL delayed until we
define that capability. As another example, Ugaritic is known to have a
different "native" sort order but is recorded in Unicode using the
modern Hebrew order for similar characters.
Supporting all natural languages *equally* is an excellent ideal, but I
would prefer that we enable *others* to support the languages of their
choice.
--
Patrick Durusau
patrick at durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)