ISO/IEC JTC1/SC34
Title: | Topic Maps — Canonicalization |
---|---|
Source: | Jaeho Lee, Lars Marius Garshol, Motomu Naito, JTC1 / SC34 |
Project: | ISO 13250: Topic Maps |
Project editor: | Jaeho Lee, Lars Marius Garshol, Motomu Naito |
Status: | |
Action: | For review |
Date: | 2009-01-27 |
Summary: | |
Distribution: | SC34 and Liaisons |
Refer to: | http://www.isotopicmaps.org/cxtm/2009-01-27/ |
Supersedes: | http://www.isotopicmaps.org/cxtm/2008-05-15/ |
Reply to: | Dr. James David Mason (ISO/IEC JTC1/SC34 Chairman) Y-12 National Security Complex Information Technology Services Bldg. 9113 M.S. 8208 Oak Ridge, TN 37831-8208 U.S.A. Telephone: +1 865 574-6973 Facsimile: +1 865 574-1896 E-mail: mailto:mxm@y12.doe.gov http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm Mr. G. Ken Holman (ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada) Crane Softwrights Ltd. Box 266, Kars, ON K0A-2E0 CANADA Telephone: +1 613 489-0999 Facsimile: +1 613 489-0995 Network: jtc1sc34@scc.ca |
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
ISO/IEC 13250-4 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information Technology, Subcommittee SC 34, Document Description and Processing Languages.
ISO/IEC 13250 consists of the following parts, under the general title Topic Maps:
This part of ISO/IEC 13250 defines a format known as Canonical XTM, or CXTM for short. CXTM is an XML format that guarantees that two equivalent Topic Maps Data Model instances [ISO/IEC 13250-2] always produce byte-by-byte identical serializations, and that non-equivalent instances always produce different serializations. CXTM thus enables two topic maps to be tested for equality by directly comparing their canonical serializations.
The purpose of CXTM is to allow the creation of test suites for various Topic Maps-related technologies that are easily portable between different Topic Maps implementations, so long as these support CXTM.
CXTM is not intended to be used for the interchange of topic maps, although this is possible. The standard format for interchange of topic maps is XTM [ISO/IEC 13250-3].
This part of ISO/IEC 13250 defines the CXTM format, and specifies how CXTM files are produced from topic maps by means of a transformation from the Topic Maps Data Model [ISO/IEC 13250-2] to the XML Infoset [XML Infoset].
The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
Each of the following documents has a unique identifier that is used to cite the document in the text. The unique identifier consists of the part of the reference up to the first comma.
XML-C14N, Canonical XML, Version 1.0, World Wide Web Consortium, 15 March 2001, http://www.w3.org/TR/2001/REC-xml-c14n-20010315
XML Infoset, XML Information Set (Second Edition), World Wide Web Consortium, 4 February 2004, http://www.w3.org/TR/2004/REC-xml-infoset-20040204
ISO/IEC 13250-2, ISO/IEC 13250-2 Information technology — Topic Maps — Data model, http://www.isotopicmaps.org/sam/
XMLSCHEMA-2, XML Schema Part 2: Datatypes Second Edition, World Wide Web Consortium, 28 October 2004, http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
The canonicalization process takes two parameters: a topic map item (that is, an instance of the Topic Maps Data Model, defined in [ISO/IEC 13250-2]) and a base locator. The process produces a canonicalization of the topic map, with all locators in the topic map rewritten to be relative to the given base locator. The purpose of the base locator is to allow references to the local filesystem to be stripped out, thus making CXTM test cases portable between different systems.
Canonicalization is performed in three steps:
A document information item representing the CXTM document is produced from the topic map item as described in 3.3.
For each element information item that is a descendant of the document information item from the previous step, the following operations are performed:
A character information item is added to the [[children]] property of the information item in the element's [[parent]] property immediately after the element itself. The character information item's [[character code]] property is set to #x0A.
If the element's [[local name]] property is set to "topicMap", "topic", "name", "variant", "occurrence", "association", "role", "scope", "itemIdentifiers", "subjectLocators", or "subjectIdentifiers", a character information item is added as the first item of the element's [[children]] property. The character information item's [[character code]] property is set to #x0A.
The document information item is serialized to a Canonical XML representation as described in [XML-C14N].
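The whitespace rules of the second step can be sketched in Python, using the standard library's xml.etree.ElementTree as a stand-in for the XML Infoset. This is an illustration under stated assumptions, not a conforming implementation: ElementTree models the character following an element as its `tail` and the leading character content of an element as its `text`, the helper name `add_cxtm_newlines` is hypothetical, and the Canonical XML serialization of the third step is not shown.

```python
import xml.etree.ElementTree as ET

# Elements that receive a #x0A character as their first child (list from step two).
BLOCK_ELEMENTS = {
    "topicMap", "topic", "name", "variant", "occurrence", "association",
    "role", "scope", "itemIdentifiers", "subjectLocators", "subjectIdentifiers",
}

def add_cxtm_newlines(root):
    """Hypothetical helper: apply both newline rules to root and its descendants."""
    for elem in root.iter():
        # Rule 1: a newline immediately after the element within its parent.
        elem.tail = "\n" + (elem.tail or "")
        # Rule 2: a newline as the first child of the listed elements.
        if elem.tag in BLOCK_ELEMENTS:
            elem.text = "\n" + (elem.text or "")
    return root

# Small example tree (the content is invented for illustration).
root = ET.Element("topicMap")
topic = ET.SubElement(root, "topic", number="1")
ids = ET.SubElement(topic, "itemIdentifiers")
loc = ET.SubElement(ids, "locator")
loc.text = "#t1"
add_cxtm_newlines(root)
```

In this sketch the "locator" element keeps its text unchanged and only gains a trailing newline, while "topicMap", "topic" and "itemIdentifiers" also start with one.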
Information item properties from [XML Infoset] are referred to using [[property name]], in order to distinguish them from properties from [ISO/IEC 13250-2].
There is exactly one CXTM document information item in the XML Infoset generated by the canonicalization of the topic map item.
The CXTM document information item has the following properties:
[[children]] A list containing only the representation of the topic map item
[[document element]] The element information item that represents the topic map item
[[notations]] The empty set
[[unparsed entities]] The empty set
[[base URI]] No value
[[standalone]] No value
[[version]] No value
[[all declarations processed]] False
A topic map item is represented by an element information item with the following properties:
[[local name]] The string "topicMap"
[[children]] A list of element information items in the following order:
A representation of the [item identifiers] property, if any
A representation of each topic item in the [topics] property of the topic map item in canonical sort order
A representation of each association item in the [associations] property of the topic map item in canonical sort order
[[attributes]] A representation of the [reifier] property
A topic item is represented by an element information item with the following properties:
[[local name]] The string "topic"
[[children]] A list of element information items in the following order:
If the value of [subject identifiers] property of the topic item is not the empty set, then an element information item with the following properties:
[[local name]] The string "subjectIdentifiers"
[[children]] A representation of each locator in the [subject identifiers] property in canonical sort order
[[attributes]] The empty set
If the value of the [subject locators] property of the topic item is not the empty set, then an element information item with the following properties:
[[local name]] The string "subjectLocators"
[[children]] A representation of each locator in the [subject locators] property in canonical sort order
[[attributes]] The empty set
A representation of the [item identifiers] property, if any
A representation of each of the topic name items of the [topic names] property in canonical sort order
A representation of each of the occurrence items of the [occurrences] property in canonical sort order
For each of the association role items of the [roles played] property in canonical sort order, an element information item with the following properties
[[local name]] set to the string "rolePlayed"
[[children]] An empty list
[[attributes]] A set containing one attribute information item as follows:
[[local name]] set to the string "ref"
[[normalized value]] A sequence of character information items representing a string value constructed by the concatenation of:
The string "association."
The position of the association item which is the value of the [parent] property of the association role item, in the canonically sorted [associations] property of the parent topic map item
The string ".role."
The position of the association role item in the canonically sorted [roles] property of the parent association item
[[attributes]] A set containing the number attribute for this information item.
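The construction of the "ref" attribute value on a "rolePlayed" element can be sketched as follows. The function name `role_played_ref` is hypothetical, and the two positions are assumed to have already been computed against the canonically sorted [associations] and [roles] properties.

```python
def role_played_ref(association_position, role_position):
    """Build the ref value from two 1-based positions (hypothetical helper)."""
    if association_position < 1 or role_position < 1:
        raise ValueError("positions are counted from 1")
    return f"association.{association_position}.role.{role_position}"

# The first role of the second association in canonical order.
ref = role_played_ref(2, 1)
```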
Each topic name item is represented by an element information item with the following properties:
[[local name]] The string "name"
[[children]] A list of element information items in the following order:
A representation of the [value] property
A representation of the [type] property
A representation of the [scope] property
A representation of each of the variant items in the [variants] property in canonical sort order
A representation of the [item identifiers] property, if any
[[attributes]] The union of:
A representation of the [reifier] property
The number attribute for this information item
A variant item is represented by an element information item with the following properties:
[[local name]] The string "variant"
[[children]] A list of element information items in the following order:
A representation of the [value] property
A representation of the [datatype] property
A representation of the [scope] property
A representation of the [item identifiers] property, if any
[[attributes]] The union of:
A representation of the [reifier] property
The number attribute for this information item
An occurrence item is represented by an element information item with the following properties:
[[local name]] The string "occurrence"
[[children]] A list of element information items in the following order:
A representation of the [value] property
A representation of the [datatype] property
A representation of the [type] property
A representation of the [scope] property
A representation of the [item identifiers] property, if any
[[attributes]] The union of:
A representation of the [reifier] property
The number attribute for this information item
An association item is represented by an element information item with the following properties:
[[local name]] The string "association"
[[children]] A list of element information items in the following order:
A representation of the [type] property
A representation of each of the items of the [roles] property in canonical sort order
A representation of the [scope] property
A representation of the [item identifiers] property, if any
[[attributes]] The union of:
A representation of the [reifier] property
The number attribute for this information item
An association role item is represented by an element information item with the following properties:
[[local name]] The string "role"
[[children]] A list of element information items in the following order:
An element information item with the following properties:
[[local name]] The string "player"
[[children]] The empty list
[[attributes]] A set of one attribute information item with the following properties:
[[local name]] The string "topicref"
[[normalized value]] The position of the topic item in the [player] property within the canonically sorted [topics] property of the parent topic map item
A representation of the [type] property
A representation of the [item identifiers] property, if any
[[attributes]] The union of:
A representation of the [reifier] property
The number attribute for this information item
If the [reifier] property of an information item is null it is represented by the empty set. Otherwise it is represented as a set containing an attribute information item with the following properties:
[[local name]] The string "reifier"
[[normalized value]] The position of the topic item that is the value of the [reifier] property in the canonically sorted list of all topic items
If the [scope] property of an information item is the empty set, then it has no representation. Otherwise it is represented by an element information item with the following properties:
[[local name]] The string "scope"
[[children]] A list of one element information item for each topic item in the value of the [scope] property in canonical sort order. Each element information item has the following properties:
[[local name]] The string "scopingTopic"
[[children]] An empty list
[[attributes]] A list containing a single attribute information item with the following properties:
[[local name]] The string "topicref"
[[normalized value]] The position of the topic item within the canonically sorted list of all topic items in the topic map item being canonicalized
[[attributes]] The empty set
If the [item identifiers] property of an information item is the empty set it has no representation. Otherwise it is represented by an element information item with the following properties:
[[local name]] The string "itemIdentifiers"
[[children]] A representation of each locator in the [item identifiers] property in canonical sort order
[[attributes]] The empty set
The [datatype] property of an information item is represented by an element information item with the following properties:
[[local name]] The string "datatype"
[[children]] A sequence of character information items representing the string value of the (non-normalized) locator in the [datatype] property
[[attributes]] The empty set
The [type] property of an information item is represented by an element information item with the following properties:
[[local name]] The string "type"
[[children]] An empty list
[[attributes]] A set containing an attribute information item with the following properties:
[[local name]] The string "topicref"
[[normalized value]] The position of the topic item that is the value of the [type] property within the canonically sorted list of all topic items in the Topic Maps Data Model being encoded.
A [value] property of an information item is represented by an element information item with the following properties:
[[local name]] The string "value"
[[children]] A sequence of character information items corresponding to the string representation of the [value] property, as defined below.
[[attributes]] The empty set
The string representation of the [value] property depends on the [datatype] property of the same information item. The representation is produced by following the procedure under the appropriate heading below. If the information item has no [datatype] property the procedure under the heading "Other" is to be used.
The representation is the normalized string value of the locator in the [value] property.
The representation is the canonical lexical representation corresponding to the lexical representation in the [value] property, as defined by [XMLSCHEMA-2].
The representation is the canonical lexical representation corresponding to the lexical representation in the [value] property, as defined by [XMLSCHEMA-2].
The representation is the canonical lexical representation corresponding to the lexical representation in the [value] property, as defined by [XMLSCHEMA-2].
The representation is the canonical lexical representation corresponding to the lexical representation in the [value] property, as defined by [XMLSCHEMA-2].
The representation is the string in the [value] property.
Locator values are represented by an element information item with the following properties:
[[local name]] The string "locator"
[[children]] A sequence of character information items representing the normalized string value of the locator
[[attributes]] The empty set
Locator values are normalized into strings using the process described below. This description uses the terms "fragment identifier", "query" and "path segment" as defined in [RFC 3986].
Let the value P be the string value of the base locator with any fragment identifier and query removed and any trailing "/" character removed.
If the string value of the locator starts with P, then the representation of the locator is the substring starting from, and including, the character immediately following the string that matches P, with any leading "/" character removed.
If the string value of the locator does not start with P and P can be interpreted as an IRI with at least one path segment, then remove the last path segment from P and any trailing "/" character and repeat from step (2).
If the string value of the locator is not modified by the steps above, then the string value of the locator is the representation of the locator.
This process may result in a string value which is no longer a syntactically valid or resolvable IRI. This is by design, as this part of ISO/IEC 13250 does not require a conforming implementation to dereference these addresses.
The resulting string must be normalized according to Unicode Normalization Form C.
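The locator normalization steps above can be sketched in Python. This is a minimal reading of the procedure, not a reference implementation: the function name `normalize_locator` is hypothetical, `urllib.parse.urlsplit` is used only to decide whether P still has a path segment to remove, and edge cases such as empty path segments are not handled.

```python
import unicodedata
from urllib.parse import urlsplit

def normalize_locator(locator, base):
    """Hypothetical helper: make a locator relative to a base locator."""
    # Step 1: P is the base without fragment identifier, query, or trailing "/".
    p = base.split("#", 1)[0].split("?", 1)[0]
    if p.endswith("/"):
        p = p[:-1]
    result = locator  # Step 4: the locator is kept as-is if nothing matches.
    while True:
        # Step 2: strip P and one leading "/" when the locator starts with P.
        if locator.startswith(p):
            result = locator[len(p):]
            if result.startswith("/"):
                result = result[1:]
            break
        # Step 3: stop when P has no path segment left to remove.
        if "/" not in urlsplit(p).path:
            break
        # Remove the last path segment together with the "/" before it.
        p = p[:p.rindex("/")]
    # The resulting string is normalized to Unicode Normalization Form C.
    return unicodedata.normalize("NFC", result)

base = "http://example.org/maps/test.xtm"
same_file = normalize_locator("http://example.org/maps/test.xtm#foo", base)
sibling = normalize_locator("http://example.org/maps/other.xtm", base)
foreign = normalize_locator("http://psi.example.com/x", base)
```

With the invented example addresses above, a locator into the base document reduces to its fragment, a sibling document reduces to its file name, and a locator on an unrelated host is left unchanged.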
The number attribute of an information item is an attribute information item with the following properties:
[[local name]] The string "number"
[[normalized value]] The information item in the [parent] property of this information item has a set-valued property that contains this information item as one of its elements. The value is the string encoding of the position of this information item in the canonically ordered list of the values from that set
Before encoding a string property as a sequence of character information items, the string must be normalized according to Unicode Normalization Form C (Unicode Standard Annex #15, Unicode Normalization Forms, [Unicode]). Each character information item must have the following properties:
[[character code]] The ISO 10646 character code for the character.
[[element content whitespace]] False.
[[parent]] The containing element or attribute information item.
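As an illustration, the encoding of a string as character information items might look like the following sketch. Plain dictionaries stand in for information items, and the function name `to_character_items` is hypothetical.

```python
import unicodedata

def to_character_items(value, parent):
    """Hypothetical helper: one dictionary per character after NFC normalization."""
    normalized = unicodedata.normalize("NFC", value)
    return [
        {
            "character code": ord(ch),  # the code point of the character
            "element content whitespace": False,
            "parent": parent,
        }
        for ch in normalized
    ]

# "e" followed by a combining acute accent composes to a single character under NFC.
items = to_character_items("e\u0301", parent="value")
```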
When the position of an item in a list is to be encoded, the encoded value is the index of that item in the list counting from 1 as the index of the first list item.
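A short sketch of this position encoding, assuming the list passed in is already in canonical sort order (the function name `position` is hypothetical):

```python
def position(item, canonical_list):
    """Return the 1-based position of item as a string (hypothetical helper)."""
    return str(canonical_list.index(item) + 1)

first = position("a", ["a", "b", "c"])
second = position("b", ["a", "b", "c"])
```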
All element information items created by the canonicalization process must have the following property values:
[[namespace name]] No value
[[prefix]] No value
[[namespace attributes]] The empty set
[[in-scope namespaces]] The empty set
[[base URI]] No value
[[parent]] The element information item or document information item of which the element is a direct child
All attribute information items created by the canonicalization process must have the following property values:
[[namespace name]] No value
[[prefix]] No value
[[attribute type]] Unknown
[[references]] Unknown
[[specified]] True
[[owner element]] The element information item that this attribute information item belongs to
When transforming an instance of the Topic Maps Data Model to an instance of the XML Infoset model, all properties in the Topic Maps Data Model which are sets of information items must be encoded in the XML Infoset model by encoding each set element in the canonical sort order for the set. Clauses 4.2 to 4.11 define the canonical sort order for each information item type.
The following sort order applies to all information items and all values of the types defined by the Topic Maps Data Model.
Null
string
set
locator
topic map
topic
topic name
variant
occurrence
association
association role
String values are compared on a character by character basis from the start of the string to the end. The comparison is performed on strings normalized to Unicode Normalization Form C. When the first pair of characters with different character codes is found, the string containing the character with the lower code sorts lower than the string containing the character with the higher code. If all pairs compare equal, but one string is shorter than the other, the shorter string sorts lower than the longer string. If no differences are found the two strings are considered equal.
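The string comparison rule can be sketched as a three-way comparison function. The name `compare_strings` is hypothetical, and "character code" is taken here to mean the Unicode code point, which is an assumption of this sketch.

```python
import unicodedata

def compare_strings(a, b):
    """Return -1, 0 or 1 following the rule above (hypothetical helper)."""
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    # The first differing pair of characters decides the order.
    for ca, cb in zip(a, b):
        if ca != cb:
            return -1 if ord(ca) < ord(cb) else 1
    # All shared positions are equal: the shorter string sorts lower.
    return (len(a) > len(b)) - (len(a) < len(b))
```

Because both inputs are normalized first, a precomposed character and its decomposed equivalent compare equal.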
Sets sort in order of the number of elements in the collection. A set with fewer elements sorts lower than a set with more elements.
For sets of equal size, first sort the elements of each set into their canonical ordering. Starting with the lowest element in each sorted set, perform a pair-wise comparison of the elements of the two sets until a non-equal comparison is found. The sets then sort in the order of the two non-equal elements.
Sets with exactly the same elements will be considered equal.
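The set comparison rules can be sketched as follows for sets whose elements are strings. This assumes the strings are already in Normalization Form C, so that Python's built-in string ordering (by code point) matches the string rule; the function name `compare_sets` is hypothetical.

```python
def compare_sets(a, b):
    """Return -1, 0 or 1 for two sets of strings (hypothetical helper)."""
    # A set with fewer elements sorts lower.
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    # Equal size: compare the sorted elements pair-wise.
    for x, y in zip(sorted(a), sorted(b)):
        if x != y:
            return -1 if x < y else 1
    # Exactly the same elements.
    return 0
```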
Locators are first normalized, and the normalized locators are then compared in the same way as strings (see 4.3).
Topic items are compared by comparing their properties in the following order.
[subject identifiers]
[subject locators]
[item identifiers]
A combination of these three properties is all that is required to compare two topics. Part 2 of this standard requires that all topic items have at least one value for one of these properties, and that two topics which match in any one of these three properties be merged.
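Comparing topic items property by property can be sketched as below. The `Topic` class and the function names are hypothetical, locators are assumed to be already normalized strings, and the set comparison is a simplified version restricted to sets of strings.

```python
from dataclasses import dataclass
from functools import cmp_to_key

def compare_sets(a, b):
    """Smaller sets sort lower; equal-sized sets compare element by element."""
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    for x, y in zip(sorted(a), sorted(b)):
        if x != y:
            return -1 if x < y else 1
    return 0

@dataclass(frozen=True)
class Topic:
    """Hypothetical stand-in for a topic item; only the compared properties."""
    subject_identifiers: frozenset = frozenset()
    subject_locators: frozenset = frozenset()
    item_identifiers: frozenset = frozenset()

def compare_topics(t1, t2):
    # Properties are compared in the order given above.
    for prop in ("subject_identifiers", "subject_locators", "item_identifiers"):
        result = compare_sets(getattr(t1, prop), getattr(t2, prop))
        if result != 0:
            return result
    return 0

# Invented example topics.
b = Topic(subject_identifiers=frozenset({"http://psi.example.org/b"}))
t = Topic(item_identifiers=frozenset({"#t1"}))
a = Topic(subject_identifiers=frozenset({"http://psi.example.org/a"}))
ordered = sorted([b, t, a], key=cmp_to_key(compare_topics))
```

In this sketch the topic with no subject identifiers sorts first, because the empty set has fewer elements than any non-empty set.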
Topic name items are compared by comparing their properties in the following order.
[value]
[type]
[scope]
[parent]
Variant items are compared by comparing their properties in the following order.
[value]
[datatype]
[scope]
[parent]
Occurrence items are compared by comparing their properties in the following order.
[value]
[datatype]
[type]
[scope]
[parent]
Association items are compared by comparing their properties in the following order.
[type]
[roles]
[scope]
[parent]
Association role items are compared by comparing their properties in the following order.
[player]
[type]
[parent]
topicMap = element topicMap {
  attribute reifier { xsd:integer }?,
  itemIdentifiers?,
  topic*,
  association*
}

attlist.reifier =
  attribute reifier { xsd:integer }?,
  attribute number { xsd:integer }

topic = element topic {
  attribute number { xsd:integer },
  subjectIdentifiers?,
  subjectLocators?,
  itemIdentifiers?,
  name*,
  occurrence*,
  rolePlayed*
}

subjectIdentifiers = element subjectIdentifiers { locator+ }
subjectLocators = element subjectLocators { locator+ }
itemIdentifiers = element itemIdentifiers { locator+ }

name = element name {
  attlist.reifier, value, type, scope?, variant*, itemIdentifiers?
}

variant = element variant {
  attlist.reifier, value, datatype, scope, itemIdentifiers?
}

occurrence = element occurrence {
  attlist.reifier, value, datatype, type, scope?, itemIdentifiers?
}

rolePlayed = element rolePlayed { attribute ref { role.ref } }
role.ref = xsd:token { pattern = "association.[1-9][0-9]*.role.[1-9][0-9]*" }

association = element association {
  attlist.reifier, type?, role*, scope?, itemIdentifiers?
}

role = element role {
  attlist.reifier, player?, type?, itemIdentifiers?
}

attlist.topicref = attribute topicref { xsd:integer }
player = element player { attlist.topicref }
type = element type { attlist.topicref }
value = element value { text }
locator = element locator { text }
scope = element scope { scopingTopic+ }
datatype = element datatype { xsd:anyURI }
scopingTopic = element scopingTopic { attlist.topicref }

start = topicMap
ISO/IEC 13250-3, ISO/IEC 13250-3: Information technology — Topic Maps — XML Syntax, ISO