ISO/IEC JTC 1/SC34 N0448

ISO/IEC

ISO/IEC JTC 1/SC34

Information Technology —

Document Description and Processing Languages

Title: TMQL requirements
Source: Lars Marius Garshol, Robert Barta, JTC1/SC34
Project:
Project editor:
Status: Draft
Action: For review and comment
Date: 2003-11-07
Summary:
Distribution: SC34 and Liaisons
Refer to: ISO/IEC JTC 1/SC34 N0249, 2001-08-09
Supersedes: ISO/IEC JTC 1/SC34 N0249, 2001-08-09
Reply to: Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Y-12 National Security Complex
Information Technology Services
Bldg. 9113 M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
E-mail: mailto:mxm@y12.doe.gov
http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm

Mrs. Sara Desautels, ISO/IEC JTC 1/SC 34 Secretariat
American National Standards Institute
25 West 43rd Street
New York, NY 10036
Tel: +1 212 642-4937
Fax: +1 212 840-2298
E-mail: sdesaute@ansi.org

TMQL requirements 1.2.0

Draft 2003-11-07

This version:
ISO/IEC JTC 1/SC34 N0448
Previous versions:
ISO/IEC JTC 1/SC34 N0249, 2001-08-09
ISO/IEC JTC 1/SC34 N0227, 2001-06-12
Authors:
Lars Marius Garshol <larsga@ontopia.net>
Robert Barta <rho@bond.edu.au>

Table of Contents

1 Introduction
2 Query Environment
3 Requirements on the Language Definition Document
    3.1 Relationship to other standards
    3.2 Structure
    3.3 Syntax
    3.4 Semantics
4 Requirements for the Language
    4.1 Functionality
    4.2 Input
    4.3 Result sets
    4.4 Usability
    4.5 Implementation
5 Non-requirements

Appendices

A References (Non-Normative)
B Acknowledgements (Non-Normative)
C Revision history (Non-Normative)


1 Introduction

This document provides informal requirements and feature lists for the upcoming ISO standard TMQL (Topic Map Query Language, ISO/IEC 18048). It reflects the intentions of the Topic Map community regarding a Topic Map retrieval and manipulation language and contains the consolidated view of the standards editors. Herein are defined the requirements for the TMQL standard as a whole, and for the query aspect of TMQL in particular. Additional requirements for the update part of TMQL will be detailed at a later stage.

The purpose of this document is to be as explicit as possible about the form and functionality of a Topic Map query language without preempting the discussion process on how particular objectives, goals, and requirements can be achieved. This is regarded as the most effective way to solicit more specific comments from the community.

While this document strives for maximum explicitness, some requirements are specified only implicitly through other requirements. In this light the document should not be read as a fully formalized requirements document. Instead, feedback on this document is requested, especially as the targeted lifetime of TMQL-related technologies is projected to be at least 15 to 20 years.

This document is organized as follows. After some editorial definitions we define some basic querying concepts that are used throughout the rest of the document. These set the stage for the requirements on the standardization document and the requirements on the language itself.

The keywords "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT," "SHOULD," "SHOULD NOT," "RECOMMENDED," "MAY," and "OPTIONAL" will be used in this document as defined in [RFC2119].

2 Query Environment

The following concepts relating to the self-containedness of queries have been identified. They are described here in order to clarify the list of requirements following the concepts. Please note that these concepts — and how they may apply to TMQL as it will be defined — are not yet fully understood. As such the descriptions below themselves do not constitute requirements.

TMQL query statement

A query statement is a textual representation, in the formal language TMQL, of a query against a Topic Map repository.

TMQL query

A query is the actual evaluation of a query statement.

TMQL environment

This is the context in which TMQL queries are evaluated. The context may contain things like ID-to-topic-map mappings and identifier to variable, function, or predicate mappings. It may also contain base URIs used to resolve relative URIs within a query statement, etc.

Inter-query context

This is the execution environment for TMQL queries, possibly as modified by previous queries. It is not clear what this context may contain as it may only make sense for particular languages.

Ed. Note:
This may need some more thought.

Intra-query context

This is the execution environment local to a TMQL query and invisible to later queries, as modified or set up by the query itself. It may contain identifier to value mappings, a base URI for resolving relative URIs, specifications of nested queries, specifications of local functions/predicates, and so on.

Result sets

A result set is the outcome of executing a query; it may be a structured collection of topic map data model items and primitive values or simply a list of these; it may be XML data; or it may be an instance of the Topic Map Data Model [DM]. It does not have any specific syntax.

Primitive results

Results in the form of a list, set, or bag of values of the fundamental types, topic map data model items, and tuples of the above.

XML results

Results in the form of abstract XML documents.

Topic map results

Results in the form of an instance of the Topic Map Data Model [DM].
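
To make the environment concepts above more concrete, the following minimal sketch models a TMQL environment as a plain data structure. It is illustrative only: the class name QueryEnvironment, its fields, and the example URIs are assumptions made for this sketch and are not defined by TMQL or by any existing processor. "Resolving" here means turning a relative URI reference into an absolute one, not retrieving the resource it identifies.

```python
from dataclasses import dataclass, field
from urllib.parse import urljoin


@dataclass
class QueryEnvironment:
    """Hypothetical container for the context a query is evaluated in."""

    # Base URI used to resolve relative URIs found in a query statement.
    base_uri: str
    # Mapping from short IDs to topic map locations (assumed shape).
    topic_maps: dict = field(default_factory=dict)
    # Mapping from identifiers to variable values (assumed shape).
    variables: dict = field(default_factory=dict)

    def resolve(self, reference: str) -> str:
        """Turn a possibly relative URI reference into an absolute URI."""
        return urljoin(self.base_uri, reference)


# Example values are invented for illustration.
env = QueryEnvironment(
    base_uri="http://example.org/maps/",
    topic_maps={"people": "people.xtm"},
)
people_uri = env.resolve(env.topic_maps["people"])
print(people_uri)
```

An inter-query context could be pictured as one such object shared across several queries, and an intra-query context as a copy that is discarded once the query finishes; whether TMQL adopts either notion is left open by this document.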

3 Requirements on the Language Definition Document

This section lists requirements on the standardization document rather than on the technology itself. These include requirements that the document follow a particular structure and contain specific sections. In addition, it is acknowledged that the standard will not stand in isolation but will have to harmonize with a number of Topic Map related and other existing standards. These are listed in section 3.1.

3.1 Relationship to other standards

  1. The syntax of any URIs within TMQL query statements shall be governed by the rules of RFC 2396 [RFC2396] as modified by RFC 2732 [RFC2732].

  2. The native character set of TMQL shall be Unicode [Unicode].

  3. Any ordering of strings shall be based on externally-defined specifications for internationalized string collation. Candidates are the Unicode Collation Algorithm [UTR10], and ISO 14651 [ISO14651].

  4. The TMQL standard should be defined based on a set of use cases representing general classes of queries expected to be common [UC].

  5. TMQL shall be based on [ISO13250].

  6. TMQL shall be based on the topic map data model [DM]. (Thus, TMQL will also support XTM 1.0 [XTM] and XTM 1.1 [XTM11].)

  7. TMQL shall be harmonized with the Topic Map Constraint Language [TMCL] in the sense that there shall be no overlap of functionality. The standards shall be compatible and features from TMCL shall be reused in TMQL wherever suitable.
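
Requirement 3 above exists because ordering strings by raw code point rarely matches what readers expect. The sketch below contrasts code point order with a deliberately crude key function. The function crude_collation_key is a hypothetical stand-in written for this illustration; it is not the Unicode Collation Algorithm and not ISO 14651, both of which handle far more cases.

```python
import unicodedata


def crude_collation_key(text: str) -> str:
    """Hypothetical stand-in for a real collation key (NOT the UCA)."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks so accented letters sort with their base letter.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ignore case differences.
    return stripped.casefold()


# "\u00c4hre" is "Ahre" with a diaeresis on the capital A.
names = ["zebra", "\u00c4hre", "apple"]

by_code_point = sorted(names)
by_crude_key = sorted(names, key=crude_collation_key)

print(by_code_point)
print(by_crude_key)
```

Under code point order the accented word sorts after "zebra", while the crude key places it next to "apple". A conforming definition would delegate this decision to an external collation specification rather than to an ad hoc key like this one.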

3.2 Structure

  1. The TMQL standard SHALL clearly define whether it uses each of the concepts of the query environment (see 2 Query Environment) and, if so, how.

  2. The TMQL standard MUST have a part which details the query language for Topic Maps. This section MUST define a formal syntax and the semantics of query statements.

  3. The TMQL standard SHALL contain a conformance clause, stating the conditions under which TMQL implementations may claim to conform to the standard.

  4. The TMQL standard MAY have a part which covers updating of Topic Maps and MAY also contain a section about administration of Topic Map repositories. This MAY include suggestions for language means for query optimization and storage modalities.

  5. The TMQL standard MAY have a part which covers output from TMQL result sets.

3.3 Syntax

TMQL query statements need a syntax for their serialized text form. This syntax will be used in the authoring process and also when machines exchange statements.

  1. The syntax of TMQL statements SHALL be defined in terms of a formal, context-free grammar. It is suggested to use EBNF as notation.

  2. An XML syntax for TMQL query statements MAY be defined. This is intended mainly for interchange, not as the primary notation for humans. It is suggested to use [RELAX-NG] as the schema language for this syntax.

  3. The formal definition of the grammar MUST constrain query statements to be written in [Unicode].

3.4 Semantics

When defining the execution semantics of TMQL statements the definition has to include the underlying data, the query statement (serialized or deserialized) and a result, either in form of a serialized or deserialized data structure.

It is also expected that for semantic characterisation an execution context is used to capture intermediate steps in the query process.

  1. The TMQL standard shall be rigorous and should fully define the results of queries so that any given query can only have one correct result (disregarding sorting orders where no sorting is mandated). It should also be, as far as the rigor allows, easily understandable to the average engineer.

  2. The TMQL standard shall define error situations and how TMQL processors are required to react to them or to flag them to the calling application.

  3. The execution context MAY be defined as seems appropriate for a language; there is no constraint on the form and nature of this context.

  4. The definition of the underlying data MUST be based (directly or indirectly) on the data model [DM].

  5. The definition of the result of a TMQL query MUST also use [DM] for TM-structured data. For other kinds of output the standard MAY use any other appropriate means to formally define the structure. If the structure contains text, then [Unicode] MUST be enforced.

Ed. Note:
Mention pragmas here?

4 Requirements for the Language

The requirements for TMQL itself are grouped by core functionalities, by the behavior the language exposes relative to the environment (input, output), usability and implementation issues.

4.1 Functionality

  1. TMQL MUST be able to solve the use cases outlined in [UC]. This can be achieved either directly with the language or via an extension framework if such is defined.

  2. TMQL shall support all natural languages equally. That is, TMQL shall be fully internationalized with respect to text representation, text ordering, etc.

  3. A TMQL query SHALL contain all the information necessary to interpret it (with respect to its TMQL environment, inter-query context, and intra-query context).

4.2 Input

  1. TMQL MUST allow queries to span multiple topic maps. That is, TMQL should allow querying of topic maps stored in a distributed fashion.

  2. Statements in TMQL itself SHOULD NOT contain any assumptions about how Topic Maps are represented. They should only refer to the abstract model defined by the [DM].

  3. TMQL must allow the environment to pass literals into a query as query parameters.

  4. The character set for exchange of text data between the environment and a TMQL processor SHALL BE [Unicode].

4.3 Result sets

  1. Result sets shall be instances of an abstract TMQL data model.

  2. TMQL SHALL enable the specification of primitive, XML, and topic map results.

  3. When serializing results the TMQL standard SHALL enable queries to specify the character encoding of the output.

  4. For result sets TMQL MUST support sorting of query results according to properties and values related to the result set. This relationship must be clearly defined.

  5. TMQL shall support limiting the number of values returned by a particular query. (This is essentially the same feature as that provided by SQL's LIMIT feature, but TMQL may choose to support it in a completely different way.)

  6. TMQL shall support paging the values returned by a particular query through functionality for slicing the list of results. (This is essentially the same functionality as that provided by SQL's LIMIT and OFFSET features, but TMQL may choose to support it in a completely different way.)
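
Requirements 4 to 6 above (sorting, limiting, and paging) can be pictured with ordinary list slicing. The sketch below is only an analogy in Python: the helper name page and its offset and limit parameters are assumptions for this illustration, and TMQL may express the same functionality in a completely different way.

```python
from itertools import islice


def page(results, offset=0, limit=None):
    """Return at most `limit` items of `results`, skipping the first `offset`."""
    stop = None if limit is None else offset + limit
    return list(islice(results, offset, stop))


# Invented result values; a real result set would hold data model items.
values = ["d", "a", "c", "b", "e", "f"]

# Paging is only meaningful once an ordering has been fixed.
ordered = sorted(values)

first_three = page(ordered, limit=3)            # limiting only
second_page = page(ordered, offset=2, limit=2)  # paging: skip 2, take 2

print(first_three)
print(second_page)
```

Sorting before slicing matters: without a mandated order, two conforming processors could return different pages for the same query, which would conflict with the requirement in 3.4 that a query has only one correct result.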

4.4 Usability

  1. TMQL SHALL have a concise and human-readable syntax.

  2. TMQL SHOULD scale in terms of complexity: Simple (and frequent) queries should be formalizable with short statements. Complex queries may take longer statements; for these TMQL SHALL make it possible to write queries in a modular fashion.

  3. TMQL SHALL BE extensible. TMQL shall define controlled mechanisms for third-party extensions, e.g., domain-specific extensions.

4.5 Implementation

  1. TMQL SHALL BE independent of any particular implementation technique or implementation language.

  2. TMQL SHALL BE efficiently implementable for the typical queries.

  3. It SHOULD BE technically feasible to automatically assess queries in terms of their computational costs.

  4. The TMQL query syntax SHALL BE designed to be easily embeddable into XML documents and programming language source code. (This means for example that characters like '&' and '<' cannot be commonly used in TMQL.)

  5. The syntax SHOULD BE designed so that queries expected to be common are as easy to write as possible.
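
The embedding concern in requirement 4 above can be seen by serializing text into an XML element. In the sketch below the element name "query" and both sample strings are made up for this illustration; they are not TMQL syntax. Only Python standard library functions are used.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def embeds_verbatim(query_text: str) -> bool:
    """True if the text can appear in XML character data without escaping."""
    # escape() rewrites '&', '<' and '>' as entity references.
    return escape(query_text) == query_text


def embed(query_text: str) -> str:
    """Serialize the text as the content of a hypothetical <query> element."""
    element = ET.Element("query")
    element.text = query_text
    return ET.tostring(element, encoding="unicode")


plain = "select all topics"   # invented placeholder text
awkward = "a < b & c"         # invented placeholder text

print(embeds_verbatim(plain), embed(plain))
print(embeds_verbatim(awkward), embed(awkward))
```

The second string survives embedding only in escaped form, which makes it harder to read and author by hand inside an XML document. A syntax that avoids these characters in common constructs sidesteps the problem.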

5 Non-requirements

The items listed in this section are, for various reasons, outside the scope of TMQL.

  1. The TMQL standard shall not include an API to query processors in parts 1 (retrieval) or 2 (updates) of the standard.

  2. The TMQL standard shall not define mechanisms for specifying validity constraints on topic maps. It may be used by other specifications, such as [TMCL], and software to define such constraints.

  3. The TMQL standard shall not define a natural language query interface.

  4. TMQL part 1 shall not define administrative commands. Administrative commands include functionality like index maintenance, creating/deleting topic map collections, optimization operations, configuration, etc.

  5. TMQL shall not define any functionality for resolving locators such as URIs.

A References (Non-Normative)

DM
ISO/IEC 13250-2: Topic Maps — Data Model, forthcoming.
ISO13250
ISO/IEC 13250:2002 Topic Maps, ISO, Geneva, 2002.
ISO14651
ISO/IEC 14651: International String Ordering — Method for comparing Character Strings and Description of the Common Template Tailorable Ordering, ISO, Geneva.
RELAX-NG
RELAX-NG ????
RFC2119
IETF RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. Network Working Group, Internet Engineering Task Force. March 1997.
RFC2396
IETF RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter. 1998.
RFC2732
IETF RFC 2732: Format for Literal IPv6 Addresses in URL's, R. Hinden, B. Carpenter, L. Masinter. December 1999.
TMCL
ISO/IEC 19756: Topic Map Constraint Language, currently in development.
UC
TMQL Use Cases, Eds: Lars Marius Garshol, Ontopia; Robert Barta, rho Information Systems. Forthcoming.
Unicode
The Unicode Standard, Version 3.0. The Unicode Consortium, Reading, Mass., USA: Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5.
UTR10
UTR #10: Unicode Collation Algorithm, Mark Davis and Kenneth Whistler, the Unicode Consortium, 2002-07-16.
XTM
XML Topic Maps (XTM) 1.0 Specification, TopicMaps.Org, 2001.
XTM11
XML Topic Maps (XTM) 1.1 Specification (CD), Eds: Lars Marius Garshol, Graham Moore, Ontopia, 2003.

B Acknowledgements (Non-Normative)

This document is based on input from

C Revision history (Non-Normative)

Version 1.2.0 is a major rewrite, made after it was realized that the requirements can be classified as relating either to the standardization process itself or to language features. Additions: some definitions at the beginning, RFC2119, some more references.

Version 1.1.0 has kept many requirements from 1.0.0, but also added a large number of requirements, changed many requirements, and lost a large number of requirements. Section 3 of the document was also internally reorganized. In short, the changes are too many to list.