XQuery and XPath Full-Text Requirements
W3C Working Draft 02 May 2003
- This version:
- https://www.w3.org/TR/2003/WD-xquery-full-text-requirements-20030502/
- Latest version:
- https://www.w3.org/TR/xquery-full-text-requirements/
- Previous version:
- https://www.w3.org/TR/2003/WD-xmlquery-full-text-requirements-20030214/
- Editors:
- Stephen Buxton, Oracle Corp <stephen.buxton@oracle.com>
- Michael Rys, Microsoft <mrys@microsoft.com>
Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
Abstract
The document specifies requirements for Full-Text search for use in XQuery [XQuery] and XPath [XPath].
Status of this Document
This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at https://www.w3.org/TR/.
The Full-Text Requirements have been defined jointly by the XQuery Working Group and the XSL Working Group (both part of the XML Activity).
This document is a work in progress. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.
Public comments on this document and its open issues are welcome. Comments should be sent to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at https://lists.w3.org/Archives/Public/public-qt-comments/).
Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at https://www.w3.org/2002/08/xmlquery-IPR-statements and on the XSL Working Group's patent disclosure page at https://www.w3.org/Style/XSL/Disclosures.
A list of current W3C Recommendations and other technical documents can be found at https://www.w3.org/TR/.
Table of Contents
1 Introduction
2 Terminology
2.1 MUST
2.2 SHOULD
2.3 MAY
2.4 SCORE
2.5 Full-Text Search
3 Language Design
3.1 The Data Model
3.2 Side-effects on the data
3.3 Score Function and Full-Text predicates
3.3.1 Predicate and Score Independence
3.3.2 Score language
3.4 Score algorithm
3.4.1 Return Score
3.4.2 Sort by Score
3.4.3 Type, Range of Score
3.4.4 Score Statistics
3.4.5 Semantics of Score
3.5 Combined score
3.5.1 Score Combination
3.5.2 Score algorithm vendor-provided
3.5.3 Score algorithm overridable
3.5.4 Score influence
3.6 Extensibility
3.6.1 Extensible by vendors
3.6.2 Extensible by users
3.7 First, Future Versions
3.8 End user language
3.9 Searchable query
3.10 Universality
4 Integration
4.1 XPath
4.2 Extensibility Mechanisms
4.2.1 Integration into XQuery/XPath
4.2.2 XQuery/XPath Full-Text Extensibility
4.3 Composability
4.4 Human-readable
4.5 XML syntax
5 Implementation
5.1 Declarativity
6 Functionality and Scope
6.1 Functionality
6.2 Search Scope
6.2.1 Search within arbitrary structure
6.2.2 Constructed Structures
6.2.3 Return Arbitrary Nodes
6.2.4 Parts of Search Tree
6.3 Attributes
6.3.1 Search within attributes
6.3.2 Search across attributes and content
6.4 Markup
6.5 Element Boundaries
6.5.1 Search across element boundaries
6.5.2 Element as a token boundary
6.6 Score
6.6.1 Score accessible
6.6.2 Implicit ordering
6.6.3 Score extendable
Appendices
A References
A.1 Non-Normative
B Change Log
1 Introduction
"Full-Text Search" (FTS) is a large field which covers a vast array of functionality. In addition, there are many different ways one could combine FTS capabilities with XQuery and XPath.
This paper describes a set of requirements for FTS in XQuery/XPath (XQuery/XPath Full-Text). At this stage in the life of the document, these requirements should be read as suggestions only: the issues associated with the requirements are to be discussed and resolved by the relevant Working Groups. This format provides a firm basis for the Working Groups to set the direction of the work on XQuery/XPath Full-Text, and to compare existing proposals. Once the issues are resolved and this Requirements document is finalized, it will be easier to define the functionality of XQuery/XPath Full-Text and its integration with XQuery and/or XPath.
Note that we will attempt to define requirements for the language without reference to any particular solution.
2 Terminology
We use the terms MUST, SHOULD and MAY throughout the document to specify the extent to which an item is a requirement for the work of XQuery/XPath Full-Text. We use the same definitions of MUST, SHOULD and MAY as the XML Query Requirements [XML Query Requirements].
2.1 MUST
[Definition: MUST means that the item is an absolute requirement.]
2.2 SHOULD
[Definition: SHOULD means that there may exist valid reasons not to treat this item as a requirement, but the full implications should be understood and the case carefully weighed before discarding this item.]
2.3 MAY
[Definition: MAY means that an item deserves attention, but further study is needed to determine whether the item should be treated as a requirement.]
When the words MUST, SHOULD, or MAY are used in this technical sense, they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.
Other terminology used in this document:
2.4 SCORE
[Definition: SCORE reflects relevance of matched material.]
2.5 Full-Text Search
[Definition: Full-Text Search in this document is an extension to the XQuery/XPath language. It provides a way to query text which has been tokenized, i.e. broken into a sequence of words, units of punctuation, and spaces. Tokenization enables functions and operators which work with the relative positioning of words (e.g., proximity operators). Tokenization also enables functions and operators which operate on a part or the root of the word (e.g., wildcards, stemming).]
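As an informal illustration of the definition above, the following Python sketch tokenizes a string and evaluates a simple proximity test on word positions. It is not part of the requirements: the function names `tokenize` and `within_distance` are hypothetical, and the tokenizer is a simplification that keeps only words and discards punctuation and spaces.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (simplification: punctuation and spaces are dropped)."""
    return re.findall(r"\w+", text.lower())

def within_distance(tokens, first, second, max_gap):
    """Return True if `first` and `second` occur with at most `max_gap` words between them."""
    pos_first = [i for i, tok in enumerate(tokens) if tok == first]
    pos_second = [i for i, tok in enumerate(tokens) if tok == second]
    return any(
        abs(i - j) - 1 <= max_gap
        for i in pos_first
        for j in pos_second
        if i != j
    )

tokens = tokenize("Tokenization enables functions and operators which work with word positions.")
print(within_distance(tokens, "functions", "operators", 1))     # one word ("and") lies between them
print(within_distance(tokens, "tokenization", "positions", 3))  # eight words lie between them
```

Because the test works on token positions rather than character offsets, the same token list could also serve wildcard or stemming operators, which act on the individual tokens.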
3 Language Design
This section covers requirements for XQuery/XPath Full-Text language design that are independent from, but related to, integration and scoping requirements.
3.1 The Data Model
XQuery/XPath Full-Text functions MUST operate on instances of the XQuery/XPath Data Model.
3.2 Side-effects on the data
XQuery/XPath Full-Text MUST NOT introduce or rely on side-effects.
3.3 Score Function and Full-Text predicates
3.4 Score algorithm
3.4.3 Type, Range of Score
XQuery/XPath Full-Text MUST define the type and range of SCORE values. The SCORE SHOULD be a float, in the range 0-1.
3.5 Combined score
3.5.1 Score Combination
XQuery/XPath Full-Text MUST be able to generate a SCORE for a combination of Full-Text predicates.
3.5.2 Score algorithm vendor-provided
The algorithm to produce combined SCOREs MUST be vendor-provided.
3.5.3 Score algorithm overridable
The algorithm to produce combined SCOREs SHOULD be overridable by users.
3.5.4 Score influence
Users MUST be able to influence individual components of complex score expressions.
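The requirements in this section can be pictured with a small Python sketch. It assumes, purely for illustration, that each Full-Text predicate yields a SCORE in the range 0-1, that the vendor default is a weighted mean, and that per-component weights are the means by which users influence individual components. The names `weighted_mean`, `strictest` and `combined_score` are hypothetical and do not come from any proposal.

```python
def weighted_mean(scores, weights):
    """Hypothetical vendor default: a weighted average, which stays within 0-1."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(s * w for s, w in zip(scores, weights)) / total

def strictest(scores, weights):
    """Hypothetical user-supplied override: the weakest component decides (weights ignored)."""
    return min(scores) if scores else 0.0

def combined_score(scores, weights=None, combiner=weighted_mean):
    """Combine per-predicate scores; `weights` influence components, `combiner` is overridable."""
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("one weight is required per score")
    return combiner(scores, weights)

print(combined_score([1.0, 0.5]))                      # vendor default, equal weights
print(combined_score([1.0, 0.5], weights=[3.0, 1.0]))  # user emphasizes the first predicate
print(combined_score([1.0, 0.5], combiner=strictest))  # user overrides the algorithm
```

Separating the weights from the combining function keeps the two user-facing requirements independent: a user can adjust the influence of one component without replacing the vendor algorithm.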
3.6 Extensibility
3.6.1 Extensible by vendors
XQuery/XPath Full-Text MUST be extensible by vendors.
3.6.2 Extensible by users
XQuery/XPath Full-Text MAY be extensible by users.
3.7 First, Future Versions
The first version of XQuery/XPath Full-Text MUST provide a robust framework for future versions.
3.8 End user language
It is not a requirement that XQuery/XPath Full-Text be designed as an end-user UI language.
3.9 Searchable query
It SHOULD be possible to search XQuery/XPath Full-Text queries.
4 Integration
This section specifies requirements for the integration of XQuery/XPath Full-Text with XQuery and XPath.
4.1 XPath
Part, but not necessarily all, of XQuery/XPath Full-Text MUST be usable as part of an XPath expression.
4.2 Extensibility Mechanisms
4.2.1 Integration into XQuery/XPath
XQuery/XPath Full-Text SHOULD use the extensibility mechanisms that exist in XQuery and XPath for integration into XQuery and XPath.
4.2.2 XQuery/XPath Full-Text Extensibility
XQuery/XPath Full-Text MUST use the extensibility mechanisms that exist in XQuery and XPath for its own extensibility.
4.3 Composability
XQuery/XPath Full-Text MUST be composable with XQuery, and SHOULD be composable with itself.
4.4 Human-readable
XQuery/XPath Full-Text may have more than one syntax binding. One query language syntax must be convenient for humans to read and write. See [XML Query Requirements].
4.5 XML syntax
XQuery/XPath Full-Text MAY have more than one syntax binding. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query. See [XML Query Requirements].
6 Functionality and Scope
This section defines requirements for the functionality in XQuery/XPath Full-Text, and the scope of XQuery/XPath Full-Text queries.
6.1 Functionality
XQuery/XPath Full-Text MUST provide, in the first release, the minimum set of Full-Text functionality that is useful.
- single-word search
- phrase search
- support for stopwords
- single character suffix
- 0 or more character suffix
- 0 or more character prefix
- 0 or more character infix
- proximity searching (unit: words)
- specification of order in proximity searching
- combination using AND
- combination using OR
- combination using NOT
- word normalization, diacritics
- ranking, relevance
Additional functionality represented in the [XQuery and XPath Full-Text Use Cases] MUST be considered, but may be left to a future release.
Additional functionality from other Full-Text search contexts such as [SQL/MM Full-Text] MUST be considered, but SHOULD be left to a future release.
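Several items from the minimum functionality list can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: `?` is taken to stand for a single character and `*` for zero or more characters, the stopword list is an arbitrary sample, and the helper names `has_word` and `has_phrase` are hypothetical. Combination using AND, OR and NOT is expressed with Python's own boolean operators.

```python
import re
from fnmatch import fnmatchcase

STOPWORDS = {"a", "an", "the", "of"}  # illustrative sample, not a normative list

def tokenize(text):
    """Lowercase word tokens; punctuation and spaces are dropped."""
    return re.findall(r"\w+", text.lower())

def has_word(tokens, pattern):
    """Single-word search; '?' matches one character, '*' matches zero or more."""
    return any(fnmatchcase(tok, pattern) for tok in tokens)

def has_phrase(tokens, phrase):
    """Phrase search that ignores stopwords in both the phrase and the text."""
    words = [w for w in tokenize(phrase) if w not in STOPWORDS]
    kept = [t for t in tokens if t not in STOPWORDS]
    n = len(words)
    return n > 0 and any(kept[i:i + n] == words for i in range(len(kept) - n + 1))

doc = tokenize("The requirements of the Full-Text search language")

print(has_word(doc, "search"))                                   # single-word search
print(has_word(doc, "require*"))                                 # 0 or more character suffix
print(has_word(doc, "t?xt"))                                     # single character wildcard
print(has_phrase(doc, "requirements full text"))                 # phrase search across stopwords
print(has_word(doc, "search") and not has_word(doc, "index*"))   # AND combined with NOT
```

Word normalization, diacritics handling and ranking are deliberately left out of this sketch.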
6.2 Search Scope
6.2.1 Search within arbitrary structure
XQuery/XPath Full-Text MUST allow search within an arbitrary structure (an arbitrary XPath expression).
6.2.2 Constructed Structures
XQuery/XPath Full-Text MUST NOT preclude Full-Text search within structures constructed during a query.
6.2.3 Return Arbitrary Nodes
XQuery/XPath Full-Text MUST allow a query to return arbitrary nodes.
6.2.4 Parts of Search Tree
XQuery/XPath Full-Text MUST allow the combination of predicates on different parts of the searched document 'tree'.
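The search scope requirements can be sketched in Python using the limited path support of the standard `xml.etree.ElementTree` module as a stand-in for XPath. The sample document, the helper `contains_word` and the choice to return `title` elements are all assumptions made for illustration. The sketch restricts the search to a selected structure, combines predicates on two different parts of the tree, and returns nodes other than the ones searched.

```python
import xml.etree.ElementTree as ET

def contains_word(elem, word):
    """True if `word` occurs as a whitespace-delimited token in the element's text content."""
    return word.lower() in " ".join(elem.itertext()).lower().split()

library = ET.fromstring(
    "<library>"
    "<book><title>XML search</title><abstract>Ranking of results</abstract></book>"
    "<book><title>Databases</title><abstract>XML storage</abstract></book>"
    "</library>"
)

# Predicates on two different parts of each book; the query returns the title nodes.
matches = [
    book.find("title")
    for book in library.findall("./book")
    if contains_word(book.find("title"), "xml")
    and contains_word(book.find("abstract"), "ranking")
]

print([title.text for title in matches])
```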
6.3 Attributes
6.3.1 Search within attributes
XQuery/XPath Full-Text MUST support Full-Text search within attributes.
6.3.2 Search across attributes and content
XQuery/XPath Full-Text MAY support Full-Text search within attributes in conjunction with Full-Text search within element content.
6.4 Markup
If XQuery/XPath Full-Text supports search within names of elements and attributes, then it MUST distinguish between
- element content and attribute values
and
- names of elements and attributes
in any search.
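The distinction required above can be made concrete with a small Python sketch based on the standard `xml.etree.ElementTree` module. The function `search_sources` and the sample document are hypothetical. The sketch reports separately whether a word was found in element content, in an attribute value, in an element name or in an attribute name, so that a match on markup is never confused with a match on data.

```python
import xml.etree.ElementTree as ET

def search_sources(root, word):
    """Return the kinds of places where `word` occurs, keeping values and names apart."""
    word = word.lower()
    hits = set()
    for elem in root.iter():
        content = ((elem.text or "") + " " + (elem.tail or "")).lower().split()
        if word in content:
            hits.add("element content")
        if any(word in value.lower().split() for value in elem.attrib.values()):
            hits.add("attribute value")
        if elem.tag.lower() == word:
            hits.add("element name")
        if any(name.lower() == word for name in elem.attrib):
            hits.add("attribute name")
    return hits

root = ET.fromstring(
    '<book title="Usability title"><title>Improving usability</title></book>'
)

print(sorted(search_sources(root, "title")))
print(sorted(search_sources(root, "usability")))
```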
6.5 Element Boundaries
6.5.1 Search across element boundaries
XQuery/XPath Full-Text MUST support search across element boundaries, at least for NEAR.
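A proximity (NEAR) search across element boundaries might be sketched as follows in Python. The sketch assumes that the text of all descendant elements is concatenated in document order and that every element boundary is treated as a token boundary; both choices, and the names `tokens_across_elements` and `near`, are illustrative assumptions rather than decisions made by this document.

```python
import re
import xml.etree.ElementTree as ET

def tokens_across_elements(elem):
    """Tokenize the text of `elem` and its descendants, ignoring the markup between them."""
    # Joining with a space treats every element boundary as a token boundary (an assumption).
    text = " ".join(elem.itertext())
    return re.findall(r"\w+", text.lower())

def near(tokens, first, second, max_gap):
    """True if `first` and `second` occur with at most `max_gap` words between them."""
    pos_first = [i for i, tok in enumerate(tokens) if tok == first]
    pos_second = [i for i, tok in enumerate(tokens) if tok == second]
    return any(
        abs(i - j) - 1 <= max_gap
        for i in pos_first
        for j in pos_second
        if i != j
    )

para = ET.fromstring("<p>Full-Text <b>search</b> across <i>element</i> boundaries</p>")
tokens = tokens_across_elements(para)

print(near(tokens, "search", "element", 1))   # the two words sit in different child elements
print(near(tokens, "full", "boundaries", 2))  # four words lie between them
```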
A References
A.1 Non-Normative
- XQuery
- XQuery 1.0: An XML Query Language. W3C Working Draft. (See https://www.w3.org/TR/xquery/.)
- XPath
- XML Path Language (XPath) 2.0. W3C Working Draft. (See https://www.w3.org/TR/xpath20/.)
- XML Query Requirements
- XML Query Requirements (See https://www.w3.org/TR/xquery-requirements.)
- XQuery and XPath Full-Text Use Cases
- XQuery 1.0 and XPath 2.0 Full-Text Use Cases (See https://www.w3.org/TR/xmlquery-full-text-use-cases/.)
- SQL/MM Full-Text
- ISO/IEC 13249-2:2000, Information technology - Database languages - SQL Multimedia and Application Packages - Part 2: Full-Text, International Organization For Standardization, 2000, referenced in e.g. "SQL Multimedia and Application Packages (SQL/MM)" (See https://www.acm.org/sigmod/record/issues/0112/standards.pdf)
B Change Log
Author | Date | Action | Description |
Stephen Buxton | 2003-03-19 | Added a Change Log | |
Stephen Buxton | 2003-03-19 | Terminology definition changes | Switched the definitions of SHOULD and MAY, to be consistent with [XML Query Requirements]. The rest of the document does not need to change, since the earlier versions of this document, on which the text of the spec is based, referred to the definitions in [XML Query Requirements]. |
Stephen Buxton | 2003-04-18 | Change XML Query Requirements link to external URI | Changed links in the document body to point to external latest copy of XML Query Requirements. |