XQuery: An XML query language
by D. Chamberlin
The World Wide Web Consortium has convened a working group to design a query language for Extensible Markup Language (XML) data sources. This new query language, called XQuery, is still evolving and has been described in a series of drafts published by the working group. XQuery is a functional language made up of several kinds of expressions that can be nested and composed with full generality. It is based on the type system of XML Schema and is designed to be compatible with other XML-related standards. This paper explains the need for an XML query language, provides a tutorial overview of XQuery, and includes several examples of its use.
Increasingly, Extensible Markup Language (XML) [1] is considered the format of choice for the exchange of information among various applications on the Internet. The popularity of XML is due in large part to its flexibility for representing many kinds of information. The use of tags makes XML data self-describing, and the extensible nature of XML makes it possible to define new kinds of documents for specialized purposes. As the importance of XML has increased, a series of standards has grown up around it, many of which were defined by the World Wide Web Consortium (W3C) [2]. For example, XML Schema [3] provides a notation for defining new types of elements and documents; XML Path Language (XPath) [4] provides a notation for selecting elements within an XML document; and Extensible Stylesheet Language Transformations (XSLT) [5] provides a notation for transforming XML documents from one representation to another.
XML makes it possible for applications to exchange data in a standard format that is independent of storage. For example, one application may use a native XML storage format, whereas another may store data in a relational database. Since XML is emerging as a standard for data exchange, it is natural that queries among applications should be expressed as queries against data in XML format. This use gives rise to a requirement for a query language designed expressly for XML data sources. In October 1999, W3C convened the XML Query Working Group [6] for the purpose of designing such a query language, to be called XQuery.
XML data are different from relational data in several important respects that influence the design of a query language. Relational data tend to have a regular structure, which allows the descriptive meta-data for these data to be stored in a separate catalog. XML data, in contrast, are often quite heterogeneous, and distribute their meta-data throughout the document. XML documents often contain many levels of nested elements, whereas relational data are “flat.” XML documents have an intrinsic order, whereas relational data are unordered except where an ordering can be derived from data values. Relational data are usually “dense” (nearly every column has a value), and relational systems often represent missing information by a special null value. XML data, in contrast, are often “sparse” and can represent missing information simply by the absence of an element. For these and other reasons, existing relational query languages are not directly suitable for querying XML data.
©Copyright 2002 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.
IBM SYSTEMS JOURNAL, VOL 41, NO 4, 2002 0018-8670/02/$5.00 © 2002 IBM CHAMBERLIN
The design of XQuery is still in progress. The XML Query Working Group has published working drafts of several documents that describe the current state of the design. Of these, perhaps the most important is XQuery 1.0: An XML Query Language [7], which contains a syntax and informal description of the language. The working group has also published a list of requirements [8], a description of the data model that underlies the language [9], a formal semantic description [10], a list of functions and operators [11], and a collection of use cases that illustrate applications of the language [12]. Each of these documents is updated from time to time as the design of XQuery evolves. This paper is based on the most recent XQuery design at the time of its publication, but since this design is still changing, the documents referenced in this paragraph should be consulted for the latest developments.
The design of XQuery has been subject to a number of influences. Perhaps the most important of these is compatibility with existing W3C standards, including Schema, XSLT, XPath, and XML itself. XPath, in particular, is so important and so closely related that XQuery is defined as a superset of XPath. The overall design of XQuery is based on a language proposal called Quilt [13]. Quilt, in turn, was influenced by the functional approach of Object Query Language (OQL) [14], by the keyword-based syntax of Structured Query Language (SQL) [15], and by previous XML query language proposals including XQL [16], XML-QL [17], and Lorel [18].
It is an objective of the XML Query Working Group to define two syntaxes for XQuery: one that is expressed in XML, and one that is optimized for human writing and understanding. This paper describes only the human-oriented version of XQuery.
The initial design of XQuery is focused only on information retrieval and does not provide facilities for updating existing XML documents. The XML Query Working Group may consider the addition of an update facility after completing the design of the first version of XQuery.
This paper describes the data model on which XQuery is based, and then presents an overview of the XQuery language in the form of a series of examples. This paper is not intended to provide a rigorous or exhaustive definition of the language. The reader is referred to Reference 7 for an XQuery syntax and a more complete language description.
Data model
Formally, the input and output of XQuery are defined in terms of a data model, described in Reference 9. The query data model provides an abstract representation of one or more XML documents or document fragments. The data model is based on the notion of a sequence. A sequence is an ordered collection of zero or more items. An item may be a node or an atomic value. An atomic value is an instance of one of the built-in data types defined by XML Schema, such as strings, integers, decimals, and dates. A node conforms to one of seven node kinds, which include element, attribute, text, document, comment, processing instruction, and namespace nodes. A node may have other nodes as children, thus forming one or more node hierarchies. Some kinds of nodes, such as element and attribute nodes, have names or typed values, or both. A typed value is a sequence of zero or more atomic values. Nodes have identity (that is, two nodes may be distinguishable even though their names and values are the same), but atomic values do not have identity. Among all the nodes in a hierarchy there is a total ordering called document order, in which each node appears before its children. Document order corresponds to the order in which the nodes would appear if the node hierarchy were represented in XML format. Document order between nodes in different hierarchies is implementation-defined but must be consistent; that is, all the nodes in one hierarchy must be ordered either before or after all the nodes in another hierarchy.
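Document order and node identity can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree module, which is not part of XQuery and only approximates the query data model: it exposes element nodes as objects, while attributes and text are properties of an element rather than separate nodes. The document content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names and values are illustrative only.
doc = ET.fromstring(
    "<items>"
    "<item><itemno>1</itemno><seller>A</seller></item>"
    "<item><itemno>1</itemno><seller>A</seller></item>"
    "</items>"
)

# Element.iter() walks the tree depth-first, so each element is
# yielded before its children, matching document order for elements.
document_order = [elem.tag for elem in doc.iter()]
print(document_order)

# Node identity: the two item elements have equal names and content,
# yet they remain two distinct nodes.
first, second = doc.findall("item")
same_content = ET.tostring(first) == ET.tostring(second)
same_node = first is second
print(same_content, same_node)
```

The two item elements compare equal when serialized but are different objects, which mirrors the distinction between a node's value and its identity.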
Sequences may be heterogeneous; that is, they may contain mixtures of various types of nodes and atomic values. However, a sequence never appears as an item in another sequence. All operations that create sequences are defined to “flatten” their operands so that the result of the operation is a single-level sequence. There is no distinction between an item and a sequence of length one; in other words, a node or atomic value is considered to be identical to a sequence of length one containing that node or atomic value.
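The flattening rule can be sketched in a few lines of Python. This is only a model under stated assumptions: sequences are represented as Python lists, any non-list value stands for an item, and make_sequence is a hypothetical helper name, not an XQuery function.

```python
from typing import Any, List


def make_sequence(*operands: Any) -> List[Any]:
    """Concatenate operands into one flat, single-level sequence."""
    result: List[Any] = []
    for operand in operands:
        if isinstance(operand, list):
            # A nested sequence contributes its items, not itself.
            result.extend(make_sequence(*operand))
        else:
            # A single item behaves like a sequence of length one.
            result.append(operand)
    return result


nested = make_sequence(1, [2, 3], [], [[4], "five"])
singleton = make_sequence(47)
empty = make_sequence([], [])
print(nested, singleton, empty)
```

In this model, make_sequence(47) and make_sequence([47]) produce the same result, reflecting the rule that an item and a sequence of length one are not distinguished.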
Sequences of length zero are valid and are sometimes used to represent missing or unknown information, in much the same way that null values are used in relational systems.
In addition to sequences, the query data model defines a special value called the error value, which is the result of evaluating an expression that contains an error. An error value may not be combined in a sequence with any other value.
Input XML documents can be transformed into the query data model by a process called schema validation, which parses the document, validates it against a particular schema, and represents it as a hierarchy of nodes and atomic values, labeled with type information derived from the schema. If an input document does not have a schema, it is validated against a permissive default schema that assigns generic types: nodes are labeled anyType and atomic values are labeled anySimpleType. The process of schema validation is described in more detail in Reference 3.
The result of a query may be transformed from the query data model into an XML representation by a process called serialization. The details of serialization are beyond the scope of this paper. It is worth noting that the result of a query is not always a well-formed XML document. For example, a query might return an atomic value such as the number 47, or a sequence of elements with no common parent.
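The last point can be checked with a short sketch in Python's standard xml.etree.ElementTree module. The serialization here is a naive concatenation chosen for illustration, not the serialization process defined by the working group, and the element names and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical query result: two elements with no common parent.
result = [
    ET.fromstring("<seller>A</seller>"),
    ET.fromstring("<seller>B</seller>"),
]

# Serialize each item and concatenate the pieces.
serialized = "".join(ET.tostring(item, encoding="unicode") for item in result)
print(serialized)

# The concatenation has two top-level elements, so an XML parser
# rejects it as a document.
try:
    ET.fromstring(serialized)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)
```

Wrapping such a result in a single enclosing element is one way to obtain a well-formed document from it.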
Example data
To illustrate the query data model and provide a basis for later examples, we consider a small XML database that contains data from an on-line auction, based loosely on Use Case R in Reference 12. The database consists of two XML documents named items.xml and bids.xml.
Figure 1 Data model representation of items.xml (node labels: items.xml, items, item, status, itemno, seller, description, reserve-price, end-date; all item elements have similar structure)
The items.xml document contains a root element named items, which in turn contains an item element for each item currently for sale at the auction. Each item element has a status attribute and subelements named itemno, seller, description, reserve-price, and end-date. The reserve-price element names a minimum selling price set by the owner, and the end-date element indicates the ending date of the auction.
The bids.xml document contains a root element named bids, which in turn contains a bid element for each bid that has been placed for an item. Each bid element has subelements named itemno, bidder, bid-amount, and bid-date.
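As a concrete reference point, the following sketch builds one item and one bid with the structure just described and checks that structure with Python's standard xml.etree.ElementTree module. Only the element and attribute names come from the description; every value, including the status value "active", is invented for illustration.

```python
import xml.etree.ElementTree as ET

# All values below are hypothetical; only the element and attribute
# names follow the description of items.xml and bids.xml.
ITEMS_XML = """
<items>
  <item status="active">
    <itemno>1001</itemno>
    <seller>Hypothetical Seller</seller>
    <description>Hypothetical item</description>
    <reserve-price>25</reserve-price>
    <end-date>2002-01-31</end-date>
  </item>
</items>
"""

BIDS_XML = """
<bids>
  <bid>
    <itemno>1001</itemno>
    <bidder>Hypothetical Bidder</bidder>
    <bid-amount>30</bid-amount>
    <bid-date>2002-01-15</bid-date>
  </bid>
</bids>
"""

items = ET.fromstring(ITEMS_XML.strip())
bids = ET.fromstring(BIDS_XML.strip())

# List the subelements of the representative item and bid.
item = items.find("item")
item_children = [child.tag for child in item]
bid_children = [child.tag for child in bids.find("bid")]
print(item.get("status"), item_children, bid_children)
```

The shared itemno subelement is what relates a bid to the item it was placed for.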
Figures 1 and 2 show the data model representations of the items.xml and bids.xml documents, respectively (including only a representative item and a representative bid). In the figures, the circles labeled D, E, A, and T represent document, element, attribute, and text nodes, respectively.
Expressions
We now describe expressions in XQuery.
Basics. Like XML and XPath, XQuery is a case-sensitive language, and all its keywords are made up of lowercase characters. Detailed rules for lexing and parsing XQuery are described in Reference 7. Characters enclosed between “{--” and “--}” are considered to be comments and are ignored during query processing (except, of course, inside a quoted string, where they are considered to be part of the string).
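The comment rule can be sketched as a small scanner in Python. This is a simplified illustration, not the lexical rules of Reference 7: it assumes that comments do not nest, that strings are delimited by single or double quotes with no escape mechanism, and that an unterminated comment runs to the end of the query. The function name strip_comments is hypothetical.

```python
def strip_comments(query: str) -> str:
    """Remove {-- ... --} comments, leaving quoted strings untouched."""
    out = []
    i = 0
    quote = None  # quote character of the string we are inside, if any
    while i < len(query):
        ch = query[i]
        if quote is not None:
            # Inside a string: copy verbatim until the closing quote.
            out.append(ch)
            if ch == quote:
                quote = None
            i += 1
        elif ch in ('"', "'"):
            # Entering a quoted string.
            quote = ch
            out.append(ch)
            i += 1
        elif query.startswith("{--", i):
            # Skip to just past the closing delimiter.
            end = query.find("--}", i + 3)
            i = len(query) if end == -1 else end + 3
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(strip_comments("a {-- note --} b"))
print(strip_comments('"{-- kept --}" {-- gone --}'))
```

The second call shows the exception in the rule: the delimiters inside the quoted string are preserved as part of the string.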
XQuery is a functional language, which means that
it is made up of expressions that return values and
do not have side effects. XQuery has several kinds
of expressions, most of which are composed from
lower-level expressions, combined by operators or
keywords. XQuery expressions are fully composable,
that is, where an expression is expected, any kind of
expression may be used. As noted earlier, the value
Figure 2 Data model representation of bids.xml (node labels: bids.xml, bids, bid, itemno, bidder, bid-amount, bid-date; all bid elements have similar structure)