Python/XML HOWTO
_________________________________________________________________
A.M. Kuchling
akuchlin@mems-exchange.org
Abstract:
XML is the eXtensible Markup Language, a subset of SGML intended to
allow the creation and processing of application-specific markup
languages. Python makes an excellent language for processing XML data.
This document is a tutorial for the Python/XML package. It assumes
you're already somewhat familiar with the structure and terminology of
XML, though a brief introduction is supplied.
Contents
* 1 Introduction to XML
+ 1.1 Elements, Attributes and Entities
+ 1.2 Well-Formed XML
+ 1.3 DTDs
* 2 XML-Related Standards
* 3 Installing the XML Toolkit
* 4 Package Overview
* 5 SAX: The Simple API for XML
+ 5.1 Starting Out
+ 5.2 Error Handling
+ 5.3 Searching Element Content
+ 5.4 Enabling Namespace Processing
* 6 DOM: The Document Object Model
+ 6.1 Getting A DOM Tree
+ 6.2 Printing The Tree
+ 6.3 Manipulating the Tree
+ 6.4 Creating New Nodes
+ 6.5 Walking Over The Entire Tree
* 7 XPath and XPointer
* 8 Marshalling Into XML
* 9 Acknowledgements
* About this document ...
1 Introduction to XML
XML, the eXtensible Markup Language, is a simplified dialect of SGML,
the Standard Generalized Markup Language. XML is intended to be
reasonably simple to implement and use, and is already being used for
specifying markup languages for various new standards: MathML for
expressing mathematical equations, Synchronized Multimedia Integration
Language for multimedia presentations, and so forth.
SGML and XML represent a document by tagging the document's various
components with their function or meaning. For example, a book
contains several parts: it has a title, one or more authors, the text
of the book, perhaps a preface or an index, and so forth. A markup
language for writing books would therefore have elements indicating
what the contents of the preface are, what the title is, and so forth.
This logical structure should not be confused with the physical
details of how the document is actually printed on paper. The index
might be printed with narrow margins in a smaller font than the rest
of the book, but markup usually isn't (or shouldn't be, anyway)
concerned with details such as this. Instead, other software will
translate from the markup language to a typeset format, handling the
presentation details.
This section provides a brief overview of XML and a few related
standards, but it is far from complete; a complete treatment would
require a full-length book rather than a short HOWTO. For a completely
accurate (if rather dry) description, read the original W3C
Recommendations; you can find links to them below. If you already know
what XML is, you can skip the rest of this section.
Later sections of this HOWTO assume that you're familiar with XML
terminology; most of them use XML terms such as element and attribute
freely. The section on SAX does not require that you have experience
with any of the various Java SAX implementations.
See Also:
Extensible Markup Language (XML) 1.0 (Second Edition)
For the full details of XML's syntax, the definitive source is
the XML 1.0 specification. However, like all specifications
it's quite formal and isn't intended to be a friendly
introduction or a tutorial. An annotated version of the
standard is also available, and there are many more informal
tutorials and books that introduce XML at greater (or lesser)
length.
The Annotated XML Specification
This annotated version of the XML specification, produced by
Tim Bray, is quite helpful in clarifying the specification's
intent. It is presented as a richly-hyperlinked document that
makes navigation easy, and evokes a sense of what hypertext was
meant to be.
The XML Cover Pages
An extensive collection of links to XML and SGML resources,
including a news page that's updated every few days. If you can
only remember one XML-related URL, remember this one. Cafe con
Leche is another good resource.
xml-dev mailing list
This is a high-traffic list for implementation and development
of XML standards. Be warned: Some people might find the
discussion too focused on vague theorizing about information
representation, and not on inventing new standards and tools or
applying existing standards.
1.1 Elements, Attributes and Entities
A markup language specified using XML looks a lot like HTML; a
document consists of a single element, which contains sub-elements,
which can have further sub-elements inside them. Elements are
indicated by tags in the text. Tags are always inside angle brackets
< >. Elements can either contain content, or they can be empty.
An element can contain content between opening and closing tags, as in
<name>Euryale</name>, which is a name element containing the data
"Euryale". This content may be text data, other XML elements, or a
mixture of both.
Elements can also be empty, containing nothing, and are represented as
a single tag ended with a slash. For example, <stop/> is an empty stop
element. Unlike HTML, XML element names are case-sensitive; stop and
Stop are two different elements.
Opening and empty tags can also contain attributes, which specify
values associated with an element. For example, in the XML text <name
lang='greek'>Herakles</name>, the name element has a lang attribute
which has a value of "greek". In <name lang='latin'>Hercules</name>,
the attribute's value is "latin".
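The element and attribute rules above can be checked with a few lines of Python. The following is only a minimal sketch: it uses the standard library's xml.dom.minidom module rather than any particular interface of the Python/XML package, and the heroes wrapper element and the describe() helper are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document; <heroes> is an invented wrapper element.
SAMPLE = """<heroes>
  <name lang='greek'>Herakles</name>
  <name lang='latin'>Hercules</name>
  <stop/>
  <Stop/>
</heroes>"""


def describe(xml_text):
    """Return (tag name, attribute dict, text content) for each child
    element of the document's root element."""
    doc = minidom.parseString(xml_text)
    rows = []
    for node in doc.documentElement.childNodes:
        if node.nodeType != node.ELEMENT_NODE:
            continue  # skip the whitespace text nodes between elements
        attrs = dict(node.attributes.items())
        text = "".join(child.data for child in node.childNodes
                       if child.nodeType == child.TEXT_NODE)
        rows.append((node.tagName, attrs, text))
    return rows


rows = describe(SAMPLE)
for row in rows:
    print(row)
```

The two empty elements come back as separate entries named stop and Stop, which shows that element names are compared case-sensitively.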
XML also includes entities as a shorthand for including a particular
character or a longer string. Entity references always begin with a
"&" and end with a ";". For example, a particular Unicode character
can be written as &#4660; using its character code in decimal, or as
&#x1234; using hexadecimal. It's also possible to define your own
entities, making &title; expand to "The Odyssey", for example. If
you want to include the "&" character in XML content, it must be
written as &amp;.
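To see how a parser treats these references, here is a small sketch, again assuming the standard library's xml.dom.minidom and xml.sax.saxutils modules rather than anything specific to the Python/XML package. The book document, its child element names and the text_of() helper are hypothetical. The title entity is declared in the document's internal DTD subset so that the parser knows how to expand it.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

# Hypothetical document: character references in decimal and hexadecimal,
# a user-defined entity, and an escaped ampersand.
DOC = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY title "The Odyssey">
]>
<book>
  <decimal>&#4660;</decimal>
  <hex>&#x1234;</hex>
  <named>&title;</named>
  <amp>Scylla &amp; Charybdis</amp>
</book>"""


def text_of(doc, tag):
    """Return the text content of the first element named `tag`."""
    element = doc.getElementsByTagName(tag)[0]
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)


doc = minidom.parseString(DOC)
same_character = text_of(doc, "decimal") == text_of(doc, "hex")
title = text_of(doc, "named")
ampersand_text = text_of(doc, "amp")

# Going the other way: escape() rewrites "&" (and "<", ">") for output.
escaped = escape("Scylla & Charybdis")

print(same_character, title, ampersand_text, escaped)
```

Both numeric forms yield the same single character, and by the time the application sees the text the references have already been replaced by the characters they stand for.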
1.2 Well-Formed XML
A legal XML document must, as a minimum, be well-formed: each opening
tag must have a corresponding closing tag, and tags must nest
properly. For example, <b><i>text</b></i> is not well-formed because
the i element should be enclosed inside the b element, but instead the
closing </b> tag is encountered first. This example can be made
well-formed by swapping the order of the closing tags, resulting in
<b><i>text</i></b>.
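A quick way to test whether a piece of text is well-formed is to hand it to a parser and see whether it complains. The sketch below assumes the standard library's xml.dom.minidom module, which reports such problems by raising xml.parsers.expat.ExpatError; the is_well_formed() helper is an illustrative name, not part of any package.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text):
    """Return True if `xml_text` parses as well-formed XML, else False."""
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        # Raised for mismatched or missing closing tags, among other errors.
        return False
    return True


badly_nested = is_well_formed("<b><i>text</b></i>")
properly_nested = is_well_formed("<b><i>text</i></b>")
unclosed = is_well_formed("<b><i>text</i>")

print(badly_nested, properly_nested, unclosed)
```

Only the properly nested version is accepted; the badly nested tags and the missing closing tag are both rejected outright rather than silently repaired.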
If you've ever written HTML by hand, you may have acquired the habit
of being a bit sloppy about this. Strictly speaking, HTML has exactly
the same rules about nesting tags as XML, but most Web browsers are
very forgiving of errors in HTML. This is convenient for HTML authors,
but it makes it difficult to write programs to parse HTML input
because the programs have to cope with all sorts of malformed input.
The authors of the XML specification didn't want XML to fall into the
same trap, because it would make XML processing software much harder
to write. Therefore, all XML parsers have to be