Getting Started with Pyparsing
by Paul McGuire
Copyright © 2008 O'Reilly Media, Inc.
ISBN: 9780596514235
Released: October 4, 2007
Need to extract data from a text file or a web page? Or do you want to make your
application more flexible with user-defined commands or search strings? Do
regular expressions and lex/yacc make your eyes blur and your brain hurt?
Pyparsing could be the solution. Pyparsing is a pure-Python class library that
makes it easy to build recursive-descent parsers quickly. There is no need to
handcraft your own parsing state machine. With pyparsing, you can quickly
create HTML page scrapers, logfile data extractors, or complex data structure or
command processors. This Short Cut shows you how!
Contents
What Is Pyparsing?
Basic Form of a Pyparsing Program
"Hello, World!" on Steroids!
What Makes Pyparsing So Special?
Parsing Data from a Table: Using Parse Actions and ParseResults
Extracting Data from a Web Page
A Simple S-Expression Parser
A Complete S-Expression Parser
Parsing a Search String
Search Engine in 100 Lines of Code
Conclusion
Index
"I need to analyze this logfile..."
"Just extract the data from this web page..."
"We need a simple input command processor..."
"Our source code needs to be migrated to the new API..."
Each of these everyday requests generates the same reflex response in any developer
faced with them: "Oh, *&#$*!, not another parser!"
The task of parsing data from loosely formatted text crops up in different forms
on a regular basis for most developers. Sometimes it is for one-off development
utilities, such as the API-upgrade example, that are purely for internal use. Other
times, the parsing task is a user-interface function to be built in to a
command-driven application.
If you are working in Python, you can tackle many of these jobs using Python's
built-in string methods, most notably split(), index(), and startswith().
What makes parser writing unpleasant are those jobs that go beyond simple string
splitting and indexing, with some context-varying format, or a structure defined
by a language syntax rather than a simple character pattern. For instance,
y = 2 * x + 10
is easy to parse when split on separating spaces. Unfortunately, few users are so
careful in their spacing, and an arithmetic expression parser is just as likely to
encounter any of these:
y = 2*x + 10
y = 2*x+10
y=2*x+10
Splitting this last expression on whitespace just returns a one-element list
holding the original string, with no further insight into the separate elements
y, =, 2, etc.
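The effect is easy to see by running str.split over the four spellings above. This is only a small illustrative sketch; the names expressions and split_results are made up for the example.

```python
# Whitespace splitting only recovers the tokens when every token is spaced out.
expressions = [
    "y = 2 * x + 10",
    "y = 2*x + 10",
    "y = 2*x+10",
    "y=2*x+10",
]

# One token list per expression, in the same order as the inputs.
split_results = [expr.split() for expr in expressions]

for expr, tokens in zip(expressions, split_results):
    print(repr(expr), "->", tokens)
```

Only the first spelling yields all seven tokens; the last comes back as a single unsplit string.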
The traditional tools for developing parsing utilities that are beyond processing
with just str.split are regular expressions and lex/yacc. Regular expressions use
a text string to describe a text pattern to be matched. The text string uses special
characters (such as |, +, ., *, and ?) to denote various parsing concepts such as
alternation, repetition, and wildcards. Lex and yacc are utilities that lexically detect
token boundaries, and then apply processing code to the extracted tokens. Lex
and yacc use a separate token-definition file, and then generate lexing and
token-processing code templates for the programmer to extend with application-specific
behavior.
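As a rough illustration of the regular expression approach, the sketch below tokenizes the arithmetic example with Python's re module. The pattern and the names TOKEN_RE and tokenize are assumptions made for this example and are not taken from the book; the pattern covers only integers, simple names, and a handful of operators.

```python
import re

# Alternation (|) picks between an integer, an identifier, or a single operator.
# + and * inside the pattern mean "one or more" and "zero or more".
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|[=+\-*/]")


def tokenize(expr):
    """Return the tokens found in expr, ignoring any whitespace between them."""
    return TOKEN_RE.findall(expr)


print(tokenize("y = 2 * x + 10"))
print(tokenize("y=2*x+10"))
```

Both spellings produce the same token list, but the terse pattern notation is exactly the kind of specialized syntax the text goes on to criticize.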
Historical note
These technologies were originally developed as text-processing and compiler
generation utilities in C in the 1970s, and they continue to be in wide use
today. The Python distribution includes regular expression support with the
re module, part of its "batteries included" standard library. You can download
a number of freely available parsing modules that perform lex/yacc-style
parsing ported to Python.
The problem in using these traditional tools is that each introduces its own
specialized notation, which must then be mapped to a Python design and Python code.
In the case of lex/yacc-style tools, a separate code-generation step is usually
required.
In practice, parser writing often takes the form of a seemingly endless cycle: write
code, run parser on sample text, encounter unexpected text input case, modify
code, rerun modified parser, find additional "special" case, etc. Combined with
the notation issues of regular expressions, or the extra code-generation steps of
lex/yacc, this cyclic process can spiral into frustration.
What Is Pyparsing?
Pyparsing is a pure Python module that you can add to your Python application
with little difficulty. Pyparsing's class library provides a set of classes for building
up a parser from individual expression elements, up to complex, variable-syntax
expressions. Expressions are combined using intuitive operators, such as + for
sequentially adding one expression after another, and | and ^ for defining parsing
alternatives (meaning "match first alternative" and "match longest alternative",
respectively).
Replication of expressions is added using classes such as OneOrMore, ZeroOrMore,
and Optional.
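The operators and repetition classes described above can be tried out in a few lines. This is a minimal sketch, assuming an installed pyparsing release that still accepts the camelCase names used in this book (parseString, asList); the expression names and sample strings are made up for illustration.

```python
from pyparsing import Literal, OneOrMore, Optional, Word, alphas, nums

integer = Word(nums)
identifier = Word(alphas)

# + : match one expression, then the next (whitespace between them is skipped)
assignment = identifier + "=" + integer

# | : alternatives are tried left to right, and the first match wins
first_match = Literal("in") | Literal("int")

# ^ : all alternatives are tried, and the longest match wins
longest_match = Literal("in") ^ Literal("int")

# OneOrMore and Optional add repetition and optional parts
signed_numbers = OneOrMore(Optional("-") + integer)

assignment_tokens = assignment.parseString("x = 42").asList()
first_tokens = first_match.parseString("integer").asList()
longest_tokens = longest_match.parseString("integer").asList()
number_tokens = signed_numbers.parseString("1 -2 3").asList()

print(assignment_tokens, first_tokens, longest_tokens, number_tokens)
```

The two alternation operators give different results on the same input, which is the practical difference between "match first" and "match longest".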
For example, a regular expression that would parse an IP address followed by a
U.S.-style phone number might look like the following:
(\d{1,3}(?:\.\d{1,3}){3})\s+(\(\d{3}\)\d{3}-\d{4})
In contrast, the same expression using pyparsing might be written as follows:
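from pyparsing import Word, Combine, nums

# one to three digits for each part of the IP address
ipField = Word(nums, max=3)
# Combine joins the matched pieces into a single token
ipAddr = Combine( ipField + "." + ipField + "." + ipField + "." + ipField )
phoneNum = Combine( "(" + Word(nums, exact=3) + ")" +
                    Word(nums, exact=3) + "-" + Word(nums, exact=4) )
userdata = ipAddr + phoneNum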
Although it is more verbose, the pyparsing version is clearly more readable; it
would be much easier to go back and update this version to handle international
phone numbers, for example.
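To compare the two versions side by side, the sketch below runs both against one sample line. The sample string and the names SAMPLE, regex_fields, and pyparsing_fields are hypothetical, and the code assumes a pyparsing release that still accepts the camelCase parseString and asList names used in this book.

```python
import re

from pyparsing import Combine, Word, nums

SAMPLE = "192.168.0.1 (800)555-1212"  # hypothetical input line

# Regular expression version: two capture groups, IP address then phone number.
pattern = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})\s+(\(\d{3}\)\d{3}-\d{4})")
regex_fields = list(pattern.match(SAMPLE).groups())

# Pyparsing version: the same grammar built from named pieces.
ipField = Word(nums, max=3)
ipAddr = Combine(ipField + "." + ipField + "." + ipField + "." + ipField)
phoneNum = Combine("(" + Word(nums, exact=3) + ")"
                   + Word(nums, exact=3) + "-" + Word(nums, exact=4))
userdata = ipAddr + phoneNum
pyparsing_fields = userdata.parseString(SAMPLE).asList()

print(regex_fields)
print(pyparsing_fields)
```

Both approaches return the same two fields for this input; the difference lies in how easily each definition can be read and extended.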
New to Python?
I have gotten many emails from people who were writing a pyparsing application
for their first Python program. They found pyparsing to be easy to pick
up, usually by adapting one of the example scripts that is included with
pyparsing. If you are just getting started with Python, you may feel a bit lost going
through some of the examples. Pyparsing does not require much advanced
Python knowledge, so it is easy to come up to speed quickly. There are a
number of online tutorial resources, starting with the Python web site,
http://www.python.org.
To make the best use of pyparsing, you should become familiar with the basic
Python language features of indented syntax, data types, and for item in
itemSequence: control structures.
Pyparsing makes use of object.attribute notation, as well as Python's
built-in container classes, tuple, list, and dict.
The examples in this book use Python lambdas, which are essentially one-line
functions; lambdas are especially useful when defining simple parse actions.
The list comprehension and generator expression forms of iteration are useful
when extracting tokens from parsed results, but not required.
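The Python features mentioned above fit together roughly as follows. This is an illustrative sketch, not an example from the book: a lambda serves as a parse action that converts each matched string to an int, and a list comprehension walks the parsed tokens. It assumes a pyparsing release that still accepts the camelCase names setParseAction and parseString.

```python
from pyparsing import OneOrMore, Word, nums

# A lambda used as a parse action: convert each matched digit string to an int.
integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))
numbers = OneOrMore(integer)

results = numbers.parseString("10 20 30")

# A list comprehension over the parsed tokens.
doubled = [n * 2 for n in results]
print(doubled)
```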
Pyparsing is:
• 100 percent pure Python: no compiled dynamic link libraries (DLLs) or shared
libraries are used in pyparsing, so you can use it on any platform that is Python
2.3-compatible.
• Driven by parsing expressions that are coded inline, using standard Python
class notation and constructs: no separate code generation process and no
specialized character notation make your application easier to develop,
understand, and maintain.
• Enhanced with helpers for common parsing patterns:
  • C, C++, Java, Python, and HTML comments
  • quoted strings (using single or double quotes, with \' or \" escapes)
  • HTML and XML tags (including upper-/lowercase and tag attribute handling)
  • comma-separated values and delimited lists of arbitrary expressions
• Small in footprint: pyparsing's code is contained within a single Python source
file, easily dropped into a site-packages directory, or included with your own
application.
• Liberally licensed: the MIT license permits any use, commercial or
non-commercial.
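A few of the helpers from the list above can be exercised directly. This is a minimal sketch with made-up sample strings, assuming a pyparsing release that still exposes the camelCase helper names quotedString, delimitedList, and cStyleComment.

```python
from pyparsing import Word, alphas, cStyleComment, delimitedList, quotedString

# Quoted string helper: the surrounding quotes are kept in the matched token.
quoted = quotedString.parseString('"hello, world"').asList()

# Delimited list helper: commas are the default delimiter and are dropped.
names = delimitedList(Word(alphas)).parseString("alpha, beta, gamma").asList()

# C-style comment helper: matches a /* ... */ block at the start of the text.
comment = cStyleComment.parseString("/* note */ x = 1").asList()

print(quoted, names, comment)
```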
Basic Form of a Pyparsing Program
The prototypical pyparsing program has the following structure:
• Import names from pyparsing module
• Define grammar using pyparsing classes and helper methods
• Use the grammar to parse the input text
• Process the results from parsing the input text
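The four steps listed above might look like this in their smallest form. It is a sketch under stated assumptions: the greeting grammar and sample text are chosen purely for illustration, and the camelCase parseString and asList names assume a pyparsing release that still accepts them.

```python
# 1. Import names from the pyparsing module
from pyparsing import Word, alphas

# 2. Define the grammar: a word, a comma, another word, an exclamation mark
greeting = Word(alphas) + "," + Word(alphas) + "!"

# 3. Use the grammar to parse the input text
tokens = greeting.parseString("Hello, World!")

# 4. Process the results from parsing the input text
token_list = tokens.asList()
print(token_list)
```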
Import Names from Pyparsing
In general, using the form from pyparsing import * is discouraged among Python
style experts. It pollutes the local variable namespace with an unknown number
of new names from the imported module. However, during pyparsing grammar
development, it is hard to anticipate all of the parser element types and other
pyparsing-defined names that will be needed, and this form simplifies early grammar
development. After the grammar is mostly finished, you can go back to this
statement and replace the * with the list of pyparsing names that you actually used.
Define the Grammar
The grammar is your definition of the text pattern that you want to extract from
the input text. With pyparsing, the grammar takes the form of one or more Python
statements that define text patterns, and combinations of patterns, using pyparsing
classes and helpers to specify these individual pieces. Pyparsing allows you to use
operators such as +, |, and ^ to simplify this code. For instance, if I use the pyparsing
Word class to define a typical programming variable name consisting of a leading