.. -*- mode: rst -*-
====================
Write your own lexer
====================

If a lexer for your favorite language is missing in the Pygments package, you can
easily write your own and extend Pygments.

All you need can be found inside the `pygments.lexer` module. As you can read in
the `API documentation <api.txt>`_, a lexer is a class that is initialized with
some keyword arguments (the lexer options) and that provides a
`get_tokens_unprocessed()` method which is given a string or unicode object with
the data to parse.

The `get_tokens_unprocessed()` method must return an iterator or iterable
containing tuples in the form ``(index, token, value)``. Normally you don't need
to do this since there are numerous base lexers you can subclass.
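
To make the contract concrete, here is a minimal sketch of a lexer that
implements `get_tokens_unprocessed()` directly; the class name and behaviour
are invented for illustration and not part of Pygments:

.. sourcecode:: python

    from pygments.lexer import Lexer
    from pygments.token import Text

    class EchoLexer(Lexer):
        """Hypothetical lexer that emits its whole input as one Text token."""
        name = 'Echo'

        def get_tokens_unprocessed(self, text):
            # Each tuple is (index, tokentype, value); index is the
            # position of the value within the input string.
            yield 0, Text, text
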

RegexLexer
==========

A very powerful (but quite easy to use) lexer is the `RegexLexer`. This lexer
base class allows you to define lexing rules in terms of *regular expressions*
for different *states*.

States are groups of regular expressions that are matched against the input
string at the *current position*. If one of these expressions matches, a
corresponding action is performed (normally yielding a token with a specific
type), the current position is set to where the last match ended and the
matching process continues with the first regex of the current state.

Lexer states are kept in a state stack: each time a new state is entered, the
new state is pushed onto the stack. The most basic lexers (like the
`DiffLexer`) just need one state.

Each state is defined as a list of tuples in the form (`regex`, `action`,
`new_state`) where the last item is optional. In the most basic form, `action`
is a token type (like `Name.Builtin`). That means: when `regex` matches, emit a
token with the matched text and the type given by `action`, and push
`new_state` onto the state stack. If the new state is ``'#pop'``, the topmost
state is popped from the stack instead. (To pop more than one state, use
``'#pop:2'`` and so on.) ``'#push'`` is a synonym for pushing the current state
onto the stack.

The following example shows the `DiffLexer` from the builtin lexers. Note that
it contains some additional attributes `name`, `aliases` and `filenames` which
aren't required for a lexer. They are used by the builtin lexer lookup
functions.

.. sourcecode:: python

    from pygments.lexer import RegexLexer
    from pygments.token import *

    class DiffLexer(RegexLexer):
        name = 'Diff'
        aliases = ['diff']
        filenames = ['*.diff']

        tokens = {
            'root': [
                (r' .*\n', Text),
                (r'\+.*\n', Generic.Inserted),
                (r'-.*\n', Generic.Deleted),
                (r'@.*\n', Generic.Subheading),
                (r'Index.*\n', Generic.Heading),
                (r'=.*\n', Generic.Heading),
                (r'.*\n', Text),
            ]
        }

As you can see, this lexer only uses one state. When the lexer starts scanning
the text, it first checks if the current character is a space. If it is, it
scans everything up to and including the newline and returns it as a `Text`
token. If this rule doesn't match, it checks if the current character is a plus
sign, and so on.

If no rule matches at the current position, the current character is emitted as
an `Error` token that indicates a parsing error, and the position is increased
by one.
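
You can observe this with the `DiffLexer` defined above: a final line without a
trailing newline matches none of the ``.*\n`` rules, so each of its characters
is emitted as an `Error` token. (This quick check is not part of the original
example.)

.. sourcecode:: python

    # Assumes the DiffLexer class from the example above.
    tokens = list(DiffLexer().get_tokens_unprocessed('+ok\nbad'))
    # [(0, Token.Generic.Inserted, '+ok\n'),
    #  (4, Token.Error, 'b'),
    #  (5, Token.Error, 'a'),
    #  (6, Token.Error, 'd')]
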

Regex Flags
===========

You can either define regex flags in the regex (``r'(?x)foo bar'``) or by adding
a `flags` attribute to your lexer class. If no attribute is defined, it defaults
to `re.MULTILINE`. For more information about regular expression flags, see the
`regular expressions`_ help page in the Python documentation.

.. _regular expressions: http://docs.python.org/lib/re-syntax.html
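
For instance, a lexer for a language with case-insensitive keywords could set
the `flags` attribute as in the sketch below; the language and class name are
invented for illustration:

.. sourcecode:: python

    import re

    from pygments.lexer import RegexLexer
    from pygments.token import Comment, Text

    class RemCommentLexer(RegexLexer):
        # Hypothetical example: BASIC-style 'REM' comment lines in any case.
        name = 'RemComment'
        flags = re.MULTILINE | re.IGNORECASE

        tokens = {
            'root': [
                (r'^rem.*$', Comment),  # also matches 'REM'/'Rem' via IGNORECASE
                (r'.*\n', Text),
            ]
        }
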

Scanning multiple tokens at once
================================

Here is a more complex lexer that highlights INI files. INI files consist of
sections, comments and key = value pairs:

.. sourcecode:: python

    from pygments.lexer import RegexLexer, bygroups
    from pygments.token import *

    class IniLexer(RegexLexer):
        name = 'INI'
        aliases = ['ini', 'cfg']
        filenames = ['*.ini', '*.cfg']

        tokens = {
            'root': [
                (r'\s+', Text),
                (r';.*?$', Comment),
                (r'\[.*?\]$', Keyword),
                (r'(.*?)(\s*)(=)(\s*)(.*?)$',
                 bygroups(Name.Attribute, Text, Operator, Text, String))
            ]
        }

The lexer first looks for whitespace, comments and section names. Later, it
looks for a line that looks like a key/value pair, separated by an ``'='``
sign, with optional whitespace around it.

The `bygroups` helper makes sure that each group is yielded with a different
token type: first a `Name.Attribute` token, then a `Text` token for the
optional whitespace, after that an `Operator` token for the equals sign, then
a `Text` token for the whitespace again. The rest of the line is returned as a
`String` token.
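
Running the `IniLexer` defined above over a single assignment line shows how
the match is split up (a quick check, assuming the class from the previous
example):

.. sourcecode:: python

    # Assumes the IniLexer class from the example above.
    tokens = list(IniLexer().get_tokens_unprocessed('name = value\n'))
    # [(0, Token.Name.Attribute, 'name'),
    #  (4, Token.Text, ' '),
    #  (5, Token.Operator, '='),
    #  (6, Token.Text, ' '),
    #  (7, Token.String, 'value'),
    #  (12, Token.Text, '\n')]
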
Note that for this to work, every part of the match must be inside a capturing
group (a ``(...)``), and there must not be any nested capturing groups. If you
nevertheless need a group, use a non-capturing group defined using this syntax:
``r'(?:some|words|here)'`` (note the ``?:`` after the beginning parenthesis).

If you find yourself needing a capturing group inside the regex which shouldn't
be part of the output, but is used in the regular expression for backreferencing
(eg: ``r'(<(foo|bar)>)(.*?)(</\2>)'``), you can pass `None` to the `bygroups`
function and that group will be skipped in the output.
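
Here is a minimal sketch of that idea; the lexer, its name and the token
choices are invented for illustration:

.. sourcecode:: python

    from pygments.lexer import RegexLexer, bygroups
    from pygments.token import Name, Text

    class TagPairLexer(RegexLexer):
        # Hypothetical example: the nested group 2 exists only so that the
        # closing tag can backreference it; passing None to bygroups skips
        # that group in the output.
        name = 'TagPair'
        tokens = {
            'root': [
                (r'(<(foo|bar)>)(.*?)(</\2>)',
                 bygroups(Name.Tag, None, Text, Name.Tag)),
                (r'[^<]+', Text),
            ]
        }
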

Changing states
===============

Many lexers need multiple states to work as expected. For example, some
languages allow multiline comments to be nested. Since this is a recursive
pattern it's impossible to lex just using regular expressions.

Here is the solution:

.. sourcecode:: python

    from pygments.lexer import RegexLexer
    from pygments.token import *

    class ExampleLexer(RegexLexer):
        name = 'Example Lexer with states'

        tokens = {
            'root': [
                (r'[^/]+', Text),
                (r'/\*', Comment.Multiline, 'comment'),
                (r'//.*?$', Comment.Singleline),
                (r'/', Text)
            ],
            'comment': [
                (r'[^*/]', Comment.Multiline),
                (r'/\*', Comment.Multiline, '#push'),
                (r'\*/', Comment.Multiline, '#pop'),
                (r'[*/]', Comment.Multiline)
            ]
        }

This lexer starts lexing in the ``'root'`` state. It tries to match as much as
possible until it finds a slash (``'/'``). If the next character after the slash
is a star (``'*'``), the `RegexLexer` sends those two characters to the output
stream marked as `Comment.Multiline` and continues lexing with the rules defined
in the ``'comment'`` state.

If there wasn't a star after the slash, the `RegexLexer` checks whether it is a
single-line comment (i.e. the slash is followed by a second slash). If that
isn't the case either, it must be a single slash (the separate regex for a
single slash must also be given, otherwise the slash would be marked as an
error token).

Inside the ``'comment'`` state, we do the same thing again: scan until the
lexer finds a star or slash. If it's the opening of a multiline comment, push
the ``'comment'`` state onto the stack and continue scanning, again in the
``'comment'`` state. Otherwise, check if it's the end of the multiline comment.
If so, pop one state from the stack.
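
Feeding the `ExampleLexer` a nested comment makes the pushes and pops visible
(a quick check, assuming the class defined above):

.. sourcecode:: python

    # Assumes the ExampleLexer class from the example above.
    code = 'a /* b /* c */ d */ e\n'
    tokens = list(ExampleLexer().get_tokens_unprocessed(code))
    # Everything between the outermost '/*' and '*/' is yielded as
    # Comment.Multiline: the inner '/*' pushes a second 'comment' state,
    # so the first '*/' only pops back into the outer comment instead
    # of ending it; only 'a ' and ' e\n' come out as Text.
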
Note: If you pop from an empty stack you'll get an `IndexError`. (There is an
easy way to prevent this from happening: don't ``'#pop'`` in the root state).

If the `RegexLexer` encounters a newline that is flagged as an error token, the
stack is emptied and the lexer continues scanning in the ``'root'`` state. This
helps produce error-tolerant highlighting for erroneous input, e.g. when a
single-line string is not closed.
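
The sketch below demonstrates this recovery behaviour with an invented lexer
for single-quoted strings that may not span lines; none of these names come
from Pygments itself:

.. sourcecode:: python

    from pygments.lexer import RegexLexer
    from pygments.token import String, Text

    class QuoteLexer(RegexLexer):
        # Hypothetical example: a string must end on the line it starts on.
        name = 'Quote'
        tokens = {
            'root': [
                (r"'", String, 'string'),
                (r"[^']+", Text),
            ],
            'string': [
                (r"[^'\n]+", String),
                (r"'", String, '#pop'),
            ],
        }

    # For "'unterminated\nrest\n", no rule in 'string' matches the newline,
    # so the stack is reset to ['root'] there and 'rest\n' is lexed normally
    # as Text instead of being swallowed by the string state.
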

Advanced state tricks
=====================

There are a few more things you can do with states.