.. -*- mode: rst -*-
====================
Write your own lexer
====================
If a lexer for your favorite language is missing in the Pygments package, you can
easily write your own and extend Pygments.
All you need can be found inside the `pygments.lexer` module. As you can read in
the `API documentation <api.txt>`_, a lexer is a class that is initialized with
some keyword arguments (the lexer options) and that provides a
`get_tokens_unprocessed()` method which is given a string or unicode object with
the data to parse.
The `get_tokens_unprocessed()` method must return an iterator or iterable
containing tuples in the form ``(index, token, value)``. Normally you don't
need to implement this method yourself, since there are numerous base lexer
classes you can subclass.
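For illustration, here is a minimal sketch of a lexer that implements
`get_tokens_unprocessed()` directly by subclassing `Lexer`; the class name and
its tokenization scheme (runs of whitespace vs. everything else) are invented
for this example:

.. sourcecode:: python

    import re

    from pygments.lexer import Lexer
    from pygments.token import Text, Whitespace

    class SpanLexer(Lexer):
        name = 'Span'

        def get_tokens_unprocessed(self, text):
            # Yield (index, tokentype, value) tuples, where index is
            # the starting position of the token in the input string.
            for match in re.finditer(r'\s+|\S+', text):
                if match.group().isspace():
                    yield match.start(), Whitespace, match.group()
                else:
                    yield match.start(), Text, match.group()
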
RegexLexer
==========
A very powerful (but quite easy to use) lexer is the `RegexLexer`. This lexer
base class allows you to define lexing rules in terms of *regular expressions*
for different *states*.
States are groups of regular expressions that are matched against the input
string at the *current position*. If one of these expressions matches, a
corresponding action is performed (normally yielding a token with a specific
type), the current position is set to where the last match ended and the
matching process continues with the first regex of the current state.
Lexer states are kept in a state stack: each time a new state is entered, the
new state is pushed onto the stack. The most basic lexers (like the
`DiffLexer`) just need one state.
Each state is defined as a list of tuples in the form (`regex`, `action`,
`new_state`) where the last item is optional. In the most basic form, `action`
is a token type (like `Name.Builtin`). That means: when `regex` matches, a
token with the match text and type `action` is emitted, and `new_state` is
pushed onto the state stack. If the new state is ``'#pop'``, the topmost state
is popped from the stack instead. (To pop more than one state, use
``'#pop:2'`` and so on.)
``'#push'`` is a synonym for pushing the current state on the
stack.
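For example, a lexer for a hypothetical language with nested parentheses could
use all three forms; this is only a sketch, with the class name and rules
invented for the example:

.. sourcecode:: python

    from pygments.lexer import RegexLexer
    from pygments.token import Punctuation, Text

    class ParenLexer(RegexLexer):
        name = 'Paren Example'

        tokens = {
            'root': [
                (r'\(', Punctuation, 'paren'),  # '(': push the 'paren' state
                (r'[^(]+', Text),
            ],
            'paren': [
                (r'\(', Punctuation, '#push'),  # nested '(': push again
                (r'\)', Punctuation, '#pop'),   # ')': pop one state
                (r'[^()]+', Text),
            ],
        }
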
The following example shows the `DiffLexer` from the builtin lexers. Note that
it contains some additional attributes `name`, `aliases` and `filenames` which
aren't required for a lexer. They are used by the builtin lexer lookup
functions.
.. sourcecode:: python
    from pygments.lexer import RegexLexer
    from pygments.token import *

    class DiffLexer(RegexLexer):
        name = 'Diff'
        aliases = ['diff']
        filenames = ['*.diff']

        tokens = {
            'root': [
                (r' .*\n', Text),
                (r'\+.*\n', Generic.Inserted),
                (r'-.*\n', Generic.Deleted),
                (r'@.*\n', Generic.Subheading),
                (r'Index.*\n', Generic.Heading),
                (r'=.*\n', Generic.Heading),
                (r'.*\n', Text),
            ]
        }
As you can see, this lexer only uses one state. When the lexer starts scanning
the text, it first checks whether the current character is a space. If so, it
scans everything up to the end of the line and yields the matched text as a
`Text` token. If that rule doesn't match, it checks whether the current
character is a plus sign, and so on.
If no rule matches at the current position, the current char is emitted as an
`Error` token that indicates a parsing error, and the position is increased by
1.
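To see the resulting token stream, you can instantiate the lexer and feed it
some input; this quick check is only an illustration of the
``(index, token, value)`` output described above:

.. sourcecode:: python

    lexer = DiffLexer()
    text = '--- a.txt\n+++ b.txt\n-old line\n+new line\n'
    for index, tokentype, value in lexer.get_tokens_unprocessed(text):
        print((index, tokentype, value))
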
Regex Flags
===========
You can either define regex flags in the regex (``r'(?x)foo bar'``) or by adding
a `flags` attribute to your lexer class. If no attribute is defined, it defaults
to `re.MULTILINE`. For more information about regular expression flags see the
`regular expressions`_ help page in the Python documentation.
.. _regular expressions: http://docs.python.org/lib/re-syntax.html
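As a sketch, a lexer for a case-insensitive language could set the attribute
like this (the class and its rules are hypothetical):

.. sourcecode:: python

    import re

    from pygments.lexer import RegexLexer
    from pygments.token import Comment, Text

    class CaseInsensitiveLexer(RegexLexer):
        # Keep the default multiline behavior, but also ignore case.
        flags = re.MULTILINE | re.IGNORECASE

        tokens = {
            'root': [
                (r'rem\b.*\n', Comment),  # matches 'REM ...', 'Rem ...', ...
                (r'.*\n', Text),
            ]
        }
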
Scanning multiple tokens at once
================================
Here is a more complex lexer that highlights INI files. INI files consist of
sections, comments and key = value pairs:
.. sourcecode:: python
    from pygments.lexer import RegexLexer, bygroups
    from pygments.token import *

    class IniLexer(RegexLexer):
        name = 'INI'
        aliases = ['ini', 'cfg']
        filenames = ['*.ini', '*.cfg']

        tokens = {
            'root': [
                (r'\s+', Text),
                (r';.*?$', Comment),
                (r'\[.*?\]$', Keyword),
                (r'(.*?)(\s*)(=)(\s*)(.*?)$',
                 bygroups(Name.Attribute, Text, Operator, Text, String))
            ]
        }
The lexer first looks for whitespace, comments and section names. Later it
looks for a line that resembles a key/value pair, separated by an ``'='``
sign and optional whitespace.
The `bygroups` helper makes sure that each group is yielded with a different
token type: first a `Name.Attribute` token, then a `Text` token for the
optional whitespace, after that an `Operator` token for the equals sign, and
then a `Text` token for the whitespace again. The rest of the line is returned
as `String`.
Note that for this to work, every part of the match must be inside a capturing
group (a ``(...)``), and there must not be any nested capturing groups. If you
nevertheless need a group, use a non-capturing group defined using this syntax:
``r'(?:some|words|here)'`` (note the ``?:`` after the beginning parenthesis).
If you find yourself needing a capturing group inside the regex which
shouldn't be part of the output but is used in the regular expression for
backreferencing (eg: ``r'(<(foo|bar)>)(.*?)(</\2>)'``), you can pass `None`
to the `bygroups` function and that group will be skipped in the output.
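Here is a sketch of that case, wrapping the regex from above in a hypothetical
lexer (the `TagLexer` name and its token choices are invented for this
example):

.. sourcecode:: python

    from pygments.lexer import RegexLexer, bygroups
    from pygments.token import Name, Text

    class TagLexer(RegexLexer):
        name = 'Tag Example'

        tokens = {
            'root': [
                # Group 2, (foo|bar), exists only so that \2 can
                # backreference it; passing None skips it in the output.
                (r'(<(foo|bar)>)(.*?)(</\2>)',
                 bygroups(Name.Tag, None, Text, Name.Tag)),
                (r'[^<]+', Text),
                (r'<', Text),
            ]
        }
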
Changing states
===============
Many lexers need multiple states to work as expected. For example, some
languages allow multiline comments to be nested. Since this is a recursive
pattern it's impossible to lex just using regular expressions.
Here is the solution:
.. sourcecode:: python
    from pygments.lexer import RegexLexer
    from pygments.token import *

    class ExampleLexer(RegexLexer):
        name = 'Example Lexer with states'

        tokens = {
            'root': [
                (r'[^/]+', Text),
                (r'/\*', Comment.Multiline, 'comment'),
                (r'//.*?$', Comment.Single),
                (r'/', Text)
            ],
            'comment': [
                (r'[^*/]', Comment.Multiline),
                (r'/\*', Comment.Multiline, '#push'),
                (r'\*/', Comment.Multiline, '#pop'),
                (r'[*/]', Comment.Multiline)
            ]
        }
This lexer starts lexing in the ``'root'`` state. It tries to match as much as
possible until it finds a slash (``'/'``). If the next character after the slash
is a star (``'*'``) the `RegexLexer` sends those two characters to the output
stream marked as `Comment.Multiline` and continues parsing with the rules
defined in the ``'comment'`` state.
If there wasn't a star after the slash, the `RegexLexer` checks whether it's a
single-line comment (i.e. the slash is followed by a second slash). If that
wasn't the case either, it must be a single slash (the separate regex for a
single slash must also be given, otherwise the slash would be marked as an
error token).
Inside the ``'comment'`` state, we do the same thing again: scan until the
lexer finds a star or slash. If it's the opening of a multiline comment, push
the ``'comment'`` state onto the stack and continue scanning, again in the
``'comment'`` state. Otherwise, check if it's the end of the multiline
comment. If yes, pop one state from the stack.
Note: If you pop from an empty stack you'll get an `IndexError`. (There is an
easy way to prevent this from happening: don't ``'#pop'`` in the root state).
If the `RegexLexer` encounters a newline that is flagged as an error token, the
stack is emptied and the lexer continues scanning in the ``'root'`` state. This
helps produce error-tolerant highlighting for erroneous input, e.g. when a
single-line string is not closed.
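Running the `ExampleLexer` defined above on a nested comment shows the pushing
and popping in action; this quick check is only illustrative:

.. sourcecode:: python

    lexer = ExampleLexer()
    code = 'a /* outer /* nested */ still outer */ b\n'
    for index, tokentype, value in lexer.get_tokens_unprocessed(code):
        print((index, tokentype, value))
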
Advanced state tricks
=====================
There are a few more things you can do with states.