<img src="https://user-images.githubusercontent.com/5255209/117226360-7e69a900-ade2-11eb-9127-4a146a443199.png" alt="nmfu logo banner" width="100%"/>
# nmfu

---

_the "no memory for you" "parser" generator_

---
[PyPI](https://pypi.org/project/nmfu) · [build status](https://jenkins.mm12.xyz/job/nmfu) · [test results](https://jenkins.mm12.xyz/jenkins/job/nmfu/job/master/lastCompletedBuild/testReport/) · [coverage](https://jenkins.mm12.xyz/jenkins/job/nmfu/job/master/lastCompletedBuild/coverage/cobertura__coverage_xml/project/_/nmfu_py/) · [Snap Store](https://snapcraft.io/nmfu)

`nmfu` attempts to turn a parser specified as a procedural sequence of matches into a state machine. It's much more obvious what it
does if you read some of the examples.

It takes in a "program" containing various match expressions, control structures and actions, and converts it into a DFA with actions
on the transitions. Because the resulting parser consumes its input one character at a time, simple protocols (for example HTTP) can be
parsed with an extremely small memory footprint and without requiring a separate task.

You can also define output variables, which the parser program can manipulate and which can be examined after parsing.
See `example/http.nmfu` for a good example of using this functionality.
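
To make "a DFA with actions on the transitions" a little more concrete, here is a hand-written Python sketch of the kind of machine such a parser boils down to, for a tiny grammar that accepts `GET ` or `POST ` and records which one it saw in an output variable. This is not NMFU's generated code (NMFU emits C); the transition table, the `Parser` class, its `feed` method and the `OK`/`FAIL` statuses are hypothetical, with only `DONE` borrowed from the status described later in this README. The point is that the parser keeps nothing but a state number and its output variables, and consumes one character per call.

```python
# Hypothetical hand-written DFA, NOT NMFU output: accepts "GET " or "POST "
# one character at a time and sets an output variable on the final transition.

DONE, OK, FAIL = "DONE", "OK", "FAIL"

# Transition table: (state, char) -> (next_state, action or None).
# State -1 means the parse has finished.
TRANSITIONS = {
    (0, "G"): (1, None), (1, "E"): (2, None), (2, "T"): (3, None),
    (3, " "): (-1, "GET"),
    (0, "P"): (4, None), (4, "O"): (5, None), (5, "S"): (6, None),
    (6, "T"): (7, None), (7, " "): (-1, "POST"),
}


class Parser:
    def __init__(self) -> None:
        self.state = 0        # the only parsing state kept in memory
        self.method = None    # plays the role of an output variable

    def feed(self, ch: str) -> str:
        """Consume a single character and report the parser status."""
        step = TRANSITIONS.get((self.state, ch))
        if step is None:
            return FAIL
        self.state, action = step
        if action is not None:
            self.method = action  # the action attached to this transition
        return DONE if self.state == -1 else OK
```

Feeding `"POST "` character by character yields `OK` four times and then `DONE`, after which `method` holds `"POST"`.
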
The rest of this README is a guide to using NMFU.
## Parser Specification
NMFU source files support C++-style line comments (text from `//` to the end of the line is ignored).
### Top-Level Constructs
At the top level, all NMFU parsers consist of a set of output-variables, macro-definitions, hook-definitions and the parser code itself.
The output-variables are specified with the _output-declaration_:
```lark
out_decl: "out" out_type IDENTIFIER ";"
| "out" out_type IDENTIFIER "=" atom ";"
out_type: "bool" -> bool_type
| "int" -> int_type
| "enum" "{" IDENTIFIER ("," IDENTIFIER)+ "}" -> enum_type
| "str" "[" NUMBER "]" -> str_type
```
For example:
```
out int content_length = 32;
out bool supports_gzip = false;
// note you can't set default values for strings and enums
out str[32] url;
out enum{GET,POST} method;
```
All strings have a defined maximum size, which includes the null-terminator.
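
Because the declared size includes the null terminator, a `str[32]` can hold at most 31 characters. The Python sketch below models only that bookkeeping, using a hypothetical `BoundedStr` class; this section does not say what NMFU's generated C code does when an append would overflow, so the sketch simply reports the failure to the caller.

```python
class BoundedStr:
    """Hypothetical model of an NMFU `str[N]` output variable."""

    def __init__(self, size: int) -> None:
        self.size = size   # declared size, e.g. 32 for str[32]
        self.chars = []    # characters stored so far

    @property
    def capacity(self) -> int:
        # One slot is reserved for the null terminator.
        return self.size - 1

    def append(self, ch: str) -> bool:
        """Append one character; return False if it would overflow."""
        if len(self.chars) >= self.capacity:
            return False
        self.chars.append(ch)
        return True

    def as_c_bytes(self) -> bytes:
        # What the buffer would hold in C: the text plus a null terminator.
        return "".join(self.chars).encode("ascii") + b"\0"
```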
Macros in NMFU are simple parse-tree level replacements. They look like:
```lark
macro_decl: "macro" IDENTIFIER macro_args "{" statement* "}"
macro_args: "(" macro_arg ("," macro_arg)* ")"
| "(" ")" -> macro_arg_empty
macro_arg: "macro" IDENTIFIER -> macro_macro_arg
| "out" IDENTIFIER -> macro_out_arg
| "match" IDENTIFIER -> macro_match_expr_arg
| "expr" IDENTIFIER -> macro_int_expr_arg
| "hook" IDENTIFIER -> macro_hook_arg
| "loop" IDENTIFIER -> macro_breaktgt_arg
```
For example:
```
macro ows() { // optional white space
optional {
" ";
}
}
```
When macros are "called", or instantiated, all NMFU does is copy the contents of the parse tree from the macro
declaration to the call-site. Note that although macros can call other macros, they cannot recurse.
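
Copying a macro body into its call-site can be sketched as a recursive tree substitution. The Python below is a toy model, not NMFU's implementation: statements are tuples, a call is `("call", name)`, and the hypothetical `expand` function refuses a macro that directly or indirectly calls itself. This also shows why pure copying cannot support recursion: the expansion would never terminate.

```python
def expand(stmts, macros, active=()):
    """Inline macro calls in a list of toy statements.

    stmts:  list of tuples, e.g. ("match", "GET") or ("call", "ows")
    macros: dict mapping macro name -> list of statements (its body)
    active: names of macros currently being expanded (recursion guard)
    """
    out = []
    for stmt in stmts:
        if stmt[0] == "call" and stmt[1] in macros:
            name = stmt[1]
            if name in active:
                raise ValueError(f"recursive macro: {name}")
            # Copy the body in place of the call, expanding nested calls too.
            out.extend(expand(macros[name], macros, active + (name,)))
        else:
            # Ordinary statements (and calls to non-macros such as hooks)
            # are kept unchanged.
            out.append(stmt)
    return out
```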
Macros can also take arguments, which are similarly treated as parse-tree level replacements, with the added restriction
that their types _are_ checked. For example:
```
macro read_number(out target, match delimit) {
target = 0;
foreach {
/\d+/;
} do {
target = [target * 10 + ($last - '0')];
}
delimit;
}
```
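
The `foreach` loop in `read_number` accumulates a decimal value one digit at a time, with `$last` holding the character just consumed. A rough Python equivalent follows, assuming the `do` block runs once per character matched by `/\d+/` and that the delimiter is a single literal string; the function name and its return convention are hypothetical.

```python
def read_number(data: str, delimit: str):
    """Mimic the read_number macro: parse digits, then require `delimit`.

    Returns (value, index just past the delimiter), or None on failure.
    """
    target = 0
    i = 0
    while i < len(data) and "0" <= data[i] <= "9":
        last = data[i]                                 # plays the role of $last
        target = target * 10 + (ord(last) - ord("0"))
        i += 1
    # /\d+/ needs at least one digit, and the delimiter must follow directly.
    if i == 0 or not data.startswith(delimit, i):
        return None
    return target, i + len(delimit)
```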
There are 6 types of arguments:
- `macro`: a reference to another macro
- `hook`: a reference to a hook
- `out`: a reference to an _output-variable_
- `match`: an arbitrary _match-expression_
- `expr`: an arbitrary _integer-expression_
- `loop`: an arbitrary named _loop-statement_, for use in _break-statements_.
Hooks (which are callbacks to user code which the parser can call at certain points) are defined with a _hook-declaration_:
```lark
hook_decl: "hook" IDENTIFIER ";"
```
For example:
```
hook got_header;
```
### Parser Declaration
The parser proper is declared with the _parser-declaration_,
```lark
parser_decl: "parser" "{" statement+ "}"
```
and contains a set of statements which are "executed" in order to parse the input.
### Basic Statements
Basic statements are statements which do not have an associated code block, and which end with a semicolon.
```lark
simple_stmt: expr -> match_stmt
| IDENTIFIER "=" expr -> assign_stmt
| IDENTIFIER "+=" expr -> append_stmt
| IDENTIFIER "(" (expr ("," expr)*)? ")" -> call_stmt
| "break" IDENTIFIER? -> break_stmt
| "finish" -> finish_stmt
| "wait" expr -> wait_stmt
```
The most basic form of statement in NMFU is a _match-statement_, which matches any _match-expression_ (explained in the next section).
The next two statements are the _assign-statement_ and the _append-statement_. The _assign-statement_ parses an _integer-expression_ (which is not limited to just integers, as explained in the next section)
and assigns its result to the named _output-variable_. The _append-statement_ instead appends whatever is matched by the _match-expression_ to the named _output-variable_, which must be a string type. Additionally,
if the argument to an _append-statement_ is a _math-expression_, then the result of evaluating the expression is treated as a character code and appended to the string.
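
The two forms of _append-statement_ differ in what gets added: the text consumed by a _match-expression_, or a single character whose code is the value of a _math-expression_. The Python sketch below models both on a plain list of characters; the helper names are hypothetical and bounds checking is omitted.

```python
def append_match(buf: list, matched: str) -> None:
    """Match form: append every character the match consumed."""
    buf.extend(matched)


def append_math(buf: list, value: int) -> None:
    """Math form: treat the evaluated result as a character code."""
    buf.append(chr(value))
```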
The _call-statement_ instantiates a macro or calls a hook. Note that there is currently no way to pass parameters to a hook, so any expressions provided
in that case are ignored. Macro arguments are always parsed as generic expressions and then interpreted according to the type given to them at declaration.
If a hook and macro have the same name, the macro will take priority. Priority is undefined if a macro argument and global hook or macro share a name.
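
The lookup order for a _call-statement_ can be summarized in a few lines. This Python sketch assumes macros and hooks live in two separate name tables; `resolve_call` and its return values are hypothetical, and the undefined case where a macro argument shares a name with a global hook or macro is deliberately not modelled.

```python
def resolve_call(name: str, macros: dict, hooks: set):
    """Decide what a call-statement `name(...)` refers to."""
    if name in macros:
        return ("macro", macros[name])  # macros win over hooks of the same name
    if name in hooks:
        return ("hook", name)           # any arguments would be ignored
    raise NameError(f"undefined macro or hook: {name}")
```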
The _break-statement_ is explained along with loops in a later section.
The _finish-statement_ causes the parser to immediately stop and return a `DONE` status code, which should be interpreted by the calling application as a termination condition.
The _wait-statement_ spins and consumes input until the provided _match-expression_ matches successfully. Importantly, no event (including the end of input!) can stop the
wait statement, which makes it useful primarily in error handlers.

It is also important to note that this is _not_ the same as using a regex like `/.*someterminator/`, as
the wait statement does _not_ "try" different starting positions for a string when its match fails. More concretely, `wait "abcdabce";` would _not_ match `abcdabcdabce`, as
the statement would bail at the second `d` and restart matching from the beginning of the pattern.
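
The difference from a regex search can be reproduced with a small model. The Python below is a sketch under an assumption the text does not fully pin down: on a mismatch, matching restarts from the start of the pattern and the offending character is only re-checked against the pattern's first character. The `naive_wait` name is hypothetical, and unlike the real statement it returns `None` when the input runs out instead of waiting forever.

```python
from typing import Optional


def naive_wait(pattern: str, data: str) -> Optional[int]:
    """Return the index just past the first match, or None if input runs out.

    Assumption: no other starting positions are tried after a mismatch.
    """
    pos = 0  # number of pattern characters matched so far
    for i, ch in enumerate(data):
        if ch == pattern[pos]:
            pos += 1
        elif ch == pattern[0]:
            pos = 1  # the failing character may start a new match
        else:
            pos = 0
        if pos == len(pattern):
            return i + 1
    return None
```

Under this model `naive_wait("abcdabce", "abcdabcdabce")` returns `None`, while a substring or regex search such as `re.search("abcdabce", "abcdabcdabce")` succeeds because it tries every starting position.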
### Expressions
There are three types of expressions in NMFU: _match-expressions_, _integer-expressions_ and _math-expressions_.
A _match-expression_ is anything that can consume input to the parser and check it:
```lark
?expr: atom // string match
| regex // not an atom to simplify things
| "end" -> end_expr
| "(" expr+ ")" -> concat_expr
atom: STRING "i" -> string_case_const
| STRING -> string_const
```
The simplest form of _match-expression_ is the _direct-match_, which matches a literal string. It can optionally match with case insensitivity by suffixing the literal string with an "i".
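
Matching a literal one character at a time, with optional case folding for the `"..."i` form, can be sketched as below. This is a Python model under stated assumptions (the `LiteralMatcher` class is hypothetical and only folds ASCII letters, as a byte-oriented parser might); it shows why a _direct-match_ needs no buffering: the only state is the index of the next expected character.

```python
class LiteralMatcher:
    """Hypothetical char-by-char matcher for "text" and "text"i."""

    def __init__(self, literal: str, case_insensitive: bool = False) -> None:
        self.ci = case_insensitive
        self.literal = self._fold(literal)
        self.pos = 0  # index of the next expected character

    def _fold(self, s: str) -> str:
        # Fold only ASCII letters, and only for case-insensitive matches.
        if not self.ci:
            return s
        return "".join(c.lower() if "A" <= c <= "Z" else c for c in s)

    def feed(self, ch: str):
        """True when the literal is complete, False on a mismatch,
        None if more input is needed. Feeding past completion is not handled."""
        if self._fold(ch) != self.literal[self.pos]:
            return False
        self.pos += 1
        return True if self.pos == len(self.literal) else None
```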
The _end-match-expression_ is a match expression which only matches the end of input.