Micro Parser Combinators
========================
Version 0.9.0
About
-----
_mpc_ is a lightweight and powerful Parser Combinator library for C.
Using _mpc_ might be of interest to you if you are...
* Building a new programming language
* Building a new data format
* Parsing an existing programming language
* Parsing an existing data format
* Embedding a Domain Specific Language
* Implementing [Greenspun's Tenth Rule](http://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule)
Features
--------
* Type-Generic
* Predictive, Recursive Descent
* Easy to Integrate (One Source File in ANSI C)
* Automatic Error Message Generation
* Regular Expression Parser Generator
* Language/Grammar Parser Generator
Alternatives
------------
The main current alternative for a C-based parser combinator library is a branch of [Cesium3](https://github.com/wbhart/Cesium3/tree/combinators).
_mpc_ provides a number of features that this project does not offer, and also overcomes a number of potential downsides:
* _mpc_ Works for Generic Types
* _mpc_ Doesn't rely on Boehm-Demers-Weiser Garbage Collection
* _mpc_ Doesn't use `setjmp` and `longjmp` for errors
* _mpc_ Doesn't pollute the namespace
Quickstart
==========
Here is how one would use _mpc_ to create a parser for a basic mathematical expression language.
```c
mpc_parser_t *Expr = mpc_new("expression");
mpc_parser_t *Prod = mpc_new("product");
mpc_parser_t *Value = mpc_new("value");
mpc_parser_t *Maths = mpc_new("maths");

mpca_lang(MPCA_LANG_DEFAULT,
  " expression : <product> (('+' | '-') <product>)*; "
  " product : <value> (('*' | '/') <value>)*; "
  " value : /[0-9]+/ | '(' <expression> ')'; "
  " maths : /^/ <expression> /$/; ",
  Expr, Prod, Value, Maths, NULL);

mpc_result_t r;

if (mpc_parse("input", input, Maths, &r)) {
  mpc_ast_print(r.output);
  mpc_ast_delete(r.output);
} else {
  mpc_err_print(r.error);
  mpc_err_delete(r.error);
}

mpc_cleanup(4, Expr, Prod, Value, Maths);
```
If you were to set `input` to the string `(4 * 2 * 11 + 2) - 5`, the printed output would look like this.
```
>
  regex
  expression|>
    value|>
      char:1:1 '('
      expression|>
        product|>
          value|regex:1:2 '4'
          char:1:4 '*'
          value|regex:1:6 '2'
          char:1:8 '*'
          value|regex:1:10 '11'
        char:1:13 '+'
        product|value|regex:1:15 '2'
      char:1:16 ')'
    char:1:18 '-'
    product|value|regex:1:20 '5'
  regex
```
Getting Started
===============
Introduction
------------
Parser Combinators are structures that encode how to parse particular languages. They can be combined using intuitive operators to create new parsers of increasing complexity. Using these operators, detailed grammars and languages can be parsed and processed quickly, efficiently, and easily.
The trick behind Parser Combinators is the observation that, by structuring the library in a particular way, one can make building parser combinators look like writing a grammar itself. Therefore, instead of describing _how to parse a language_, a user must only specify _the language itself_, and the library will work out how to parse it ... as if by magic!
_mpc_ can be used in this mode, or, as shown in the above example, you can specify the grammar directly as a string or in a file.
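To make the idea of combining parsers concrete, here is a deliberately tiny sketch that does not use _mpc_ at all. Every name in it (`toy_parser`, `toy_char`, `toy_and`, `toy_or`) is hypothetical, and it makes no claim about how _mpc_ is implemented internally. A toy parser simply reports how many characters it consumed, or `-1` on failure.

```c
#include <stddef.h>

/* Toy parser type (hypothetical, unrelated to mpc_parser_t). Running a
   parser returns the number of characters consumed, or -1 on failure. */
typedef struct toy_parser toy_parser;
struct toy_parser {
  int (*run)(const toy_parser *self, const char *in);
  char c;                   /* character to match (toy_char only) */
  const toy_parser *a, *b;  /* sub-parsers (combinators only) */
};

/* Match the single character self->c. */
static int run_char(const toy_parser *self, const char *in) {
  return (in[0] != '\0' && in[0] == self->c) ? 1 : -1;
}

/* Sequence: run a, then run b on whatever input is left. */
static int run_and(const toy_parser *self, const char *in) {
  int n, m;
  n = self->a->run(self->a, in);
  if (n < 0) { return -1; }
  m = self->b->run(self->b, in + n);
  return (m < 0) ? -1 : n + m;
}

/* Choice: try a, and fall back to b from the same position. */
static int run_or(const toy_parser *self, const char *in) {
  int n = self->a->run(self->a, in);
  return (n >= 0) ? n : self->b->run(self->b, in);
}

/* Shared constructor used by the three builders below. */
static toy_parser toy_make(int (*run)(const toy_parser *, const char *),
                           char c, const toy_parser *a, const toy_parser *b) {
  toy_parser p;
  p.run = run; p.c = c; p.a = a; p.b = b;
  return p;
}

static toy_parser toy_char(char c) {
  return toy_make(run_char, c, NULL, NULL);
}

static toy_parser toy_and(const toy_parser *a, const toy_parser *b) {
  return toy_make(run_and, '\0', a, b);
}

static toy_parser toy_or(const toy_parser *a, const toy_parser *b) {
  return toy_make(run_or, '\0', a, b);
}
```

With these pieces, the rule `(a|b) a b` is built purely by composing values, for instance `toy_and(&a_or_b, &ab)`, so the construction code reads much like the grammar rule it encodes.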
Basic Parsers
-------------
### String Parsers
All the following functions construct new basic parsers of the type `mpc_parser_t *`. Each of these parsers returns a newly allocated `char *` containing the character(s) it managed to match. If unsuccessful, it returns an error. They have the following functionality.
* * *
```c
mpc_parser_t *mpc_any(void);
```
Matches any individual character
* * *
```c
mpc_parser_t *mpc_char(char c);
```
Matches a single given character `c`
* * *
```c
mpc_parser_t *mpc_range(char s, char e);
```
Matches any single given character in the range `s` to `e` (inclusive)
* * *
```c
mpc_parser_t *mpc_oneof(const char *s);
```
Matches any single given character in the string `s`
* * *
```c
mpc_parser_t *mpc_noneof(const char *s);
```
Matches any single given character not in the string `s`
* * *
```c
mpc_parser_t *mpc_satisfy(int(*f)(char));
```
Matches any single given character satisfying function `f`
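As a small illustration, a predicate suitable for passing as `f` might look like the sketch below. The name `is_hex_digit` is hypothetical and not part of _mpc_; the only requirement taken from the prototype above is the `int(*)(char)` signature.

```c
#include <ctype.h>

/* Hypothetical predicate (not part of mpc): returns 1 when `c` is a
   hexadecimal digit and 0 otherwise. Signature matches int(*)(char). */
static int is_hex_digit(char c) {
  /* Cast through unsigned char: passing a negative char value to the
     <ctype.h> functions is undefined behaviour. */
  return isxdigit((unsigned char)c) != 0;
}
```

Under that assumption, `mpc_satisfy(is_hex_digit)` would construct a parser that matches a single hexadecimal digit.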
* * *
```c
mpc_parser_t *mpc_string(const char *s);
```
Matches exactly the string `s`
### Other Parsers
Several other functions exist that construct parsers with some other special functionality.
* * *
```c
mpc_parser_t *mpc_pass(void);
```
Consumes no input, always successful, returns `NULL`
* * *
```c
mpc_parser_t *mpc_fail(const char *m);
mpc_parser_t *mpc_failf(const char *fmt, ...);
```
Consumes no input, always fails with message `m` or formatted string `fmt`.
* * *
```c
mpc_parser_t *mpc_lift(mpc_ctor_t f);
```
Consumes no input, always successful, returns the result of function `f`
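A constructor function for this purpose could be sketched as follows. This assumes that `mpc_ctor_t` is a pointer to a function taking no arguments and returning `mpc_val_t *` (that is, a `void *`); the name `new_zero` is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical constructor: allocates a fresh int initialised to zero
   and returns it as a void *. Assumes mpc_ctor_t has the shape
   mpc_val_t *(*)(void). Returns NULL if the allocation fails. */
static void *new_zero(void) {
  int *x = malloc(sizeof *x);
  if (x != NULL) { *x = 0; }
  return x;
}
```

Under that assumption, `mpc_lift(new_zero)` would yield a parser whose result is a newly allocated zero each time it runs, and whoever receives that result is responsible for releasing it with `free`.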
* * *
```c
mpc_parser_t *mpc_lift_val(mpc_val_t *x);
```
Consumes no input, always successful, returns `x`
* * *
```c
mpc_parser_t *mpc_state(void);
```
Consumes no input, always successful, returns a copy of the parser state as an `mpc_state_t *`. This state is newly allocated, so it must be released with `free` once you are finished with it.
* * *
```c
mpc_parser_t *mpc_anchor(int(*f)(char,char));
```
Consumes no input. Successful when function `f` returns true. Always returns `NULL`.
Function `f` is an _anchor_ function. It takes as input the last character parsed and the next character in the input, and returns success or failure. This function can be supplied by the user to ensure some condition is met, for example to test that the input is at a boundary between words and non-words.
At the start of the input the first argument is set to `'\0'`. At the end of the input the second argument is set to `'\0'`.
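The word-boundary case mentioned above can be sketched as an anchor function like this. Both function names are hypothetical, and treating letters, digits, and `_` as "word" characters is an assumption made for the example.

```c
#include <ctype.h>

/* Hypothetical helper: is `c` a "word" character (letter, digit or '_')?
   Always returns exactly 0 or 1. */
static int is_word_char(char c) {
  return c == '_' || isalnum((unsigned char)c) != 0;
}

/* Hypothetical anchor function with the int(*)(char,char) signature.
   Succeeds when exactly one of the two neighbouring characters is a
   word character. At the start or end of the input the corresponding
   argument is '\0', which is not a word character, so the outer edges
   of a word count as boundaries too. */
static int is_word_boundary(char prev, char next) {
  return is_word_char(prev) != is_word_char(next);
}
```

Passing it as `mpc_anchor(is_word_boundary)` would then give a parser that consumes nothing and succeeds only at such a boundary.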
Parsing
-------
Once you've built a parser, you can run it on some input using one of the following functions. These functions return `1` on success and `0` on failure. They output either the result or an error to an `mpc_result_t` variable. This type is defined as follows.
```c
typedef union {
  mpc_err_t *error;
  mpc_val_t *output;
} mpc_result_t;
```
where `mpc_val_t *` is synonymous with `void *` and simply represents some pointer to data, the exact type of which is dependent on the parser.
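Because the result is a union, only one member is meaningful at a time, and the integer return value tells you which one. The toy analogue below illustrates that convention without depending on the library; `toy_result_t` and `toy_parse_digit` are hypothetical stand-ins, not _mpc_ types or functions.

```c
#include <stddef.h>

/* Toy stand-in for mpc_result_t: NOT the library's type. Only one
   member is meaningful at a time, selected by the return value of
   the function that filled it in. */
typedef union {
  const char *error;  /* valid when the function returned 0 */
  const char *output; /* valid when the function returned 1 */
} toy_result_t;

/* Toy "parser": succeeds when `s` starts with a decimal digit.
   On success r->output points at that digit; on failure r->error
   points at a static message. Returns 1 on success and 0 on failure,
   mirroring the convention described above. */
static int toy_parse_digit(const char *s, toy_result_t *r) {
  if (s != NULL && s[0] >= '0' && s[0] <= '9') {
    r->output = s;
    return 1;
  }
  r->error = "expected a digit";
  return 0;
}
```

The caller checks the return value first and then reads only the matching member, which is the same shape as the `if (mpc_parse(...)) { ... r.output ... } else { ... r.error ... }` pattern in the Quickstart.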
* * *
```c
int mpc_parse(const char *filename, const char *string, mpc_parser_t *p, mpc_result_t *r);
```
Run a parser on some string.
* * *
```c
int mpc_parse_file(const char *filename, FILE *file, mpc_parser_t *p, mpc_result_t *r);
```
Run a parser on some file.
* * *
```c
int mpc_parse_pipe(const char *filename, FILE *pipe, mpc_parser_t *p, mpc_result_t *r);
```
Run a parser on some pipe (such as `stdin`).
* * *
```c
int mpc_parse_contents(const char *filename, mpc_parser_t *p, mpc_result_t *r);
```
Run a parser on the contents of some file.
Combinators
-----------
Combinators are functions that take one or more parsers and return a new parser of some given functionality.
These combinators work independently of exactly what data type the supplied parser(s) return. In languages such as Haskell, ensuring you don't feed one type of data into a parser requiring a different type is done by the compiler. In C we don't have that luxury, so it is at the discretion of the programmer to ensure that he or she deals correctly with the outputs of different parser types.
A second annoyance in C is that of manual memory management. Some parsers might get half-way and then fail. This means they need to clean up any partial result that has been collected in the parse. In Haskell this is handled by the Garbage Collector, but in C the