acwj源码注释，c编译器资源-CSDN文库

共26个文件

ipch：6个

c：5个

h：3个

需积分: 10 188 浏览量 2022-09-08 22:37:24 上传评论收藏 3.29MB RAR 举报

《ACWJ源码注释与C编译器解析》在编程竞赛中，ACWJ（也称为ACM或ICPC）是一个常见的在线评测系统，它用于自动评判参赛者提交的程序代码是否能正确解决特定的问题。在这个压缩包中，我们主要关注的是ACWJ的源码注释以及C编译器的相关知识。源码注释是编程过程中不可或缺的一部分，它为代码提供了解释和指导，使得其他开发者能够更容易地理解和维护代码。对于ACWJ这样的系统来说，源码注释尤为重要，因为它是开源项目，需要吸引和帮助社区的开发者共同参与。注释通常包括函数的功能、参数解释、返回值说明以及可能的异常处理等信息，这有助于提升代码的可读性和可维护性。 C编译器是将C语言源代码转换为机器可执行代码的关键工具。在ACWJ系统中，C编译器负责接收用户提交的C语言代码，进行语法分析、语义分析、优化和目标代码生成。这一过程大致可以分为以下几个阶段： 1. **词法分析**：编译器首先将源代码分解成一个个被称为“词法单元”的基本元素，如标识符、关键字、常量、运算符等。 2. **语法分析**：接下来，编译器根据C语言的语法规则，将词法单元组合成抽象语法树（AST）。在这个阶段，编译器会检查代码是否符合语法规则，例如，括号是否匹配，表达式是否合法等。 3. **语义分析**：此阶段编译器检查代码的逻辑意义，比如类型检查、变量声明和作用域等。如果发现违反语义规则，如类型不匹配、未声明的变量等，编译器会产生错误。 4. **中间代码生成**：为了提高代码的可移植性，编译器通常会生成一种高级中间表示（如三地址码），以便于后续的优化和目标代码生成。 5. **代码优化**：编译器会尝试改进中间代码，以提高程序的运行效率，例如删除冗余计算、合并相似操作等。 6. **目标代码生成**：编译器将中间代码转换为特定平台的机器指令，形成可执行文件。在ACWJ系统中，C编译器的实现可能还包括错误处理机制，以处理用户代码中的错误，并向用户提供有用的反馈。此外，系统可能还需要支持多语言编译，如C++、Java等，这就需要有相应的编译器或者前端。通过研究ACWJ的源码注释，我们可以深入理解其内部工作原理，学习如何设计和实现一个在线评测系统，这对于提升编程技能和对编译原理的理解都大有裨益。同时，这也是对开源社区的贡献，因为你可以为改善系统提出建议，甚至提交代码，使ACWJ更加完善。

资源详情

资源评论

资源推荐

收起资源包目录

02_Parser.rar （26个子文件）

02_Parser

.vs

ProjectSettings.json 44B

VSWorkspaceState.json 111B

02_Parser

v16

Browse.VC.db 1.84MB

.suo 50KB

ipch

AutoPCH

e36ece1efaab002e

INTERP.ipch 3MB

36c32d0561cc93a1

DEFS.ipch 3.63MB

ed50c7da12d84dd1

MAIN.ipch 3MB

b21227d6383ddc17

EXPR.ipch 3MB

7146c46c2ef244e4

TREE.ipch 3MB

4a1eae07c90cd4d1

f6a1602eda3eab75

efdc22140df56819

SCAN.ipch 3MB

slnx.sqlite 88KB

data.h 274B

main.c 1KB

input01 17B

input05 15B

input04 23B

interp.c 1KB

Readme.md 15KB

expr.c 2KB

decl.h 511B

input03 27B

Makefile 125B

defs.h 1KB

scan.c 3KB

input02 28B

tree.c 986B

# Part 2: Introduction to Parsing In this part of our compiler writing journey, I'm going to introduce the basics of a parser. As I mentioned in the first part, the job of the parser is to recognise the syntax and structural elements of the input and ensure that they conform to the *grammar* of the language. We already have several language elements that we can scan in, i.e. our tokens: + the four basic maths operators: `*`, `/`, `+` and `-` + decimal whole numbers which have 1 or more digits `0` .. `9` Now let's define a grammar for the language that our parser will recognise. ## BNF: Backus-Naur Form You will come across the use of [BNF ](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) at some point if you get into dealing with computer languages. I will just introduce enough of the BNF syntax here to express the grammar we want to recognise. We want a grammar to express maths expressions with whole numbers. Here is the BNF description of the grammar: ``` expression: number | expression '*' expression | expression '/' expression | expression '+' expression | expression '-' expression ; number: T_INTLIT ; ``` The vertical bars separate options in the grammar, so the above says: + An expression could be just a number, or + An expression is two expressions separated by a '*' token, or + An expression is two expressions separated by a '/' token, or + An expression is two expressions separated by a '+' token, or + An expression is two expressions separated by a '-' token + A number is always a T_INTLIT token It should be pretty obvious that the BNF definition of the grammar is *recursive*: an expression is defined by referencing other expressions. But there is a way to *bottom-out" the recursion: when an expression turns out to be a number, this is always a T_INTLIT token and thus not recursive. In BNF, we say that "expression" and "number" are *non-terminal* symbols, as they are produced by rules in the grammar. However, T_INTLIT is a *terminal* symbol as it is not defined by any rule. Instead, it is an already-recognised token in the language. Similarly, the four maths operator tokens are terminal symbols. ## Recursive Descent Parsing Given that the grammar for our language is recursive, it makes sense for us to try and parse it recursively. What we need to do is to read in a token, then *look ahead* to the next token. Based on what the next token is, we can then decide what path we need to take to parse the input. This may require us to recursively call a function that has already been called. In our case, the first token in any expression will be a number and this may be followed by maths operator. After that there may only be a single number, or there may be the start of a whole new expression. How can we parse this recursively? We can write pseudo-code that looks like this: ``` function expression() { Scan and check the first token is a number. Error if it's not Get the next token If we have reached the end of the input, return, i.e. base case Otherwise, call expression() } ``` Let's run this function on the input `2 + 3 - 5 T_EOF` where `T_EOF` is a token that reflects the end of the input. I will number each call to `expression()`. ``` expression0: Scan in the 2, it's a number Get next token, +, which isn't T_EOF Call expression() expression1: Scan in the 3, it's a number Get next token, -, which isn't T_EOF Call expression() expression2: Scan in the 5, it's a number Get next token, T_EOF, so return from expression2 return from expression1 return from expression0 ``` Yes, the function was able to recursively parse the input `2 + 3 - 5 T_EOF`. Of course, we haven't done anything with the input, but that isn't the job of the parser. The parser's job is to *recognise* the input, and warn of any syntax errors. Someone else is going to do the *semantic analysis* of the input, i.e. to understand and perform the meaning of this input. > Later on, you will see that this isn't actually true. It often makes sense to intertwine the syntax analysis and semantic analysis. ## Abstract Syntax Trees To do the semantic analysis, we need code that either interprets the recognised input, or translates it to another format, e.g. assembly code. In this part of the journey, we will build an interpreter for the input. But to get there, we are first going to convert the input into an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree), also known as an AST. I highly recommend you read this short explanation of ASTs: + [Leveling Up One’s Parsing Game With ASTs](https://medium.com/basecs/leveling-up-ones-parsing-game-with-asts-d7a6fc2400ff) by Vaidehi Joshi It's well written and really help to explain the purpose and structure of ASTs. Don't worry, I'll be here when you get back. The structure of each node in the AST that we will build is described in `defs.h`: ```c // AST node types enum { A_ADD, A_SUBTRACT, A_MULTIPLY, A_DIVIDE, A_INTLIT }; // Abstract Syntax Tree structure struct ASTnode { int op; // "Operation" to be performed on this tree struct ASTnode *left; // Left and right child trees struct ASTnode *right; int intvalue; // For A_INTLIT, the integer value }; ``` Some AST nodes, like those with `op` values `A_ADD` and `A_SUBTRACT` have two child ASTs that are pointed to by `left` and `right`. Later on, we will add or subtract the values of the sub-trees. Alternatively, an AST node with the `op` value A_INTLIT represents an integer value. It has no sub-tree children, just a value in the `intvalue` field. ## Building AST Nodes and Trees The code in `tree.c` has the functions to build ASTs. The most general function, `mkastnode()`, takes values for all four fields in an AST node. It allocates the node, populates the field values and returns a pointer to the node: ```c // Build and return a generic AST node struct ASTnode *mkastnode(int op, struct ASTnode *left, struct ASTnode *right, int intvalue) { struct ASTnode *n; // Malloc a new ASTnode n = (struct ASTnode *) malloc(sizeof(struct ASTnode)); if (n == NULL) { fprintf(stderr, "Unable to malloc in mkastnode()\n"); exit(1); } // Copy in the field values and return it n->op = op; n->left = left; n->right = right; n->intvalue = intvalue; return (n); } ``` Given this, we can write more specific functions that make a leaf AST node (i.e. one with no children), and make an AST node with a single child: ```c // Make an AST leaf node struct ASTnode *mkastleaf(int op, int intvalue) { return (mkastnode(op, NULL, NULL, intvalue)); } // Make a unary AST node: only one child struct ASTnode *mkastunary(int op, struct ASTnode *left, int intvalue) { return (mkastnode(op, left, NULL, intvalue)); ``` ## Purpose of the AST We are going to use an AST to store each expression that we recognise so that, later on, we can traverse it recursively to calculate the final value of the expression. We do want to deal with the precedence of the maths operators. Here is an example. Consider the expression `2 * 3 + 4 * 5`. Now, multiplication has higher precedence that addition. Therefore, we want to *bind* the multiplication operands together and perform these operations before we do the addition. If we generated the AST tree to look like this: ``` + / \ / \ / \ * * / \ / \ 2 3 4 5 ``` then, when traversing the tree, we would perform `2*3` first, then `4*5`. Once we have these results, we can then pass them up to the root of the tree to perform the addition. ## A Naive Expression Parser Now, we could re-use the token values from our scanner as the AST node operation values, but I like to keep the concept