Building Your Own Compiler (Part 3)

Syntax Analysis

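In this article we use JavaCC to implement the compiler's syntax analysis. Every grammar has a symbol that stands for the whole input to be parsed. In Cb the unit of compilation, a single file, is what gets parsed, so the grammar needs a rule that represents it.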

In Cb, the nonterminal symbol that represents a whole file is called compilation_unit. Its rule is as follows:

compilation_unit():{}
{
import_stmts() top_defs() <EOF>
}

Units of syntax

The grammar of a typical programming language is built from the following units:

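  • Definition: a variable definition, function definition, class definition, and so on
  • Declaration
  • Statement: the body of a function or method definition is made up of statements
  • Expression: a syntactic unit smaller than a statement that has a value
  • Term: one operand of a binary operation within an expression, that is, syntax built using only unary operators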

The sections below list the JavaCC descriptions of definitions, declarations, statements, expressions, and terms in turn.

Syntax of import declarations

import_stmts():{}
{
(import_stmt())*
}

import_stmt():{}
{
<IMPORT> name()("."name())*";"
}

name():{}
{
<IDENTIFIER>
}
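
To make the repetition operators in these rules concrete, here is a minimal hand-written sketch in Java of what the three import rules accept. It is not the code JavaCC generates. The class name ImportSketch, the pre-split token list, and the method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of import_stmts / import_stmt / name.
// Tokens are assumed to be pre-split strings; a real scanner is out of scope.
class ImportSketch {
    private final List<String> tokens;
    private int pos = 0;

    ImportSketch(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void expect(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + peek());
        }
        pos++;
    }

    // name(): <IDENTIFIER>
    private String name() {
        String t = peek();
        if (!t.matches("[A-Za-z_][A-Za-z0-9_]*") || t.equals("import")) {
            throw new IllegalStateException("expected identifier but found " + t);
        }
        pos++;
        return t;
    }

    // import_stmt(): <IMPORT> name() ("." name())* ";"
    private String importStmt() {
        expect("import");
        StringBuilder path = new StringBuilder(name());
        while (peek().equals(".")) {   // the (...)* repetition becomes a loop
            expect(".");
            path.append('.').append(name());
        }
        expect(";");
        return path.toString();
    }

    // import_stmts(): (import_stmt())*
    List<String> importStmts() {
        List<String> result = new ArrayList<>();
        while (peek().equals("import")) {
            result.add(importStmt());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens =
            List.of("import", "stdio", ";", "import", "sys", ".", "types", ";");
        System.out.println(new ImportSketch(tokens).importStmts()); // [stdio, sys.types]
    }
}
```

Each `(...)*` in the grammar maps to a `while` loop that keeps going as long as the next token can start another repetition, which is the one-token decision JavaCC makes by default.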

Syntax of definitions

top_defs():{}
{
(LOOKAHEAD(storage() typeref() <IDENTIFIER>"(")
defun()
|LOOKAHEAD(3)
defvars()
| defconst()
| defstruct()
| defunion()
| typedef()
)*
}

defvars():{}
{
storage() type() name() ["=" expr()] ("," name()["=" expr()])*";"
}

storage():{}
{
[<STATIC>]
}

defun():{}
{
storage() typeref() name() "(" params() ")" block()
}

params():{}
{
LOOKAHEAD(<VOID> ")") <VOID>
| fixedparams() ["," "..."]
}

fixedparams():{}
{
param() (LOOKAHEAD(2) "," param())*
}

block():{}
{
"{" defvar_list() stmts() "}"
}

defstruct():{}
{
<STRUCT> name() member_list() ";"
}

defunion():{}
{
<UNION> name() member_list() ";"
}

member_list():{}
{
"{" (slot() ";")* "}"
}

slot():{}
{
type() name()
}

typedef():{}
{
<TYPEDEF> typeref() <IDENTIFIER> ";"
}

type():{}
{
typeref()
}

typeref():{}
{
typeref_base()
(LOOKAHEAD(2) "[" "]"
| "[" <INTEGER> "]"
| "*"
| "(" param_typerefs() ")"
)*
}

typeref_base():{}
{
<VOID>
| <CHAR>
| <SHORT>
| <INT>
| <LONG>
| LOOKAHEAD(2) <UNSIGNED> <CHAR>
| LOOKAHEAD(2) <UNSIGNED> <SHORT>
| LOOKAHEAD(2) <UNSIGNED> <INT>
| <UNSIGNED> <LONG>
| <STRUCT> <IDENTIFIER>
| <UNION> <IDENTIFIER>
| LOOKAHEAD({isType(getToken(1).image)}) <IDENTIFIER>
}
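
The last alternative uses a semantic lookahead: an identifier is accepted as a type name only if isType() says so. The grammar above does not show how isType() is implemented, so the following Java sketch is only one plausible shape for it, a set of names that grows as typedef definitions are parsed. The class name KnownTypedefs and the method addType() are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a typedef-name table that a helper such as isType() could consult.
// KnownTypedefs and addType() are hypothetical; only isType() appears in the grammar.
class KnownTypedefs {
    private final Set<String> names = new HashSet<>();

    // Would be called once a typedef definition has been parsed.
    void addType(String name) { names.add(name); }

    // Mirrors the predicate used in LOOKAHEAD({isType(getToken(1).image)}).
    boolean isType(String name) { return names.contains(name); }

    public static void main(String[] args) {
        KnownTypedefs table = new KnownTypedefs();
        table.addType("size_t");
        System.out.println(table.isType("size_t")); // true
        System.out.println(table.isType("count"));  // false
    }
}
```

With a table like this, the same token <IDENTIFIER> can be treated as a type in one place and as an ordinary name in another, depending on what has been defined so far.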

Syntax of statements

stmts():{}
{
(stmt())*
}

stmt():{}
{
(
";"
| LOOKAHEAD(2) labeled_stmt()
| expr() ";"
| block()
| if_stmt()
| while_stmt()
| for_stmt()
| dowhile_stmt()
| switch_stmt()
| break_stmt()
| continue_stmt()
| goto_stmt()
| return_stmt()
)
}

labeled_stmt():{}
{
<IDENTIFIER>":"stmt()
}

if_stmt():{}
{
<IF> "(" expr() ")" stmt() [LOOKAHEAD(1) <ELSE> stmt()]
}

while_stmt():{}
{
<WHILE> "(" expr() ")" stmt()
}

for_stmt():{}
{
<FOR> "(" [expr()] ";" [expr()] ";" [expr()] ")" stmt()
}

dowhile_stmt():{}
{
<DO> stmt() <WHILE> "(" expr() ")" ";"
}

switch_stmt():{}
{
<SWITCH> "(" expr() ")" "{" (<CASE> primary() ":" stmt())* [<DEFAULT> ":" stmt()] "}"
}

break_stmt():{}
{
<BREAK> ";"
}

continue_stmt():{}
{
<CONTINUE> ";"
}

goto_stmt():{}
{
<GOTO> <IDENTIFIER> ";"
}

return_stmt():{}
{
LOOKAHEAD(2) <RETURN> ";"
| <RETURN> expr()";"
}
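
The optional <ELSE> part of if_stmt is where the classic dangling-else ambiguity shows up, and LOOKAHEAD(1) there tells JavaCC to take the else whenever it sees one. The following Java sketch illustrates the effect of that choice under simplifying assumptions: conditions are single tokens without parentheses, statements are single names, and the class name DanglingElseSketch is made up for this example.

```java
import java.util.List;

// Sketch: the optional "else" is consumed greedily, so it attaches to the nearest "if".
// Simplified on purpose: a condition is one token and a plain statement is one token.
class DanglingElseSketch {
    private final List<String> tokens;
    private int pos = 0;

    DanglingElseSketch(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    // stmt: "if" cond stmt ["else" stmt] | simple statement
    String stmt() {
        if (peek().equals("if")) {
            pos++;                               // consume "if"
            String cond = tokens.get(pos++);     // consume the condition token
            String thenPart = stmt();
            if (peek().equals("else")) {         // one token of lookahead decides
                pos++;                           // consume "else"
                return "if(" + cond + "){" + thenPart + "}else{" + stmt() + "}";
            }
            return "if(" + cond + "){" + thenPart + "}";
        }
        return tokens.get(pos++);                // simple statement
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("if", "a", "if", "b", "s1", "else", "s2");
        System.out.println(new DanglingElseSketch(tokens).stmt());
        // if(a){if(b){s1}else{s2}}
    }
}
```

The else ends up inside the inner if, which matches how C-family languages resolve the ambiguity.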

Parsing expressions

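Expressions have a layered structure because the operators used in them have different precedence. In the parse tree, low-precedence operators sit near the root and high-precedence operators sit in the lower-level subexpressions. This is why expr() below calls expr10(), which calls expr9(), and so on down to term().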

expr():{}
{
LOOKAHEAD(term() "=")
term() "=" expr()
| LOOKAHEAD(term() opassign_op())
term() opassign_op() expr()
| expr10()

}

opassign_op():{}
{
(
"+="
| "-="
| "*="
| "/="
| "%="
| "&="
| "|="
| "^="
| "<<="
| ">>="
)
}

expr10():{}
{
expr9() ["?" expr() ":" expr10() ]
}

expr9():{}
{
expr8()("||"expr8())*
}

expr8():{}
{
expr7()("&&"expr7())*
}
expr7():{}
{
expr6()(
">" expr6()
|"<" expr6()
| ">=" expr6()
| "<=" expr6()
| "==" expr6()
| "!=" expr6()
)*
}

expr6():{}
{
expr5()("|" expr5())*
}

expr5():{}
{
expr4()("^"expr4())*
}

expr4():{}
{
expr3()("&"expr3())*
}

expr3():{}
{
expr2()(
">>"expr2()
| "<<" expr2()
)*
}
expr2():{}
{
expr1()(
"+"expr1()
|"-"expr1()
)*
}
expr1():{}
{
term()(
"*"term()
|"/"term()
|"%"term()
)*
}
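
To see why this layering yields the right precedence, here is a small Java sketch that evaluates integer arithmetic with just two of the levels: an additive level shaped like expr2() and a multiplicative level (using "*", "/", and "%") shaped like expr1(). It is an illustration under simplifying assumptions, not part of the Cb compiler: the class name PrecedenceSketch is made up, terms are only integer literals or parenthesized expressions, and the code evaluates directly instead of building a tree.

```java
// Sketch: two precedence levels, layered the same way as expr2() and expr1().
class PrecedenceSketch {
    private final String src;
    private int pos = 0;

    PrecedenceSketch(String src) { this.src = src.replace(" ", ""); }

    static int eval(String src) {
        PrecedenceSketch p = new PrecedenceSketch(src);
        int value = p.expr2();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    // Lower precedence: expr1 (("+" | "-") expr1)*
    private int expr2() {
        int value = expr1();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int rhs = expr1();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // Higher precedence: term (("*" | "/" | "%") term)*
    private int expr1() {
        int value = term();
        while (peek() == '*' || peek() == '/' || peek() == '%') {
            char op = src.charAt(pos++);
            int rhs = term();
            if (op == '*') value *= rhs;
            else if (op == '/') value /= rhs;
            else value %= rhs;
        }
        return value;
    }

    // term: integer literal | "(" expr2 ")"
    private int term() {
        if (peek() == '(') {
            pos++;
            int value = expr2();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("1 + 2 * 3"));   // 7
        System.out.println(eval("(1 + 2) * 3")); // 9
        System.out.println(eval("8 - 3 - 2"));   // 3
    }
}
```

Because the additive level can only combine results that the multiplicative level has already finished, "2 * 3" is grouped before "1 +" is applied. The `while` loops also make each operator left-associative, which is why "8 - 3 - 2" gives 3 rather than 7.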

Parsing terms

  

term():{}
{
LOOKAHEAD("("type()) "(" type() ")" term()
|unary()
}

// Prefix operator rules
unary():{}
{
"++" unary()
|"--" unary()
|"+" term()
|"-" term()
|"!" term()
|"~" term()
|"*" term()
|"&" term()
|LOOKAHEAD(3)<SIZEOF>"(" type() ")"
| <SIZEOF>unary()
|postfix()
}

// Postfix operator rules
postfix():{}
{
primary()
(
"++"
|"--"
|"[" expr() "]"
|"."name()
|"->"name()
|"(" args() ")"
)*
}

args():{}
{
[expr()(","expr())*]
}

primary():{}
{
<INTEGER>
|<CHARACTER>
|<STRING>
|<IDENTIFIER>
|"("expr()")"
}