Program to Compute the FIRST and FOLLOW Sets of a Given Grammar

📅 Last modified: 2023-12-03 15:41:05.957000 🧑 Author: Mango

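A Program for Computing the FIRST and FOLLOW Sets of a Grammar

In compiler construction, building a parser for a grammar usually requires computing the grammar's FIRST and FOLLOW sets. This article shows how to write a program that computes both.

FIRST Sets

Definition

For each nonterminal A of a grammar G, FIRST(A) is the set of terminals that can appear as the first symbol of some string derived from A. If A can derive the empty string, epsilon is also a member of FIRST(A).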
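The definition can be restated as inclusion rules that are applied repeatedly until no set grows. The block below is a standard textbook formulation rather than something taken from this article; the symbols X, Y_i, and a are notation introduced here for illustration.

```latex
\begin{align*}
&\text{1. If } a \text{ is a terminal: } \mathrm{FIRST}(a) = \{a\}. \\
&\text{2. If } X \to \varepsilon \text{ is a production: } \varepsilon \in \mathrm{FIRST}(X). \\
&\text{3. If } X \to Y_1 Y_2 \cdots Y_k \text{ and } \varepsilon \in \mathrm{FIRST}(Y_j) \text{ for all } j < i: \\
&\qquad \mathrm{FIRST}(Y_i) \setminus \{\varepsilon\} \subseteq \mathrm{FIRST}(X). \\
&\text{4. If } X \to Y_1 Y_2 \cdots Y_k \text{ and } \varepsilon \in \mathrm{FIRST}(Y_j) \text{ for all } j \le k: \\
&\qquad \varepsilon \in \mathrm{FIRST}(X).
\end{align*}
```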

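Implementation

We compute the FIRST set of every nonterminal by iterating to a fixed point: the productions are scanned repeatedly, and the loop stops once a full pass adds nothing to any set.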

def compute_first(grammar, non_terminals):
    # Initialize every FIRST set to the empty set
    first = {nt: set() for nt in non_terminals}

    while True:
        # Tracks whether any FIRST set grew during this pass
        changed = False
        for nt in non_terminals:
            for production in grammar[nt]:
                before = len(first[nt])
                # True while every symbol scanned so far can derive epsilon
                nullable = True
                for symbol in production:
                    if symbol == "epsilon":
                        # The empty string contributes no terminal
                        continue
                    if symbol in non_terminals:
                        # Add FIRST(symbol) without epsilon
                        first[nt] |= first[symbol] - {"epsilon"}
                        # Look at the next symbol only if this one is nullable
                        if "epsilon" not in first[symbol]:
                            nullable = False
                            break
                    else:
                        # A terminal starts the string, so the scan stops here
                        first[nt].add(symbol)
                        nullable = False
                        break
                # The whole right-hand side can vanish, so nt derives epsilon
                if nullable:
                    first[nt].add("epsilon")
                if len(first[nt]) != before:
                    changed = True
        # Stop once a full pass changes nothing
        if not changed:
            break
    return first

Example

Suppose the grammar G is:

E -> T E'
E' -> + T E' | epsilon
T -> F T'
T' -> * F T' | epsilon
F -> ( E ) | id

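Here the nonterminals are E, E', T, T', and F, and the terminals are +, *, (, ), and id. The symbol epsilon denotes the empty string and is not itself a terminal.

Calling compute_first gives the FIRST set of each nonterminal: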

grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["epsilon"]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["epsilon"]],
    "F": [["(", "E", ")"], ["id"]]
}
non_terminals = ["E", "E'", "T", "T'", "F"]
first_set = compute_first(grammar, non_terminals)

print(first_set)

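The output is as follows (formatted here for readability; Python may print the elements of each set in a different order):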

{
    'E': {'(', 'id'},
    "E'": {'+', 'epsilon'},
    'T': {'(', 'id'},
    "T'": {'*', 'epsilon'},
    'F': {'(', 'id'}
}
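Parsers often need the FIRST set of a whole string of symbols, for instance the right-hand side of a production, and not only of a single nonterminal. The following is a minimal sketch of such a helper. The name first_of_sequence is hypothetical, and the FIRST sets are hardcoded from the example output above so that the block runs on its own.

```python
EPSILON = "epsilon"

def first_of_sequence(symbols, first, non_terminals):
    """Return FIRST of a sequence of grammar symbols (illustrative helper)."""
    result = set()
    for symbol in symbols:
        if symbol == EPSILON:
            continue  # the empty string contributes no terminal
        if symbol not in non_terminals:
            result.add(symbol)  # a terminal always ends the scan
            return result
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result  # this nonterminal cannot vanish
    # Every symbol was nullable (or the sequence was empty)
    result.add(EPSILON)
    return result

# FIRST sets of the expression grammar, copied from the example output
first = {
    "E": {"(", "id"},
    "E'": {"+", "epsilon"},
    "T": {"(", "id"},
    "T'": {"*", "epsilon"},
    "F": {"(", "id"},
}
non_terminals = set(first)

print(first_of_sequence(["T'", "E'"], first, non_terminals))
print(first_of_sequence(["E'", ")"], first, non_terminals))
```

The first call yields the set containing *, +, and epsilon, because both T' and E' can derive the empty string. The second yields + and ), because the terminal ) stops the scan.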

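FOLLOW Sets

Definition

For each nonterminal A of a grammar G, FOLLOW(A) is the set of terminals that can appear immediately after A in some sentential form derived from the start symbol. The end-of-input marker $ belongs to FOLLOW(A) when A can be the last symbol of a sentential form. Unlike a FIRST set, a FOLLOW set never contains epsilon.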
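The definition is usually turned into three inclusion rules that are applied until no FOLLOW set grows. The block below is a standard textbook formulation, not something stated in this article; S denotes the start symbol, and alpha and beta stand for arbitrary (possibly empty) strings of grammar symbols.

```latex
\begin{align*}
&\text{1. } \$ \in \mathrm{FOLLOW}(S). \\
&\text{2. If } B \to \alpha A \beta \text{ is a production: }
  \mathrm{FIRST}(\beta) \setminus \{\varepsilon\} \subseteq \mathrm{FOLLOW}(A). \\
&\text{3. If } B \to \alpha A \beta \text{ is a production and }
  \varepsilon \in \mathrm{FIRST}(\beta) \text{ (including } \beta = \varepsilon\text{): }
  \mathrm{FOLLOW}(B) \subseteq \mathrm{FOLLOW}(A).
\end{align*}
```

Note that rule 3 requires the whole string beta to be nullable, not just its first symbol.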

Implementation

As with FIRST, we compute the FOLLOW set of every nonterminal by iterating to a fixed point.

def compute_follow(grammar, start_symbol, non_terminals, first_set):
    # Initialize every FOLLOW set; the start symbol is followed by end-of-input
    follow = {nt: set() for nt in non_terminals}
    follow[start_symbol].add("$")

    while True:
        # Tracks whether any FOLLOW set grew during this pass
        changed = False
        for nt in non_terminals:
            for production in grammar[nt]:
                for i, symbol in enumerate(production):
                    if symbol not in non_terminals:
                        continue
                    before = len(follow[symbol])
                    # True while everything after `symbol` can derive epsilon
                    rest_nullable = True
                    for next_symbol in production[i + 1:]:
                        if next_symbol == "epsilon":
                            continue
                        if next_symbol in non_terminals:
                            # Add FIRST(next_symbol) without epsilon
                            follow[symbol] |= first_set[next_symbol] - {"epsilon"}
                            if "epsilon" not in first_set[next_symbol]:
                                rest_nullable = False
                                break
                        else:
                            # A terminal directly follows, so the scan stops here
                            follow[symbol].add(next_symbol)
                            rest_nullable = False
                            break
                    # If the rest of the production can vanish (or is empty),
                    # whatever follows nt can also follow symbol
                    if rest_nullable:
                        follow[symbol] |= follow[nt]
                    if len(follow[symbol]) != before:
                        changed = True
        # Stop once a full pass changes nothing
        if not changed:
            break
    return follow

Example

Suppose the grammar G is:

S -> A a
S -> b B
A -> c A
A -> epsilon
B -> d B
B -> epsilon

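Here the nonterminals are S, A, and B, and the terminals are a, b, c, and d. The symbol epsilon denotes the empty string and is not itself a terminal.

Calling compute_follow gives the FOLLOW set of each nonterminal: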

grammar = {
    "S": [["A", "a"], ["b", "B"]],
    "A": [["c", "A"], ["epsilon"]],
    "B": [["d", "B"], ["epsilon"]]
}
non_terminals = ["S", "A", "B"]
start_symbol = "S"
first_set = compute_first(grammar, non_terminals)
follow_set = compute_follow(grammar, start_symbol, non_terminals, first_set)

print(follow_set)

The output is as follows (formatted here for readability):

{
    'S': {'$'},
    'A': {'a'},
    'B': {'$'}
}
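As a further check, the sketch below computes FOLLOW sets for the expression grammar from the FIRST example, where several nonterminals are followed by a nullable symbol. It is a compact, self-contained variant written for illustration: the names first_of, first_sets, and follow_sets are hypothetical, and nonterminals are assumed to be exactly the keys of the grammar dictionary.

```python
EPSILON = "epsilon"
END = "$"

def first_of(symbols, first, grammar):
    """FIRST of a symbol sequence, given the FIRST sets computed so far."""
    result = set()
    for symbol in symbols:
        if symbol == EPSILON:
            continue
        if symbol not in grammar:  # terminal: it ends the scan
            result.add(symbol)
            return result
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result
    result.add(EPSILON)  # every symbol was nullable
    return result

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for production in productions:
                new = first_of(production, first, grammar) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first

def follow_sets(grammar, start_symbol, first):
    follow = {nt: set() for nt in grammar}
    follow[start_symbol].add(END)
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for production in productions:
                for i, symbol in enumerate(production):
                    if symbol not in grammar:
                        continue
                    # FIRST of everything to the right of this occurrence
                    tail = first_of(production[i + 1:], first, grammar)
                    new = tail - {EPSILON}
                    if EPSILON in tail:  # the tail can vanish entirely
                        new |= follow[nt]
                    new -= follow[symbol]
                    if new:
                        follow[symbol] |= new
                        changed = True
    return follow

grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["epsilon"]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["epsilon"]],
    "F": [["(", "E", ")"], ["id"]],
}
first = first_sets(grammar)
follow = follow_sets(grammar, "E", first)
for nt in grammar:
    print(nt, sorted(follow[nt]))
```

For this grammar, FOLLOW(E) and FOLLOW(E') both contain $ and ), FOLLOW(T) and FOLLOW(T') contain +, ), and $, and FOLLOW(F) contains *, +, ), and $.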

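Conclusion

The programs above compute the FIRST and FOLLOW sets of a given grammar. In practice, these sets are used to build parsers, for example when filling in a predictive parsing table for a programming language.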