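Compiler Design: SLR(1) Parser Using Python
Prerequisites: LR Parser, SLR Parser
SLR(1) Grammar
SLR stands for Simple LR. It is an example of a bottom-up parser. The “L” in SLR stands for scanning the input from left to right, the “R” for constructing a rightmost derivation in reverse, and the “(1)” for the number of lookahead input symbols.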
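Shift and Reduce Operations
The next important concepts are the SHIFT and REDUCE operations. During parsing we meet two kinds of items. First, an item may have the “·” at the end of the production. Such an item is called a “Handle”: the whole right-hand side has been recognized, and this is where the REDUCE operation applies. Second, an item may have the “·” pointing at some grammar symbol, so the production can still make progress. In this case a SHIFT operation is performed. Strictly speaking, only terminals are shifted from the input; the parser moves past a non-terminal through a GOTO transition after a reduction.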
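The distinction can be sketched in a few lines of Python. This is a minimal illustration and is not part of the program below. The `Item` tuple and `next_action` are hypothetical names. `Item` stores the dot as an index rather than as a '.' list element, and the label "goto" for a non-terminal after the dot is this sketch's own convention.

```python
from typing import NamedTuple, Set, Tuple


class Item(NamedTuple):
    """An LR(0) item: a production plus the position of the dot (hypothetical type)."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int  # the dot sits just before rhs[dot]


def next_action(item: Item, terminals: Set[str]) -> str:
    """Say what an item allows: reduce, shift a terminal, or goto on a non-terminal."""
    if item.dot == len(item.rhs):
        return "reduce"  # dot at the end: the item is a "Handle"
    symbol = item.rhs[item.dot]
    return "shift" if symbol in terminals else "goto"


terminals = {"id", "+", "*", "(", ")"}
print(next_action(Item("E", ("E", "+", "T"), 3), terminals))  # reduce
print(next_action(Item("E", ("E", "+", "T"), 1), terminals))  # shift
print(next_action(Item("E", ("E", "+", "T"), 0), terminals))  # goto
```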
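Use of Dot [·]
In LR parsing, we place a dot “·” inside a rule so that we know how far parsing has progressed at any given point. Note that the “·” is only a position marker and must not be treated as a grammar symbol. In SLR(1) there is exactly one input symbol of lookahead, and it helps us decide which action (shift or reduce) to take in the current state.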
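Parsing Decisions
To make parsing decisions, we need to construct a deterministic finite automaton (DFA). It tells us which state we have reached at each point of the parse. For an SLR parser this automaton is the LR(0) automaton. Each of its states is a set of “items”, that is, production rules with a dot somewhere in the right-hand side.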
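The following minimal sketch shows how the automaton drives parsing decisions, using the standard table-driven LR parsing loop. It is not part of the program in this article. The names `slr_parse`, `PRODUCTIONS`, `ACTION_TABLE` and `GOTO_TABLE` are hypothetical, and the entries were copied by hand from the SLR(1) parsing table that the program prints for the expression grammar at the end of this article. At each step the parser looks at the state on top of its stack and the single lookahead token, and the table entry tells it to shift, reduce, or accept.

```python
# Rules of the augmented expression grammar: number -> (lhs, length of rhs)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

# ACTION part of the SLR(1) table: state -> lookahead -> entry
ACTION_TABLE = {
    0: {"id": "S5", "(": "S4"},
    1: {"+": "S6", "$": "Accept"},
    2: {"+": "R2", "*": "S7", ")": "R2", "$": "R2"},
    3: {"+": "R4", "*": "R4", ")": "R4", "$": "R4"},
    4: {"id": "S5", "(": "S4"},
    5: {"+": "R6", "*": "R6", ")": "R6", "$": "R6"},
    6: {"id": "S5", "(": "S4"},
    7: {"id": "S5", "(": "S4"},
    8: {"+": "S6", ")": "S11"},
    9: {"+": "R1", "*": "S7", ")": "R1", "$": "R1"},
    10: {"+": "R3", "*": "R3", ")": "R3", "$": "R3"},
    11: {"+": "R5", "*": "R5", ")": "R5", "$": "R5"},
}
# GOTO part of the table: state -> non-terminal -> state
GOTO_TABLE = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
              6: {"T": 9, "F": 3}, 7: {"F": 10}}


def slr_parse(tokens):
    """Parse a token list; return the rule numbers used, in reduction order."""
    stack = [0]                    # stack of state numbers
    tokens = list(tokens) + ["$"]  # append the end marker
    pos = 0
    reductions = []
    while True:
        state, lookahead = stack[-1], tokens[pos]
        entry = ACTION_TABLE[state].get(lookahead)
        if entry is None:
            raise SyntaxError(f"unexpected {lookahead!r} in state I{state}")
        if entry == "Accept":
            return reductions
        if entry.startswith("S"):  # shift: consume the token, push the new state
            stack.append(int(entry[1:]))
            pos += 1
        else:                      # reduce by rule n: pop |rhs| states, then GOTO
            n = int(entry[1:])
            lhs, length = PRODUCTIONS[n]
            del stack[len(stack) - length:]
            stack.append(GOTO_TABLE[stack[-1]][lhs])
            reductions.append(n)


print(slr_parse(["id", "+", "id", "*", "id"]))  # [6, 4, 2, 6, 4, 6, 3, 1]
```

Read backwards, the returned rule numbers give the rightmost derivation of the input, which is what the “R” in SLR refers to.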
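Generating the LR Collection
The first step is to create the augmented grammar. Augmentation brings the start symbol onto the right-hand side of a production. We cannot change the existing rules, so we add a new rule to the productions. Having the start symbol on the RHS ensures that parsing can reach an accept state: a REDUCE operation on this newly added rule means the input string is accepted.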
For example,
IF we have ‘K’ as a start symbol
THEN L → K is added to the productions
(where ‘L’ is any symbol that does not already occur in the grammar)
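A minimal sketch of augmentation follows. It assumes rules are written as strings in the 'A -> x | y' format that the program below uses. The function name `augment` and the (lhs, rhs-tuple) representation are hypothetical. Like the program, the sketch also splits alternatives into separate rules, and each rule's position in the returned list serves as its number.

```python
def augment(rules, start_symbol):
    """Split 'A -> x | y' rules into one alternative each and prepend S' -> S.

    Returns a list of (lhs, rhs_tuple); list index n is rule number n.
    """
    lhs_symbols = {rule.split("->")[0].strip() for rule in rules}
    new_start = start_symbol + "'"
    while new_start in lhs_symbols:  # keep adding primes until the name is unused
        new_start += "'"
    productions = [(new_start, (start_symbol,))]  # rule 0: the new rule
    for rule in rules:
        lhs, rhs = (part.strip() for part in rule.split("->"))
        for alternative in rhs.split("|"):
            productions.append((lhs, tuple(alternative.split())))
    return productions


grammar = ["E -> E + T | T", "T -> T * F | F", "F -> ( E ) | id"]
for number, (lhs, rhs) in enumerate(augment(grammar, "E")):
    print(f"R{number}: {lhs} -> {' '.join(rhs)}")
```

This prints R0: E' -> E through R6: F -> id. That is the same numbering the program's parse table uses for its Rn entries.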
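CLOSURE and GOTO Operations
Constructing the deterministic finite automaton has two requirements: the first is to create the “states”, and the second is to build the “transitions” of the automaton.
1) Closure
The closure operation helps us form the “states”. Before applying it, all rules must be separated so that each rule holds a single alternative (for example, E → E + T | T becomes E → E + T and E → T). Then number all the rules. These numbers are used later as the Rn labels of the REDUCE entries in the parse table. The initial state I0 is the closure of the newly added augmentation rule L → · K, so the new rule is always part of I0.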
Assumption: L and K are non-terminals, and m, t and p are strings of zero or more terminals or non-terminals.
DO REPEAT (Till-No-New-Rule-Gets-Added) {
IF (any item of the form “ L → m · K t ” is in the set) and (the grammar has a production K → p)
THEN {add the item K → · p to the Closure set if not already present}
}
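The closure loop above translates almost directly into Python. In this minimal sketch, an item is a hypothetical (lhs, rhs_tuple, dot) triple instead of the program's '.'-in-list form. The state is returned as a frozenset so that it can later be compared and hashed regardless of item order.

```python
def closure(items, productions):
    """LR(0) closure of a set of items.

    items: (lhs, rhs_tuple, dot) triples; productions: (lhs, rhs_tuple) pairs.
    """
    result = set(items)
    changed = True
    while changed:  # DO REPEAT till no new rule gets added
        changed = False
        for lhs, rhs, dot in list(result):
            if dot < len(rhs):  # item L -> m . K t
                k = rhs[dot]
                for p_lhs, p_rhs in productions:
                    if p_lhs == k and (p_lhs, p_rhs, 0) not in result:
                        result.add((p_lhs, p_rhs, 0))  # add K -> . p
                        changed = True
    return frozenset(result)


productions = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
               ("T", ("T", "*", "F")), ("T", ("F",)),
               ("F", ("(", "E", ")")), ("F", ("id",))]
I0 = closure({("E'", ("E",), 0)}, productions)
print(len(I0))  # 7
```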
2) Goto
The GOTO operation helps us form the “transitions”. In GOTO(I, X), “I” is the state we are looking at and “X” is the symbol that the dot (·) points to. GOTO therefore takes a state (a set of items) and a grammar symbol, and produces a new or an existing state as output.
GOTO (I, X) represents the state transition from “I” on the grammar symbol “X”.
For every item of the form “ L → m · X t ” in “I”,
GOTO (I, X) is the closure of the set of all the items “ L → m X · t ”
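Here is a minimal sketch of GOTO under the same hypothetical (lhs, rhs_tuple, dot) item representation. It first moves the dot past X in every item where X follows the dot, then takes the closure. A worklist version of closure is included so that the block runs on its own.

```python
def closure(items, productions):
    """Worklist form of LR(0) closure; items are (lhs, rhs_tuple, dot)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs):
            for p_lhs, p_rhs in productions:
                item = (p_lhs, p_rhs, 0)
                if p_lhs == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol, productions):
    """GOTO(I, X): move the dot over X where X follows the dot, then close."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved, productions)


productions = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
               ("T", ("T", "*", "F")), ("T", ("F",)),
               ("F", ("(", "E", ")")), ("F", ("id",))]
I0 = closure({("E'", ("E",), 0)}, productions)
for lhs, rhs, dot in sorted(goto(I0, "E", productions)):
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]))
```

The two printed items (E -> E . + T and E' -> E .) match state I1 in the program output shown later. An empty result, such as `goto(I0, "+", productions)`, means there is no transition on that symbol.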
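Program Approach
The input to the program is a list of grammar rules, together with lists of the terminal and non-terminal symbols. Control starts at the grammarAugmentation() function. Next, the I0 state is computed by calling findClosure(). Then generateStates() is called to generate the remaining states, and finally createParseTable() is called.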
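A) Grammar Augmentation
- This function first creates a unique symbol and uses it to build a new item that brings the start symbol onto the RHS. It then formats the items as a nested list and adds a dot at the start of each item's RHS. Each item holds only one derivation. The result is a list named separatedRulesList.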
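B) Find Closure
- This function behaves differently for the I0 state. For I0, the newly created item from augmentation is appended directly to closureSet. In every other case, closureSet is initialized from the received argument “input_state”.
- The function then keeps iterating as long as new items are being added to closureSet. It follows the rule given under the Closure heading above to find the next items to add.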
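C) Generate States
- This function starts the GOTO computation. “statesDict” is a dictionary that stores all the states and is used as a global variable. The function keeps iterating over statesDict as long as new states are being added to it, and calls compute_GOTO() exactly once for each added state.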
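The state-generation loop can also be written as a worklist over frozensets. With frozensets, “is this state already known?” becomes a dictionary lookup that does not depend on item order. This is a minimal sketch, not the program's code. The names `canonical_collection`, `closure` and `goto` are hypothetical, and its state numbers may differ from the program's, although the number of states and transitions should agree.

```python
from collections import deque


def closure(items, productions):
    """LR(0) closure; items are (lhs, rhs_tuple, dot) triples."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs):
            for p_lhs, p_rhs in productions:
                item = (p_lhs, p_rhs, 0)
                if p_lhs == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol, productions):
    """Advance the dot over `symbol`, then close the moved items."""
    moved = {(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == symbol}
    return closure(moved, productions)


def canonical_collection(productions):
    """Number every reachable LR(0) state and record its transitions.

    productions[0] must be the augmentation rule.
    """
    start = closure({(productions[0][0], productions[0][1], 0)}, productions)
    numbers = {start: 0}   # frozenset state -> state number
    transitions = {}       # (state number, symbol) -> state number
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # symbols that appear right after a dot, deduplicated in a fixed order
        symbols = []
        for lhs, rhs, dot in sorted(state):
            if dot < len(rhs) and rhs[dot] not in symbols:
                symbols.append(rhs[dot])
        for symbol in symbols:
            target = goto(state, symbol, productions)
            if target not in numbers:      # a brand-new state
                numbers[target] = len(numbers)
                queue.append(target)
            transitions[(numbers[state], symbol)] = numbers[target]
    return numbers, transitions


productions = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
               ("T", ("T", "*", "F")), ("T", ("F",)),
               ("F", ("(", "E", ")")), ("F", ("id",))]
numbers, transitions = canonical_collection(productions)
print(len(numbers), len(transitions))  # 12 22
```

The 12 states and 22 transitions correspond to states I0 to I11 and the 22 GOTO lines in the program output shown later.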
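D) compute_GOTO
- Control now reaches compute_GOTO(). This function does not contain the actual goto logic. Instead, it collects the metadata needed to call GOTO() repeatedly. For a given input state, it calls GOTO(state, Xi), where Xi is a symbol pointed to by the dot. There can be several such Xi, so keeping this iteration separate from the actual GOTO() logic reduces complexity.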
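E) GOTO
- This function is called by compute_GOTO(). It receives “state” and “charNextToDot”, performs the shift operation (moves the dot one position to the right, past charNextToDot), and builds a new state.
- In the new state, the dot of an item may point at a non-terminal that has productions of its own. To include all the items of the new state, the function calls findClosure(), discussed above.
- To record “GOTO(state, symbol) = newState”, the stateMap dictionary is created with tuple keys and integer values. It is used both for generating the parse table and for printing the output.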
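F) Create Parse Table
- First, an empty table is created. The columns are ACTION (the terminals and $) and GOTO (the non-terminals). The rows are the numbered states (I0 to In).
- stateMap is used to fill in the SHIFT and GOTO entries. To add the REDUCE entries, find the “handles” (items ending with a dot) in the generated states. For each one, put Rn (where n is the item's number in separatedRulesList) in Table[stateNo][Ai], where “stateNo” is the state the item belongs to.
- “Ai” is any symbol in FOLLOW of the LHS of the current item. For the new augmentation rule, the REDUCE entry is written as “Accept”. If the grammar is not SLR(1), a cell can receive more than one entry, which is an RR (Reduce-Reduce) or SR (Shift-Reduce) conflict. The program simply appends such entries to the same cell.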
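The conflict remark can be made concrete with a sketch that keeps every ACTION cell as a set and flags cells that hold more than one entry. It is written independently of the program, and `follow_sets` and `action_cells` are hypothetical names. It computes FOLLOW by fixed-point iteration instead of the recursive first()/follow() functions of the program below, which can recurse without end on some grammars (for example, when first() is applied to a string that starts with a left-recursive non-terminal). The sketch assumes a grammar without epsilon productions. The example uses sample set 02 from the driver code, whose state reached after “a c” contains both X -> c · and Y -> c ·.

```python
def follow_sets(productions, nonterminals):
    """FOLLOW sets by fixed-point iteration.

    Assumption: no epsilon productions, so FIRST of a symbol string is
    simply FIRST of its first symbol.
    """
    first = {a: set() for a in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            head = rhs[0]
            new = first[head] if head in nonterminals else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True

    follow = {a: set() for a in nonterminals}
    follow[productions[0][0]].add("$")  # end marker follows the new start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                if i + 1 < len(rhs):  # FIRST of the next symbol
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in nonterminals else {nxt}
                else:                 # sym ends the rule: FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def action_cells(complete_items, shifts, productions, follow):
    """ACTION cells of one state, kept as sets so conflicts stay visible.

    complete_items: (lhs, rhs) pairs whose dot is at the end.
    shifts: terminal -> target state number for this state.
    """
    cells = {a: {f"S{target}"} for a, target in shifts.items()}
    for lhs, rhs in complete_items:
        n = productions.index((lhs, rhs))
        entry = "Accept" if n == 0 else f"R{n}"
        for a in follow[lhs]:
            cells.setdefault(a, set()).add(entry)
    return cells


# sample set 02, augmented and split as the program does
productions = [("S'", ("S",)), ("S", ("a", "X", "d")), ("S", ("b", "Y", "d")),
               ("S", ("a", "Y", "e")), ("S", ("b", "X", "e")),
               ("X", ("c",)), ("Y", ("c",))]
follow = follow_sets(productions, {"S'", "S", "X", "Y"})
# the state reached on "a c" holds both X -> c . and Y -> c .
cells = action_cells([("X", ("c",)), ("Y", ("c",))], {}, productions, follow)
for lookahead in sorted(cells):
    status = "conflict" if len(cells[lookahead]) > 1 else "ok"
    print(lookahead, sorted(cells[lookahead]), status)
```

Both cells report a Reduce-Reduce conflict, because FOLLOW(X) and FOLLOW(Y) are both {d, e}. This grammar is therefore not SLR(1).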
Python3
# SLR(1)
import copy


# perform grammar augmentation
def grammarAugmentation(rules, nonterm_userdef,
                        start_symbol):

    # newRules stores processed output rules
    newRules = []

    # create unique 'symbol' to
    # - represent new start symbol
    newChar = start_symbol + "'"
    while (newChar in nonterm_userdef):
        newChar += "'"

    # adding rule to bring start symbol to RHS
    newRules.append([newChar,
                     ['.', start_symbol]])

    # new format => [LHS,[.RHS]],
    # can't use dictionary since
    # - duplicate keys can be there
    for rule in rules:

        # split LHS from RHS
        k = rule.split("->")
        lhs = k[0].strip()
        rhs = k[1].strip()

        # split all rule at '|'
        # keep single derivation in one rule
        multirhs = rhs.split('|')
        for rhs1 in multirhs:
            rhs1 = rhs1.strip().split()

            # ADD dot pointer at start of RHS
            rhs1.insert(0, '.')
            newRules.append([lhs, rhs1])
    return newRules
# find closure
def findClosure(input_state, dotSymbol):
    global start_symbol, \
        separatedRulesList, \
        statesDict

    # closureSet stores processed output
    closureSet = []

    # if findClosure is called for
    # - 1st time i.e. for I0,
    # then LHS is received in "dotSymbol",
    # add all rules starting with
    # - LHS symbol to closureSet
    if dotSymbol == start_symbol:
        for rule in separatedRulesList:
            if rule[0] == dotSymbol:
                closureSet.append(rule)
    else:
        # for any higher state than I0,
        # start from a copy of the received input_state
        # (a copy, so the caller's list is not modified)
        closureSet = list(input_state)

    # iterate till new states are
    # - getting added in closureSet
    prevLen = -1
    while prevLen != len(closureSet):
        prevLen = len(closureSet)

        # "tempClosureSet" - used to eliminate
        # concurrent modification error
        tempClosureSet = []

        # if dot pointing at new symbol,
        # add corresponding rules to tempClosure
        for rule in closureSet:
            indexOfDot = rule[1].index('.')
            if rule[1][-1] != '.':
                dotPointsHere = rule[1][indexOfDot + 1]
                for in_rule in separatedRulesList:
                    if dotPointsHere == in_rule[0] and \
                            in_rule not in tempClosureSet:
                        tempClosureSet.append(in_rule)

        # add new closure rules to closureSet
        for rule in tempClosureSet:
            if rule not in closureSet:
                closureSet.append(rule)
    return closureSet
def compute_GOTO(state):
    global statesDict, stateCount

    # find all symbols on which we need to
    # make function call - GOTO
    generateStatesFor = []
    for rule in statesDict[state]:
        # if rule is not "Handle"
        if rule[1][-1] != '.':
            indexOfDot = rule[1].index('.')
            dotPointsHere = rule[1][indexOfDot + 1]
            if dotPointsHere not in generateStatesFor:
                generateStatesFor.append(dotPointsHere)

    # call GOTO iteratively on all symbols pointed by dot
    if len(generateStatesFor) != 0:
        for symbol in generateStatesFor:
            GOTO(state, symbol)
    return


def GOTO(state, charNextToDot):
    global statesDict, stateCount, stateMap

    # newState - stores processed new state
    newState = []
    for rule in statesDict[state]:
        indexOfDot = rule[1].index('.')
        if rule[1][-1] != '.':
            if rule[1][indexOfDot + 1] == charNextToDot:
                # swapping element with dot,
                # to perform shift operation
                shiftedRule = copy.deepcopy(rule)
                shiftedRule[1][indexOfDot] = \
                    shiftedRule[1][indexOfDot + 1]
                shiftedRule[1][indexOfDot + 1] = '.'
                newState.append(shiftedRule)

    # add closure rules for newState
    # call findClosure function iteratively
    # - on all existing rules in newState

    # addClosureRules - is used to store
    # new rules temporarily,
    # to prevent concurrent modification error
    addClosureRules = []
    for rule in newState:
        indexDot = rule[1].index('.')
        # check that rule is not "Handle"
        if rule[1][-1] != '.':
            closureRes = \
                findClosure(newState, rule[1][indexDot + 1])
            for closureRule in closureRes:
                if closureRule not in addClosureRules \
                        and closureRule not in newState:
                    addClosureRules.append(closureRule)

    # add closure result to newState
    for rule in addClosureRules:
        newState.append(rule)

    # find if newState already present
    # in Dictionary (compared as sets of items,
    # so the order of items does not matter)
    stateExists = -1
    for state_num in statesDict:
        if len(statesDict[state_num]) == len(newState) and \
                all(item in statesDict[state_num] for item in newState):
            stateExists = state_num
            break

    # stateMap is a mapping of GOTO with
    # its output states
    if stateExists == -1:
        # if newState is not in dictionary,
        # then create new state
        stateCount += 1
        statesDict[stateCount] = newState
        stateMap[(state, charNextToDot)] = stateCount
    else:
        # if state repetition found,
        # assign that previous state number
        stateMap[(state, charNextToDot)] = stateExists
    return


def generateStates(statesDict):
    prev_len = -1
    called_GOTO_on = []

    # run loop till new states are getting added
    while (len(statesDict) != prev_len):
        prev_len = len(statesDict)
        keys = list(statesDict.keys())

        # make compute_GOTO function call
        # on all states in dictionary
        for key in keys:
            if key not in called_GOTO_on:
                called_GOTO_on.append(key)
                compute_GOTO(key)
    return
# calculation of first
# epsilon is denoted by '#'
# pass rule in first function
def first(rule):
    global rules, nonterm_userdef, \
        term_userdef, diction

    # recursion base condition
    # (for terminal or epsilon)
    if rule is not None and len(rule) != 0:
        if rule[0] in term_userdef:
            return rule[0]
        elif rule[0] == '#':
            return '#'

    # condition for Non-Terminals
    if rule is not None and len(rule) != 0:
        if rule[0] in list(diction.keys()):
            # fres temporary list of result
            fres = []
            rhs_rules = diction[rule[0]]

            # call first on each rule of RHS
            # fetched (& take union)
            for itr in rhs_rules:
                indivRes = first(itr)
                if type(indivRes) is list:
                    for i in indivRes:
                        fres.append(i)
                else:
                    fres.append(indivRes)

            # if no epsilon in result
            # - received return fres
            if '#' not in fres:
                return fres
            else:
                # apply epsilon
                # rule => f(ABC)=f(A)-{e} U f(BC)
                newList = []
                fres.remove('#')
                if len(rule) > 1:
                    ansNew = first(rule[1:])
                    if ansNew is not None:
                        if type(ansNew) is list:
                            newList = fres + ansNew
                        else:
                            newList = fres + [ansNew]
                    else:
                        newList = fres
                    return newList

                # if result is not already returned
                # - control reaches here
                # lastly if epsilon still persists
                # - keep it in result of first
                fres.append('#')
                return fres


# calculation of follow
def follow(nt):
    global start_symbol, rules, nonterm_userdef, \
        term_userdef, diction

    # for start symbol return $ (recursion base case)
    solset = set()
    if nt == start_symbol:
        # return '$'
        solset.add('$')

    # check all occurrences
    # solset - is result of computed 'follow' so far

    # For input, check in all rules
    for curNT in diction:
        rhs = diction[curNT]

        # go for all productions of NT
        for subrule in rhs:
            if nt in subrule:

                # call for all occurrences on
                # - non-terminal in subrule
                while nt in subrule:
                    index_nt = subrule.index(nt)
                    subrule = subrule[index_nt + 1:]
                    # reset, so a stale value is never reused
                    res = None

                    # empty condition - call follow on LHS
                    if len(subrule) != 0:
                        # compute first if symbols on
                        # - RHS of target Non-Terminal exists
                        res = first(subrule)
                        # a single terminal comes back as a str
                        if type(res) is str:
                            res = [res]

                        # if epsilon in result apply rule
                        # - (A->aBX)- follow of -
                        # - follow(B)=(first(X)-{ep}) U follow(A)
                        if res is not None and '#' in res:
                            newList = []
                            res.remove('#')
                            ansNew = follow(curNT)
                            if ansNew is not None:
                                if type(ansNew) is list:
                                    newList = res + ansNew
                                else:
                                    newList = res + [ansNew]
                            else:
                                newList = res
                            res = newList
                    else:
                        # when nothing in RHS, go circular
                        # - and take follow of LHS
                        # only if (NT in LHS)!=curNT
                        if nt != curNT:
                            res = follow(curNT)

                    # add follow result in set form
                    if res is not None:
                        if type(res) is list:
                            for g in res:
                                solset.add(g)
                        else:
                            solset.add(res)
    return list(solset)
def createParseTable(statesDict, stateMap, T, NT):
    global separatedRulesList, diction

    # create rows and cols
    rows = list(statesDict.keys())
    cols = T + ['$'] + NT

    # create empty table
    Table = []
    tempRow = []
    for y in range(len(cols)):
        tempRow.append('')
    for x in range(len(rows)):
        Table.append(copy.deepcopy(tempRow))

    # make shift and GOTO entries in table
    for entry in stateMap:
        state = entry[0]
        symbol = entry[1]
        # get index
        a = rows.index(state)
        b = cols.index(symbol)
        if symbol in NT:
            Table[a][b] = Table[a][b] \
                + f"{stateMap[entry]} "
        elif symbol in T:
            Table[a][b] = Table[a][b] \
                + f"S{stateMap[entry]} "

    # start REDUCE procedure
    # number the separated rules
    numbered = {}
    key_count = 0
    for rule in separatedRulesList:
        tempRule = copy.deepcopy(rule)
        tempRule[1].remove('.')
        numbered[key_count] = tempRule
        key_count += 1

    # format the augmented rule as text so that
    # it takes part in the follow computation
    addedR = f"{separatedRulesList[0][0]} -> " \
        f"{separatedRulesList[0][1][1]}"
    rules.insert(0, addedR)
    for rule in rules:
        k = rule.split("->")

        # remove un-necessary spaces
        k[0] = k[0].strip()
        k[1] = k[1].strip()
        rhs = k[1]
        multirhs = rhs.split('|')

        # remove un-necessary spaces
        for i in range(len(multirhs)):
            multirhs[i] = multirhs[i].strip()
            multirhs[i] = multirhs[i].split()
        diction[k[0]] = multirhs

    # find 'handle' items and calculate follow.
    for stateno in statesDict:
        row = rows.index(stateno)
        for rule in statesDict[stateno]:
            if rule[1][-1] == '.':

                # match the item
                temp2 = copy.deepcopy(rule)
                temp2[1].remove('.')
                for key in numbered:
                    if numbered[key] == temp2:

                        # put Rn in those ACTION symbol columns,
                        # who are in the follow of
                        # LHS of current Item.
                        follow_result = follow(rule[0])
                        for col in follow_result:
                            index = cols.index(col)
                            if key == 0:
                                Table[row][index] = "Accept"
                            else:
                                Table[row][index] = \
                                    Table[row][index] + f"R{key} "

    # printing table (header aligned with the 4-character row labels)
    print("\nSLR(1) parsing table:\n")
    frmt = "{:>8}" * len(cols)
    print("    " + frmt.format(*cols) + "\n")
    for j, y in enumerate(Table):
        cells = [cell.strip() for cell in y]
        print(f"{'I' + str(j):>3} " + frmt.format(*cells))


def printResult(rules):
    for rule in rules:
        print(f"{rule[0]} ->"
              f" {' '.join(rule[1])}")


def printAllGOTO(gotoMap):
    for itr in gotoMap:
        print(f"GOTO ( I{itr[0]} ,"
              f" {itr[1]} ) = I{gotoMap[itr]}")


# *** MAIN *** - Driver Code

# uncomment any rules set to test code
# follow given format to add -
# user defined grammar rule set
# rules section - *START*

# example sample set 01
rules = ["E -> E + T | T",
         "T -> T * F | F",
         "F -> ( E ) | id"
         ]
nonterm_userdef = ['E', 'T', 'F']
term_userdef = ['id', '+', '*', '(', ')']
start_symbol = nonterm_userdef[0]

# example sample set 02
# rules = ["S -> a X d | b Y d | a Y e | b X e",
#          "X -> c",
#          "Y -> c"
#          ]
# nonterm_userdef = ['S','X','Y']
# term_userdef = ['a','b','c','d','e']
# start_symbol = nonterm_userdef[0]

# rules section - *END*
print("\nOriginal grammar input:\n")
for y in rules:
    print(y)

# print processed rules
print("\nGrammar after Augmentation: \n")
separatedRulesList = \
    grammarAugmentation(rules,
                        nonterm_userdef,
                        start_symbol)
printResult(separatedRulesList)

# find closure
start_symbol = separatedRulesList[0][0]
print("\nCalculated closure: I0\n")
I0 = findClosure(0, start_symbol)
printResult(I0)

# use statesDict to store the states
# use stateMap to store GOTOs
statesDict = {}
stateMap = {}

# add first state to statesDict
# and maintain stateCount
# - for newState generation
statesDict[0] = I0
stateCount = 0

# computing states by GOTO
generateStates(statesDict)

# print goto states
print("\nStates Generated: \n")
for st in statesDict:
    print(f"State = I{st}")
    printResult(statesDict[st])
    print()

print("Result of GOTO computation:\n")
printAllGOTO(stateMap)

# "follow computation" for making REDUCE entries
diction = {}

# call createParseTable function
createParseTable(statesDict, stateMap,
                 term_userdef,
                 nonterm_userdef)
Output:
Original grammar input:
E -> E + T | T
T -> T * F | F
F -> ( E ) | id
Grammar after Augmentation:
E' -> . E
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id
Calculated closure: I0
E' -> . E
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id
States Generated:
State = I0
E' -> . E
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id
State = I1
E' -> E .
E -> E . + T
State = I2
E -> T .
T -> T . * F
State = I3
T -> F .
State = I4
F -> ( . E )
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id
State = I5
F -> id .
State = I6
E -> E + . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id
State = I7
T -> T * . F
F -> . ( E )
F -> . id
State = I8
F -> ( E . )
E -> E . + T
State = I9
E -> E + T .
T -> T . * F
State = I10
T -> T * F .
State = I11
F -> ( E ) .
Result of GOTO computation:
GOTO ( I0 , E ) = I1
GOTO ( I0 , T ) = I2
GOTO ( I0 , F ) = I3
GOTO ( I0 , ( ) = I4
GOTO ( I0 , id ) = I5
GOTO ( I1 , + ) = I6
GOTO ( I2 , * ) = I7
GOTO ( I4 , E ) = I8
GOTO ( I4 , T ) = I2
GOTO ( I4 , F ) = I3
GOTO ( I4 , ( ) = I4
GOTO ( I4 , id ) = I5
GOTO ( I6 , T ) = I9
GOTO ( I6 , F ) = I3
GOTO ( I6 , ( ) = I4
GOTO ( I6 , id ) = I5
GOTO ( I7 , F ) = I10
GOTO ( I7 , ( ) = I4
GOTO ( I7 , id ) = I5
GOTO ( I8 , ) ) = I11
GOTO ( I8 , + ) = I6
GOTO ( I9 , * ) = I7
SLR(1) parsing table:
          id       +       *       (       )       $       E       T       F
 I0       S5                      S4                       1       2       3
 I1               S6                          Accept
 I2               R2      S7              R2      R2
 I3               R4      R4              R4      R4
 I4       S5                      S4                       8       2       3
 I5               R6      R6              R6      R6
 I6       S5                      S4                               9       3
 I7       S5                      S4                                      10
 I8               S6                     S11
 I9               R1      S7              R1      R1
I10               R3      R3              R3      R3
I11               R5      R5              R5      R5