Compiler Design: SLR(1) Parser Using Python

Last modified: 2022-05-13 01:55:51.895000             Author: Mango

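Prerequisites: LR parser, SLR parser

SLR(1) Grammar

SLR stands for Simple LR. An SLR(1) parser is an example of a bottom-up parser. The "L" stands for scanning the input from left to right, the "R" stands for constructing a rightmost derivation in reverse, and the "(1)" is the number of lookahead input symbols.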

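SHIFT and REDUCE Operations

The next important concept is the pair of SHIFT and REDUCE operations. During parsing we can meet two kinds of items. In the first, the "·" sits at the end of a production. Such an item is called a "handle" in this article, and this is where the REDUCE operation applies. In the second, the "·" points at some grammar symbol, so part of the production has still to be matched. In that case a SHIFT operation is performed.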
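
The distinction above can be sketched in a few lines. This is only an illustration, assuming items are stored as [LHS, RHS-with-dot] lists as in the program further down. The helper names symbol_after_dot and action_kind are hypothetical, and the sketch also separates the case where the dot precedes a non-terminal, which leads to a GOTO entry rather than a shift.

```python
# Items are [LHS, RHS] pairs where RHS is a list of symbols with one '.' marker.
# symbol_after_dot and action_kind are hypothetical helper names.

def symbol_after_dot(item):
    """Return the symbol the dot points at, or None if the dot is at the end."""
    rhs = item[1]
    dot = rhs.index('.')
    return rhs[dot + 1] if dot + 1 < len(rhs) else None


def action_kind(item, terminals):
    """Classify an item by what the dot is in front of."""
    sym = symbol_after_dot(item)
    if sym is None:
        return 'reduce'            # dot at the end: the production is complete
    return 'shift' if sym in terminals else 'goto'


TERMINALS = {'id', '+', '*', '(', ')'}
print(action_kind(['E', ['E', '.', '+', 'T']], TERMINALS))  # shift
print(action_kind(['E', ['E', '+', 'T', '.']], TERMINALS))  # reduce
print(action_kind(['E', ['E', '+', '.', 'T']], TERMINALS))  # goto
```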

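Use of the Dot [ · ]

In LR parsing we use the dot "·" as a marker inside a rule so that we know how far parsing has progressed at any given point. Note that "·" is only a position marker and must not be treated as a grammar symbol. In SLR(1) we have a single input symbol of lookahead, which ultimately helps us decide what to do in the current state during parsing.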
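
To make the role of the dot concrete, here is a small sketch that moves it across one production, one symbol at a time. It assumes the same [LHS, RHS-with-dot] item layout as the program below. advance_dot is a hypothetical helper, not part of that program.

```python
def advance_dot(item):
    """Return a copy of the item with the dot moved one symbol to the right."""
    lhs, rhs = item
    dot = rhs.index('.')
    if dot == len(rhs) - 1:
        raise ValueError("dot is already at the end of the production")
    # swap the dot with the symbol that follows it
    new_rhs = rhs[:dot] + [rhs[dot + 1], '.'] + rhs[dot + 2:]
    return [lhs, new_rhs]


item = ['E', ['.', 'E', '+', 'T']]
while item[1][-1] != '.':
    print(item[0], '->', ' '.join(item[1]))
    item = advance_dot(item)
print(item[0], '->', ' '.join(item[1]))   # E -> E + T .
```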

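Parsing Decisions

To make parsing decisions we need to construct a deterministic finite automaton. It tells us which state we have reached while parsing. When it is designed for an SLR parser it is called the LR automaton (more precisely, the LR(0) automaton). Each state of this automaton is a set of "items" derived from the production rules.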

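Generating the LR Collection

The first step is to create the augmented grammar. Augmentation brings the start symbol onto the right-hand side (RHS) of a production rule. We cannot alter the existing rules, so we add a new rule to the productions, of the form S' -> S. Having the start symbol on the RHS ensures that the parse reaches an accept state: a REDUCE on this newly added rule means the string is accepted.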
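
A minimal sketch of augmentation, assuming productions are given as (LHS, RHS-list) pairs. The function name augment and this input shape are assumptions made for illustration; the full program below uses string rules instead.

```python
def augment(productions, start):
    """Return (new_start, new_productions) with new_start -> start prepended."""
    nonterminals = {lhs for lhs, _ in productions}
    new_start = start + "'"
    while new_start in nonterminals:      # keep the new symbol unique
        new_start += "'"
    return new_start, [(new_start, [start])] + list(productions)


grammar = [('E', ['E', '+', 'T']), ('E', ['T'])]
new_start, augmented = augment(grammar, 'E')
for lhs, rhs in augmented:
    print(lhs, '->', ' '.join(rhs))
```

The existing rules are left untouched; only one rule is added in front of them.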

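CLOSURE and GOTO Operations

Constructing the deterministic finite automaton involves two requirements. The first is creating the "states", and the second is developing the "transitions" between them.

1) CLOSURE

The closure operation helps us form the "states". Before taking the closure, all rules must be separated so that each has a single alternative, and then numbered. The numbering helps later when creating the Shift and Reduce entries in the parsing table. The closure rule itself is: whenever an item has the dot in front of a non-terminal B, every item B -> · γ is added as well, and this is repeated until nothing new appears. Let I0 be the set of items obtained after augmenting the grammar. This means that I0 also contains the newly added rule.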
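
The closure computation can be sketched as follows. Note the assumptions: this sketch stores an item as a (LHS, RHS, dot position) tuple instead of putting a '.' inside the RHS, and the names GRAMMAR and closure are chosen here for illustration only.

```python
# Augmented expression grammar; each production is (LHS, RHS tuple).
GRAMMAR = [
    ("E'", ('E',)),
    ('E', ('E', '+', 'T')),
    ('E', ('T',)),
    ('T', ('T', '*', 'F')),
    ('T', ('F',)),
    ('F', ('(', 'E', ')')),
    ('F', ('id',)),
]


def closure(items, grammar):
    """Close a list of (lhs, rhs, dot) items under the closure rule."""
    result = list(items)
    i = 0
    while i < len(result):                 # result may grow while we scan it
        _, rhs, dot = result[i]
        if dot < len(rhs):                 # dot points at some symbol
            for lhs, body in grammar:
                new_item = (lhs, body, 0)
                if lhs == rhs[dot] and new_item not in result:
                    result.append(new_item)
        i += 1
    return result


I0 = closure([("E'", ('E',), 0)], GRAMMAR)
for lhs, rhs, dot in I0:
    print(lhs, '->', ' '.join(rhs[:dot] + ('.',) + rhs[dot:]))
```

For this grammar the result has seven items, the same set the program below prints as I0.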

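2) GOTO

The GOTO operation helps us form the "transitions". In GOTO(I, X), 'I' is the state we are looking at and 'X' is the symbol the dot (·) points at. GOTO therefore takes a state (a set of items) and a grammar symbol, and produces a new or already existing state as output.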
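
A sketch of GOTO under the same assumptions as before: items are (LHS, RHS, dot position) tuples, and the names GRAMMAR, closure and goto are illustrative rather than taken from the program below. GOTO moves the dot over X in every item where the dot points at X, then takes the closure of the result.

```python
GRAMMAR = [
    ("E'", ('E',)),
    ('E', ('E', '+', 'T')),
    ('E', ('T',)),
    ('T', ('T', '*', 'F')),
    ('T', ('F',)),
    ('F', ('(', 'E', ')')),
    ('F', ('id',)),
]


def closure(items):
    """Add B -> . gamma for every non-terminal B that a dot points at."""
    result = list(items)
    i = 0
    while i < len(result):
        _, rhs, dot = result[i]
        if dot < len(rhs):
            for lhs, body in GRAMMAR:
                if lhs == rhs[dot] and (lhs, body, 0) not in result:
                    result.append((lhs, body, 0))
        i += 1
    return result


def goto(items, symbol):
    """Move the dot over `symbol` where possible, then close the result."""
    moved = [(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol]
    return closure(moved)


I0 = closure([("E'", ('E',), 0)])
for lhs, rhs, dot in goto(I0, 'E'):
    print(lhs, '->', ' '.join(rhs[:dot] + ('.',) + rhs[dot:]))
```

An empty result means there is no transition on that symbol from the given state.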

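Approach of the Program

The input to the program is a list of rules, from which the items are built, together with lists identifying the terminal and non-terminal symbols. Control starts in the grammarAugmentation() function. Next the I0 state is computed by calling findClosure(), then generateStates() is called to generate the remaining states, and finally createParseTable() is called.

A) grammarAugmentation

  • In this function we first create a unique symbol and use it to create a new item that brings the start symbol onto the RHS. We then format the items as a nested list and add a dot at the start of each item's RHS. We also keep only one derivation per item. The result is a list named separatedRulesList.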

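B) findClosure

  • This function behaves differently for the I0 state. For I0, we directly append the newly created item from the augmentation to closureSet. In every other case, closureSet is initialised with the received argument "input_state".
  • We then keep iterating for as long as new items are being added to closureSet. To find the next items to add, we apply the closure rule: if the dot points at a non-terminal, every rule whose LHS is that non-terminal is added with the dot at the start of its RHS.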

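C) generateStates

  • In this function we start the GOTO computation. "statesDict" is a dictionary that stores all the states and is used as a global variable. We iterate over statesDict for as long as new states are being added to it, and we call compute_GOTO() only once on each added state.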
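
The state-generation loop can be sketched as a worklist over the states found so far. This is an illustration under assumptions, not the program's own code: items are (LHS, RHS, dot position) tuples, states are compared as frozensets so item order does not matter, and build_states is a hypothetical name.

```python
GRAMMAR = [
    ("E'", ('E',)), ('E', ('E', '+', 'T')), ('E', ('T',)),
    ('T', ('T', '*', 'F')), ('T', ('F',)),
    ('F', ('(', 'E', ')')), ('F', ('id',)),
]


def closure(items):
    result = list(items)
    i = 0
    while i < len(result):
        _, rhs, dot = result[i]
        if dot < len(rhs):
            for lhs, body in GRAMMAR:
                if lhs == rhs[dot] and (lhs, body, 0) not in result:
                    result.append((lhs, body, 0))
        i += 1
    return result


def goto(items, symbol):
    moved = [(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol]
    return closure(moved)


def build_states():
    """Visit every state once and record its outgoing transitions."""
    states = [closure([("E'", ('E',), 0)])]
    index = {frozenset(states[0]): 0}      # state contents -> state number
    transitions = {}
    i = 0
    while i < len(states):                 # states may grow while we scan
        symbols = []
        for _, rhs, dot in states[i]:
            if dot < len(rhs) and rhs[dot] not in symbols:
                symbols.append(rhs[dot])
        for symbol in symbols:
            target = goto(states[i], symbol)
            key = frozenset(target)
            if key not in index:           # unseen state: give it a number
                index[key] = len(states)
                states.append(target)
            transitions[(i, symbol)] = index[key]
        i += 1
    return states, transitions


states, transitions = build_states()
print(len(states), "states,", len(transitions), "transitions")
```

For the expression grammar this yields 12 states and 22 transitions, which agrees with the output listed at the end of the article.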

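D) compute_GOTO

  • Control now reaches compute_GOTO. This function holds no actual GOTO logic; instead it collects the information needed to call the GOTO() function iteratively. For a given input state we call GOTO(state, Xi), where Xi is a symbol pointed at by the dot. There can be several Xi, so isolating the iteration from the actual GOTO() logic reduces complexity.

E) GOTO

  • This function is called by compute_GOTO(). We receive the state number ("state") and "charNextToDot", perform the shift operation (move the dot past charNextToDot) and generate a new state.
  • For any item in the new state, the dot may now point at a symbol that has rules of its own, so to include all items of the new state we call the findClosure() function discussed above.
  • To record the information "GOTO(state, symbol) = newState", the stateMap dictionary is created with keys of type tuple and values of type integer. This helps with parsing-table generation and with printing the output.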
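
As a small illustration of why tuple keys are convenient, the sketch below splits a stateMap-style dictionary into shift entries and GOTO entries. The sample entries are copied from the output shown at the end of the article. The function name split_transitions is hypothetical.

```python
def split_transitions(state_map, terminals):
    """Turn {(state, symbol): target} into ACTION shifts and GOTO entries."""
    action, goto_table = {}, {}
    for (state, symbol), target in state_map.items():
        if symbol in terminals:
            action[(state, symbol)] = f"S{target}"   # shift to state `target`
        else:
            goto_table[(state, symbol)] = target     # non-terminal: GOTO entry
    return action, goto_table


sample_map = {(0, 'E'): 1, (0, '('): 4, (0, 'id'): 5, (1, '+'): 6}
action, goto_table = split_transitions(sample_map, {'id', '+', '*', '(', ')'})
print(action)
print(goto_table)
```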

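F) createParseTable

  • First we create the table in its initial, empty state. The columns are ACTION (terminals and $) and GOTO (non-terminals). The rows are the numbered states (I0-In).
  • Use stateMap to fill in the SHIFT and GOTO entries. To add the reduce entries of the LR parser, find the "handles" (items ending with the dot) in the generated states and place Rn (n is the item's number in separatedRulesList) in Table[ stateNo ][ Ai ], where "stateNo" is the state the item belongs to.
  • "Ai" is any symbol belonging to FOLLOW of the LHS symbol of the item currently being traversed. The REDUCE entry for the newly added augmented rule is shown as "Accept". The table may contain RR (Reduce-Reduce) and SR (Shift-Reduce) conflicts, which show up as more than one action in a cell and mean the grammar is not SLR(1).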
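
Since the reduce entries depend on FOLLOW sets, here is a sketch of one way to compute them by fixed-point iteration. It is a simplified alternative to the recursive first() and follow() functions in the program below, and it assumes the grammar has no epsilon productions. The names first_sets and follow_sets are hypothetical.

```python
GRAMMAR = [
    ("E'", ['E']), ('E', ['E', '+', 'T']), ('E', ['T']),
    ('T', ['T', '*', 'F']), ('T', ['F']),
    ('F', ['(', 'E', ')']), ('F', ['id']),
]
NONTERMINALS = {"E'", 'E', 'T', 'F'}


def first_sets(grammar, nonterminals):
    """FIRST sets, assuming no production derives the empty string."""
    first = {nt: set() for nt in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            head = rhs[0]
            new = first[head] if head in nonterminals else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(grammar, nonterminals, start):
    """FOLLOW sets under the same no-epsilon assumption."""
    first = first_sets(grammar, nonterminals)
    follow = {nt: set() for nt in nonterminals}
    follow[start].add('$')
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                if i + 1 < len(rhs):       # something follows sym in this rule
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in nonterminals else {nxt}
                else:                      # sym ends the rule: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


follow = follow_sets(GRAMMAR, NONTERMINALS, "E'")
for nt in ("E'", 'E', 'T', 'F'):
    print(nt, sorted(follow[nt]))
```

With these sets, the item E -> T . in a state gets its reduce entry only in the columns +, ) and $, which matches row I2 of the table in the output.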
Python3
# SLR(1)
 
import copy
 
# perform grammar augmentation
def grammarAugmentation(rules, nonterm_userdef,
                        start_symbol):
   
    # newRules stores processed output rules
    newRules = []
 
    # create unique 'symbol' to
    # - represent new start symbol
    newChar = start_symbol + "'"
    while (newChar in nonterm_userdef):
        newChar += "'"
 
    # adding rule to bring start symbol to RHS
    newRules.append([newChar,
                     ['.', start_symbol]])
 
    # new format  => [LHS,[.RHS]],
    # can't use dictionary since
    # - duplicate keys can be there
    for rule in rules:
       
        # split LHS from RHS
        k = rule.split("->")
        lhs = k[0].strip()
        rhs = k[1].strip()
         
        # split all rule at '|'
        # keep single derivation in one rule
        multirhs = rhs.split('|')
        for rhs1 in multirhs:
            rhs1 = rhs1.strip().split()
             
            # ADD dot pointer at start of RHS
            rhs1.insert(0, '.')
            newRules.append([lhs, rhs1])
    return newRules
 
 
# find closure
def findClosure(input_state, dotSymbol):
    global start_symbol, \
        separatedRulesList, \
        statesDict
 
    # closureSet stores processed output
    closureSet = []
 
    # if findClosure is called for
    # - 1st time i.e. for I0,
    # then LHS is received in "dotSymbol",
    # add all rules starting with
    # - LHS symbol to closureSet
    if dotSymbol == start_symbol:
        for rule in separatedRulesList:
            if rule[0] == dotSymbol:
                closureSet.append(rule)
    else:
        # for any higher state than I0,
        # set initial state as
        # - received input_state
        closureSet = input_state
 
    # iterate till new states are
    # - getting added in closureSet
    prevLen = -1
    while prevLen != len(closureSet):
        prevLen = len(closureSet)
 
        # "tempClosureSet" - used to eliminate
        # concurrent modification error
        tempClosureSet = []
 
        # if dot pointing at new symbol,
        # add corresponding rules to tempClosure
        for rule in closureSet:
            indexOfDot = rule[1].index('.')
            if rule[1][-1] != '.':
                dotPointsHere = rule[1][indexOfDot + 1]
                for in_rule in separatedRulesList:
                    if dotPointsHere == in_rule[0] and \
                            in_rule not in tempClosureSet:
                        tempClosureSet.append(in_rule)
 
        # add new closure rules to closureSet
        for rule in tempClosureSet:
            if rule not in closureSet:
                closureSet.append(rule)
    return closureSet
 
 
def compute_GOTO(state):
    global statesDict, stateCount
 
    # find all symbols on which we need to
    # make function call - GOTO
    generateStatesFor = []
    for rule in statesDict[state]:
        # if rule is not "Handle"
        if rule[1][-1] != '.':
            indexOfDot = rule[1].index('.')
            dotPointsHere = rule[1][indexOfDot + 1]
            if dotPointsHere not in generateStatesFor:
                generateStatesFor.append(dotPointsHere)
 
    # call GOTO iteratively on all symbols pointed by dot
    if len(generateStatesFor) != 0:
        for symbol in generateStatesFor:
            GOTO(state, symbol)
    return
 
 
def GOTO(state, charNextToDot):
    global statesDict, stateCount, stateMap
 
    # newState - stores processed new state
    newState = []
    for rule in statesDict[state]:
        indexOfDot = rule[1].index('.')
        if rule[1][-1] != '.':
            if rule[1][indexOfDot + 1] == \
                    charNextToDot:
                # swapping element with dot,
                # to perform shift operation
                shiftedRule = copy.deepcopy(rule)
                shiftedRule[1][indexOfDot] = \
                    shiftedRule[1][indexOfDot + 1]
                shiftedRule[1][indexOfDot + 1] = '.'
                newState.append(shiftedRule)
 
    # add closure rules for newState
    # call findClosure function iteratively
    # - on all existing rules in newState
 
    # addClosureRules - is used to store
    # new rules temporarily,
    # to prevent concurrent modification error
    addClosureRules = []
    for rule in newState:
        indexDot = rule[1].index('.')
        # check that rule is not "Handle"
        if rule[1][-1] != '.':
            closureRes = \
                findClosure(newState, rule[1][indexDot + 1])
            for rule in closureRes:
                if rule not in addClosureRules \
                        and rule not in newState:
                    addClosureRules.append(rule)
 
    # add closure result to newState
    for rule in addClosureRules:
        newState.append(rule)
 
    # find if newState already present
    # in Dictionary
    stateExists = -1
    for state_num in statesDict:
        if statesDict[state_num] == newState:
            stateExists = state_num
            break
 
    # stateMap is a mapping of GOTO with
    # its output states
    if stateExists == -1:
       
        # if newState is not in dictionary,
        # then create new state
        stateCount += 1
        statesDict[stateCount] = newState
        stateMap[(state, charNextToDot)] = stateCount
    else:
       
        # if state repetition found,
        # assign that previous state number
        stateMap[(state, charNextToDot)] = stateExists
    return
 
 
def generateStates(statesDict):
    prev_len = -1
    called_GOTO_on = []
 
    # run loop till new states are getting added
    while (len(statesDict) != prev_len):
        prev_len = len(statesDict)
        keys = list(statesDict.keys())
 
        # make compute_GOTO function call
        # on all states in dictionary
        for key in keys:
            if key not in called_GOTO_on:
                called_GOTO_on.append(key)
                compute_GOTO(key)
    return
 
# calculation of first
# epsilon is denoted by '#' (hash)
 
# pass rule in first function
def first(rule):
    global rules, nonterm_userdef, \
        term_userdef, diction, firsts
     
    # recursion base condition
    # (for terminal or epsilon)
    if len(rule) != 0 and (rule is not None):
        if rule[0] in term_userdef:
            return rule[0]
        elif rule[0] == '#':
            return '#'
 
    # condition for Non-Terminals
    if len(rule) != 0:
        if rule[0] in list(diction.keys()):
           
            # fres temporary list of result
            fres = []
            rhs_rules = diction[rule[0]]
             
            # call first on each rule of RHS
            # fetched (& take union)
            for itr in rhs_rules:
                indivRes = first(itr)
                if type(indivRes) is list:
                    for i in indivRes:
                        fres.append(i)
                else:
                    fres.append(indivRes)
 
            # if no epsilon in result
            # - received return fres
            if '#' not in fres:
                return fres
            else:
               
                # apply epsilon
                # rule => f(ABC)=f(A)-{e} U f(BC)
                newList = []
                fres.remove('#')
                if len(rule) > 1:
                    ansNew = first(rule[1:])
                    if ansNew != None:
                        if type(ansNew) is list:
                            newList = fres + ansNew
                        else:
                            newList = fres + [ansNew]
                    else:
                        newList = fres
                    return newList
                   
                # if result is not already returned
                # - control reaches here
                # lastly if epsilon still persists
                # - keep it in result of first
                fres.append('#')
                return fres
 
 
# calculation of follow
def follow(nt):
    global start_symbol, rules, nonterm_userdef, \
        term_userdef, diction, firsts, follows
     
    # for start symbol return $ (recursion base case)
    solset = set()
    if nt == start_symbol:
        # return '$'
        solset.add('$')
 
    # check all occurrences
    # solset - is result of computed 'follow' so far
 
    # For input, check in all rules
    for curNT in diction:
        rhs = diction[curNT]
         
        # go for all productions of NT
        for subrule in rhs:
            if nt in subrule:
               
                # call for all occurrences on
                # - non-terminal in subrule
                while nt in subrule:
                    index_nt = subrule.index(nt)
                    subrule = subrule[index_nt + 1:]
                     
                    # empty condition - call follow on LHS
                    if len(subrule) != 0:
                       
                        # compute first if symbols on
                        # - RHS of target Non-Terminal exists
                        res = first(subrule)
                         
                        # if epsilon in result apply rule
                        # - (A->aBX)- follow of -
                        # - follow(B)=(first(X)-{ep}) U follow(A)
                        if '#' in res:
                            newList = []
                            res.remove('#')
                            ansNew = follow(curNT)
                            if ansNew != None:
                                if type(ansNew) is list:
                                    newList = res + ansNew
                                else:
                                    newList = res + [ansNew]
                            else:
                                newList = res
                            res = newList
                    else:
                       
                        # when nothing in RHS, go circular
                        # - and take follow of LHS
                        # only if (NT in LHS)!=curNT
                        # reset res so it is always defined
                        res = None
                        if nt != curNT:
                            res = follow(curNT)
 
                    # add follow result in set form
                    if res is not None:
                        if type(res) is list:
                            for g in res:
                                solset.add(g)
                        else:
                            solset.add(res)
    return list(solset)
 
 
def createParseTable(statesDict, stateMap, T, NT):
    global separatedRulesList, diction
 
    # create rows and cols
    rows = list(statesDict.keys())
    cols = T+['$']+NT
 
    # create empty table
    Table = []
    tempRow = []
    for y in range(len(cols)):
        tempRow.append('')
    for x in range(len(rows)):
        Table.append(copy.deepcopy(tempRow))
 
    # make shift and GOTO entries in table
    for entry in stateMap:
        state = entry[0]
        symbol = entry[1]
        # get index
        a = rows.index(state)
        b = cols.index(symbol)
        if symbol in NT:
            Table[a][b] = Table[a][b]\
                + f"{stateMap[entry]} "
        elif symbol in T:
            Table[a][b] = Table[a][b]\
                + f"S{stateMap[entry]} "
 
    # start REDUCE procedure
 
    # number the separated rules
    numbered = {}
    key_count = 0
    for rule in separatedRulesList:
        tempRule = copy.deepcopy(rule)
        tempRule[1].remove('.')
        numbered[key_count] = tempRule
        key_count += 1
 
    # build the grammar dictionary used by follow():
    # re-insert the augmented rule, then split every rule
    addedR = f"{separatedRulesList[0][0]} -> " \
        f"{separatedRulesList[0][1][1]}"
    rules.insert(0, addedR)
    for rule in rules:
        k = rule.split("->")
         
        # remove un-necessary spaces
        k[0] = k[0].strip()
        k[1] = k[1].strip()
        rhs = k[1]
        multirhs = rhs.split('|')
         
        # remove un-necessary spaces
        for i in range(len(multirhs)):
            multirhs[i] = multirhs[i].strip()
            multirhs[i] = multirhs[i].split()
        diction[k[0]] = multirhs
 
    # find 'handle' items and calculate follow.
    for stateno in statesDict:
        for rule in statesDict[stateno]:
            if rule[1][-1] == '.':
               
                # match the item
                temp2 = copy.deepcopy(rule)
                temp2[1].remove('.')
                for key in numbered:
                    if numbered[key] == temp2:
                       
                        # put Rn in those ACTION symbol columns,
                        # who are in the follow of
                        # LHS of current Item.
                        follow_result = follow(rule[0])
                        for col in follow_result:
                            index = cols.index(col)
                            if key == 0:
                                Table[stateno][index] = "Accept"
                            else:
                                Table[stateno][index] =\
                                    Table[stateno][index]+f"R{key} "
 
    # printing table
    print("\nSLR(1) parsing table:\n")
    frmt = "{:>8}" * len(cols)
    print("  ", frmt.format(*cols), "\n")
    for j, y in enumerate(Table):
        frmt1 = "{:>8}" * len(y)
        print(f"{{:>3}} {frmt1.format(*y)}"
              .format('I'+str(j)))
 
def printResult(rules):
    for rule in rules:
        print(f"{rule[0]} ->"
              f" {' '.join(rule[1])}")
 
def printAllGOTO(diction):
    for itr in diction:
        print(f"GOTO ( I{itr[0]} ,"
              f" {itr[1]} ) = I{stateMap[itr]}")
 
# *** MAIN *** - Driver Code
 
# uncomment any rules set to test code
# follow given format to add -
# user defined grammar rule set
# rules section - *START*
 
# example sample set 01
rules = ["E -> E + T | T",
         "T -> T * F | F",
         "F -> ( E ) | id"
         ]
nonterm_userdef = ['E', 'T', 'F']
term_userdef = ['id', '+', '*', '(', ')']
start_symbol = nonterm_userdef[0]
 
# example sample set 02
# rules = ["S -> a X d | b Y d | a Y e | b X e",
#          "X -> c",
#          "Y -> c"
#          ]
# nonterm_userdef = ['S','X','Y']
# term_userdef = ['a','b','c','d','e']
# start_symbol = nonterm_userdef[0]
 
# rules section - *END*
print("\nOriginal grammar input:\n")
for y in rules:
    print(y)
 
# print processed rules
print("\nGrammar after Augmentation: \n")
separatedRulesList = \
    grammarAugmentation(rules,
                        nonterm_userdef,
                        start_symbol)
printResult(separatedRulesList)
 
# find closure
start_symbol = separatedRulesList[0][0]
print("\nCalculated closure: I0\n")
I0 = findClosure(0, start_symbol)
printResult(I0)
 
# use statesDict to store the states
# use stateMap to store GOTOs
statesDict = {}
stateMap = {}
 
# add first state to statesDict
# and maintain stateCount
# - for newState generation
statesDict[0] = I0
stateCount = 0
 
# computing states by GOTO
generateStates(statesDict)
 
# print goto states
print("\nStates Generated: \n")
for st in statesDict:
    print(f"State = I{st}")
    printResult(statesDict[st])
    print()
 
print("Result of GOTO computation:\n")
printAllGOTO(stateMap)
 
# "follow computation" for making REDUCE entries
diction = {}
 
# call createParseTable function
createParseTable(statesDict, stateMap,
                 term_userdef,
                 nonterm_userdef)


Output:

Original grammar input:

E -> E + T | T
T -> T * F | F
F -> ( E ) | id

Grammar after Augmentation: 

E' -> . E
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id

Calculated closure: I0

E' -> . E
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id

States Generated: 

State = I0
E' -> . E
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id

State = I1
E' -> E .
E -> E . + T

State = I2
E -> T .
T -> T . * F

State = I3
T -> F .

State = I4
F -> ( . E )
E -> . E + T
E -> . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id

State = I5
F -> id .

State = I6
E -> E + . T
T -> . T * F
T -> . F
F -> . ( E )
F -> . id

State = I7
T -> T * . F
F -> . ( E )
F -> . id

State = I8
F -> ( E . )
E -> E . + T

State = I9
E -> E + T .
T -> T . * F

State = I10
T -> T * F .

State = I11
F -> ( E ) .

Result of GOTO computation:

GOTO ( I0 , E ) = I1
GOTO ( I0 , T ) = I2
GOTO ( I0 , F ) = I3
GOTO ( I0 , ( ) = I4
GOTO ( I0 , id ) = I5
GOTO ( I1 , + ) = I6
GOTO ( I2 , * ) = I7
GOTO ( I4 , E ) = I8
GOTO ( I4 , T ) = I2
GOTO ( I4 , F ) = I3
GOTO ( I4 , ( ) = I4
GOTO ( I4 , id ) = I5
GOTO ( I6 , T ) = I9
GOTO ( I6 , F ) = I3
GOTO ( I6 , ( ) = I4
GOTO ( I6 , id ) = I5
GOTO ( I7 , F ) = I10
GOTO ( I7 , ( ) = I4
GOTO ( I7 , id ) = I5
GOTO ( I8 , ) ) = I11
GOTO ( I8 , + ) = I6
GOTO ( I9 , * ) = I7

SLR(1) parsing table:

         id       +       *       (       )       $       E       T       F 

 I0      S5                      S4                       1       2       3 
 I1              S6                           Accept                        
 I2              R2      S7              R2      R2                         
 I3              R4      R4              R4      R4                         
 I4      S5                      S4                       8       2       3 
 I5              R6      R6              R6      R6                         
 I6      S5                      S4                               9       3 
 I7      S5                      S4                                      10 
 I8              S6                     S11                                 
 I9              R1      S7              R1      R1                         
I10              R3      R3              R3      R3                         
I11              R5      R5              R5      R5