用于模式搜索的 Rabin-Karp 算法的Python程序
给定一个文本txt[0..n-1]和一个模式pat[0..m-1] ,编写一个函数search(char pat[], char txt[])打印txt中所有出现的pat[] [] 。你可以假设 n > m。
例子:
输入:txt[] = "THIS IS A TEST TEXT" pat[] = "TEST" 输出:在索引 10 找到的模式 输入:txt[] = "AABAACAADAABAABA" pat[] = "AABA" 输出:在索引 0 找到的模式在索引 9 找到的模式 在索引 12 找到的模式
朴素字符串匹配算法会一张一张地滑动模式。每张幻灯片后,它会逐一检查当前班次的字符,如果所有字符都匹配,则打印匹配项。
与朴素算法一样,Rabin-Karp 算法也将图案一一滑动。但与 Naive 算法不同的是,Rabin Karp 算法将模式的哈希值与当前文本子串的哈希值匹配,如果哈希值匹配,则只有它开始匹配单个字符。所以 Rabin Karp 算法需要为后面的字符串计算哈希值。
1) 图案本身。
2) 长度为 m 的文本的所有子串。
Python
# Following program is the python implementation of
# Rabin Karp Algorithm given in CLRS book
# d is the number of characters in the input alphabet
d = 256
# pat -> pattern
# txt -> text
# q -> A prime number
def search(pat, txt, q):
M = len(pat)
N = len(txt)
i = 0
j = 0
p = 0 # hash value for pattern
t = 0 # hash value for txt
h = 1
# The value of h would be "pow(d, M-1)% q"
for i in xrange(M-1):
h = (h * d)% q
# Calculate the hash value of pattern and first window
# of text
for i in xrange(M):
p = (d * p + ord(pat[i]))% q
t = (d * t + ord(txt[i]))% q
# Slide the pattern over text one by one
for i in xrange(N-M + 1):
# Check the hash values of current window of text and
# pattern if the hash values match then only check
# for characters on by one
if p == t:
# Check for characters one by one
for j in xrange(M):
if txt[i + j] != pat[j]:
break
j+= 1
# if p == t and pat[0...M-1] = txt[i, i + 1, ...i + M-1]
if j == M:
print "Pattern found at index " + str(i)
# Calculate hash value for next window of text: Remove
# leading digit, add trailing digit
if i < N-M:
t = (d*(t-ord(txt[i])*h) + ord(txt[i + M]))% q
# We might get negative values of t, converting it to
# positive
if t < 0:
t = t + q
# Driver program to test the above function
txt = "GEEKS FOR GEEKS"
pat = "GEEK"
q = 101 # A prime number
search(pat, txt, q)
# This code is contributed by Bhavya Jain
输出:
Pattern found at index 0
Pattern found at index 10
有关更多详细信息,请参阅有关模式搜索的 Rabin-Karp 算法的完整文章!
在评论中写代码?请使用 ide.geeksforgeeks.org,生成链接并在此处分享链接。