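Python Program to Find Indices of Overlapping Substrings
To count the overlapping occurrences of a substring in Python, we can use the re module, and to get their indices we use the re.finditer() method. On its own, however, re.finditer() returns only non-overlapping matches, so occurrences that share characters with an earlier match are missed.
Examples: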
Input: String: “geeksforgeeksforgeeks” ; Pattern: “geeksforgeeks”
Output: [0, 8]
Explanation: The pattern occurs in the string from index 0 to index 12 and again from index 8 to index 20. These two occurrences overlap, so the output is their starting positions, i.e., index 0 and index 8.
Input: String: “barfoobarfoobarfoobarfoobarfoo” ; Pattern: “foobarfoo”
Output: [3, 9, 15, 21]
Explanation: The pattern occurs at indices 3, 9, 15 and 21, and each occurrence overlaps the next one.
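The expected outputs above can be cross-checked without regular expressions by comparing every window of the string against the pattern. This brute-force check is a sketch added for illustration; the helper name `naive_overlapping_indices` is hypothetical and not part of the article's program. It keeps the article's argument order (pattern first, then string).

```python
def naive_overlapping_indices(pattern, string):
    """Return every start index where pattern occurs in string, overlaps included.

    Hypothetical helper: slides a window of len(pattern) characters across
    the string and compares each window with the pattern.
    """
    width = len(pattern)
    return [i for i in range(len(string) - width + 1)
            if string[i:i + width] == pattern]

print(naive_overlapping_indices("geeksforgeeks", "geeksforgeeksforgeeks"))           # [0, 8]
print(naive_overlapping_indices("foobarfoo", "barfoobarfoobarfoobarfoobarfoo"))      # [3, 9, 15, 21]
```

This runs in O(n·m) time for a string of length n and a pattern of length m, so it is mainly useful as a reference for checking results.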
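When a pattern occurs several times in a string with overlaps, re.finditer() on its own returns only the indices of the non-overlapping matches. Below is a program that demonstrates this use of the finditer() method.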
Python3
# Import required module
import re

# Function to depict use of finditer() method
def CntSubstr(pattern, string):
    # List storing the start index of each match
    a = [m.start() for m in re.finditer(pattern, string)]
    return a

# Driver Code
string = 'geeksforgeeksforgeeks'
pattern = 'geeksforgeeks'

# Printing index values of non-overlapping pattern
print(CntSubstr(pattern, string))
Output:
[0]
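The match at index 8 is missed because finditer() resumes searching at index 13, where the first match ended. To get the overlapping indices, the pattern must therefore not consume the characters it matches. Wrapping it in a lookahead assertion, (?=...), achieves this: the lookahead checks that the pattern occurs at the current position but itself matches an empty string, so the search resumes at the very next character instead of after the end of the previous occurrence.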
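The difference between a consuming match and a lookahead can be seen by printing match spans instead of start indices. This is a small illustrative sketch reusing the first example's string and pattern; it also passes the plain pattern through re.escape(), which makes no difference here because 'geeksforgeeks' contains no special characters.

```python
import re

string = 'geeksforgeeksforgeeks'
pattern = 'geeksforgeeks'

# A plain pattern consumes the 13 characters it matches, so the scan
# resumes at index 13 and never tests index 8.
plain_spans = [m.span() for m in re.finditer(re.escape(pattern), string)]

# A lookahead is zero-width: each match starts and ends at the same index,
# so every starting position is tested and overlapping starts are found.
lookahead_spans = [m.span() for m in
                   re.finditer('(?={0})'.format(re.escape(pattern)), string)]

print(plain_spans)      # [(0, 13)]
print(lookahead_spans)  # [(0, 0), (8, 8)]
```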
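Approach:
- re.finditer() returns an iterator of match objects, one for each match found. Calling start() on a match object returns the index at which that match begins.
- An ordinary match consumes the characters it matches, and the next search starts where the previous match ended, so an occurrence that begins inside an earlier one is skipped. The lookahead (?=...) is zero-width: it checks for the pattern without consuming any characters, so every starting position is tested. In the code, '(?={0})'.format(...) inserts the pattern at the {0} placeholder.
- re.escape() escapes any regular-expression metacharacters in the pattern (such as . or ?), so the pattern is matched as literal text inside the lookahead.
- With these modifications, finditer() yields a match at every starting position of the pattern, and the list comprehension collects these positions into a list of overlapping indices.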
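The effect of re.escape(), and a way to recover the matched text even though the lookahead match itself is empty, can be sketched as follows. The helper `overlapping_matches` is hypothetical; it places a capturing group inside the lookahead, a common variation on the article's pattern rather than part of it.

```python
import re

def overlapping_matches(pattern, string):
    """Return (start index, matched text) pairs, overlaps included.

    Hypothetical helper: the capturing group inside the lookahead keeps
    the matched text, although the overall match is zero-width.
    """
    regex = '(?=({0}))'.format(re.escape(pattern))
    return [(m.start(), m.group(1)) for m in re.finditer(regex, string)]

print(overlapping_matches('a.a', 'a.a.a'))  # [(0, 'a.a'), (2, 'a.a')]

# '.' is a metacharacter: escaped, it matches only a literal dot;
# unescaped, it matches any character and produces an extra hit at 0.
text = 'aba.a'
escaped = [m.start() for m in re.finditer('(?={0})'.format(re.escape('a.a')), text)]
unescaped = [m.start() for m in re.finditer('(?=a.a)', text)]
print(escaped)    # [2]
print(unescaped)  # [0, 2]
```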
Below is the implementation of the above approach:
Python3
# Import required module
import re

# Explicit function to count
# indices of overlapping substrings
def CntSubstr(pattern, string):
    # (?=...) is a zero-width lookahead; re.escape() makes the pattern literal
    a = [m.start() for m in re.finditer(
        '(?={0})'.format(re.escape(pattern)), string)]
    return a

# Driver Code
string1 = 'geeksforgeeksforgeeks'
pattern1 = 'geeksforgeeks'
string2 = 'barfoobarfoobarfoobarfoobarfoo'
pattern2 = 'foobarfoo'

# Calling the function
print(CntSubstr(pattern1, string1))
print(CntSubstr(pattern2, string2))
Output:
[0, 8]
[3, 9, 15, 21]
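For comparison, the same overlapping indices can be collected without the re module by calling str.find() in a loop and restarting each search one character after the previous hit. This is an alternative technique, not the article's method; the helper name `find_overlapping` is hypothetical, and the sketch assumes a non-empty pattern. The length of the returned list gives the number of overlapping occurrences.

```python
def find_overlapping(pattern, string):
    """Collect overlapping start indices using str.find instead of re.

    Hypothetical alternative: after each hit, the next search begins one
    character later, so occurrences that share characters are not skipped.
    Assumes pattern is non-empty.
    """
    indices = []
    start = string.find(pattern)
    while start != -1:
        indices.append(start)
        start = string.find(pattern, start + 1)
    return indices

result = find_overlapping('foobarfoo', 'barfoobarfoobarfoobarfoobarfoo')
print(result)       # [3, 9, 15, 21]
print(len(result))  # 4
```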