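Comparing sequences in Python with the difflib module
The difflib module in Python provides classes and functions for comparing sequences. It can be used to compare files and can report the differences in several formats, including HTML, context diffs, and unified diffs.
It contains several classes and functions for comparing sequences:
SequenceMatcher class
This is a flexible class for comparing pairs of sequences of any type, as long as the sequence elements are hashable. Its main features are discussed below:
- The ratio() method of this class returns the similarity of the two sequences as a float between 0 and 1. The similarity is computed with the following formula:
2*X/Y
where X is the number of matching elements and
Y is the total number of elements in both sequences.
Example 1: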
Python3
# import required module
import difflib
# assign parameters
par1 = ['g', 'f', 'g']
par2 = 'gfg'
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())
Output:
1.0
Example 2:
Python3
# import required module
import difflib
# assign parameters
par1 = 'Geeks for geeks!'
par2 = 'geeks'
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())
Output:
0.47619047619047616
Example 3:
Python3
# import required module
import difflib
# assign parameters
par1 = 'gfg'
par2 = 'GFG'
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())
Output:
0.0
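To connect the formula to the examples above, the following minimal sketch recomputes the ratio by hand. It assumes that X can be obtained by summing the sizes of the blocks returned by get_matching_blocks(); the variable names matched, total and manual_ratio are introduced here only for illustration.

```python
import difflib

a = 'Geeks for geeks!'
b = 'geeks'

matcher = difflib.SequenceMatcher(None, a, b)

# X: total number of matching elements
# (the terminating dummy block has size 0 and adds nothing)
matched = sum(block.size for block in matcher.get_matching_blocks())

# Y: total number of elements in both sequences
total = len(a) + len(b)

manual_ratio = 2 * matched / total

print(matched, total)                   # 5 21
print(manual_ratio == matcher.ratio())  # True
```

The comparison is case-sensitive, which is why 'gfg' and 'GFG' in Example 3 have no matching elements and a ratio of 0.0.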
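- The get_matching_blocks() method of this class returns a list of triples describing the matching subsequences. Each triple has the form (i, j, n) and means that a[i:i+n] == b[j:j+n]. The last triple in the list is always a dummy with n == 0.
Example 1:
Python3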
# import required module
import difflib
# assign parameters
par1 = 'Geeks for geeks!'
par2 = 'geeks'
# compare
matches = difflib.SequenceMatcher(
None, par1, par2).get_matching_blocks()
for ele in matches:
    print(par1[ele.a:ele.a + ele.size])
Output:
geeks
Example 2:
Python3
# import required module
import difflib
# assign parameters
par1 = 'GFG'
par2 = 'gfg'
# compare
matches = difflib.SequenceMatcher(
None, par1, par2).get_matching_blocks()
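for ele in matches:
    print(par1[ele.a:ele.a + ele.size])
Output:
There is no matching subsequence between GFG and gfg, so no matching text is displayed. Only an empty line is printed, for the zero-length block that always ends the list.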
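Printing the raw triples can make the (i, j, n) form easier to see. The sketch below reuses the strings from Example 1 and also filters out the zero-length terminating block; the names blocks and real_matches are illustrative only.

```python
import difflib

par1 = 'Geeks for geeks!'
par2 = 'geeks'

blocks = difflib.SequenceMatcher(None, par1, par2).get_matching_blocks()

for block in blocks:
    # each block is a named tuple: Match(a=i, b=j, size=n)
    print(block)

# keep only real matches by skipping the zero-length terminator
real_matches = [par1[m.a:m.a + m.size] for m in blocks if m.size > 0]
print(real_matches)
```

Filtering on size > 0 avoids the empty line that the loop in the examples above prints for the final block.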
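- get_close_matches() function: returns a list of the best "good enough" matches. word is the sequence for which close matches are wanted (usually a string), and possibilities is a list of sequences to match word against (usually a list of strings).
Example:
Python3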
# import required module
import difflib
# assign parameters
string = "Geeks4geeks"
listOfStrings = ["for", "Gks", "G4g", "geeks"]
# find common strings
print(difflib.get_close_matches(string, listOfStrings))
Output:
['geeks']
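get_close_matches() also accepts two optional arguments: n, the maximum number of matches to return (default 3), and cutoff, the minimum similarity ratio a candidate must reach (default 0.6). The following sketch shows how they change the result for the same data; the variable names are illustrative.

```python
import difflib

word = "Geeks4geeks"
candidates = ["for", "Gks", "G4g", "geeks"]

# defaults: n=3, cutoff=0.6, so only 'geeks' is similar enough
default_matches = difflib.get_close_matches(word, candidates)

# a lower cutoff also admits weaker candidates
loose_matches = difflib.get_close_matches(word, candidates, cutoff=0.4)

# n limits how many of the best matches are returned
best_match = difflib.get_close_matches(word, candidates, n=1, cutoff=0.4)

print(default_matches)
print(loose_matches)
print(best_match)
```

The results are sorted by similarity, most similar first, so the first element is the best match.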
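Differ class
This class compares sequences of lines of text and produces human-readable differences, or deltas. Each line of a Differ delta begins with a two-character code:
Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence
The following method is provided by this class:
- compare() method: compares two sequences of lines and generates the delta (a sequence of lines). When a plain string is passed, each character is treated as one line.
Example 1:
Python3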
# import required module
from difflib import Differ
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
# compare parameters
for ele in Differ().compare(par1, par2):
    print(ele)
Output:
- G
+ g
  e
  e
  k
  s
+ !
Example 2:
Python3
# import required module
from difflib import Differ
# assign parameters
par1 = ['Geeks','for','geeks!']
par2 = 'geeks!'
# compare parameters
for ele in Differ().compare(par1, par2):
    print(ele)
Output:
- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
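Differ is designed for lines of text, and the '? ' code only appears when two lines are similar enough to be shown as an in-line change. The following sketch uses two short, made-up texts (text1 and text2 are illustrative) split into lines with splitlines(keepends=True).

```python
from difflib import Differ

# two small texts that differ by one character on the first line
text1 = 'Geeks for geeks\nA computer science portal\n'.splitlines(keepends=True)
text2 = 'Geeks for Geeks\nA computer science portal\n'.splitlines(keepends=True)

delta = list(Differ().compare(text1, text2))

# each delta line already ends with a newline, so join without a separator
print(''.join(delta), end='')

# the '? ' lines are hints that mark where the similar lines differ
hint_lines = [line for line in delta if line.startswith('? ')]
```

The unchanged second line is reported with the two-space code, while the changed first line is reported as a '- ' and '+ ' pair accompanied by '? ' hint lines.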
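- ndiff() function: the comparison above can also be performed with this function, which creates a Differ object internally and calls its compare() method. When a list is passed, its elements are compared as whole items, while a string is compared character by character.
Example 1:
Python3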
# import required module
import difflib
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
# compare parameters
for ele in difflib.ndiff(par1, par2):
    print(ele)
Output:
- G
+ g
  e
  e
  k
  s
+ !
Example 2:
Python3
# import required module
import difflib
# assign parameters
par1 = ['Geeks','for','geeks!']
par2 = 'geeks!'
# compare parameters
for ele in difflib.ndiff(par1, par2):
    print(ele)
Output:
- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
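A delta produced by ndiff() keeps enough information to rebuild either input. The sketch below uses difflib.restore() for this; the lists before and after are made-up sample data.

```python
import difflib

before = ['Geeks', 'for', 'geeks!']
after = ['Geeks', 'for', 'Geeks']

delta = list(difflib.ndiff(before, after))
for line in delta:
    print(line)

# restore(delta, 1) rebuilds the first sequence, restore(delta, 2) the second
recovered_before = list(difflib.restore(delta, 1))
recovered_after = list(difflib.restore(delta, 2))

print(recovered_before == before)  # True
print(recovered_after == after)    # True
```

restore() works by keeping the lines whose code belongs to the requested sequence and stripping the two-character prefix.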
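- context_diff() function: a context diff is a compact way of showing only the lines that have changed, together with a few lines of context. The changes are shown in a before/after style. The number of context lines is set by n, which defaults to 3.
Example 1:
Python3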
# import required module
import difflib
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
# compare parameters
for ele in difflib.context_diff(par1, par2):
    print(ele)
Output:
***
---
***************
*** 1,5 ****
! G
  e
  e
  k
  s
--- 1,6 ----
! g
  e
  e
  k
  s
+ !
Example 2:
Python3
# import required module
import difflib
# assign parameters
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'
# compare parameters
for ele in difflib.context_diff(par1, par2):
    print(ele)
Output:
***
---
***************
*** 1,3 ****
! Geeks
! for
! geeks!
--- 1,6 ----
! g
! e
! e
! k
! s
! !
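The empty header lines in the outputs above can be filled in with the fromfile and tofile arguments, and n controls how many context lines are shown. The sketch below also calls unified_diff(), which takes the same arguments and produces the unified format mentioned in the introduction. The file names old.txt and new.txt are hypothetical labels only; no files are read.

```python
import difflib

old = ['Geeks', 'for', 'geeks!']
new = ['Geeks', 'for', 'Geeks']

# lineterm='' because the input lines carry no trailing newlines
context = list(difflib.context_diff(
    old, new, fromfile='old.txt', tofile='new.txt', n=1, lineterm=''))

unified = list(difflib.unified_diff(
    old, new, fromfile='old.txt', tofile='new.txt', n=1, lineterm=''))

print('\n'.join(context))
print()
print('\n'.join(unified))
```

With n=1, only one unchanged line ('for') is kept around the change. In the unified format, removed lines start with '-' and added lines with '+'.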