📜  Comparing Sequences in Python with the difflib Module

📅  Last modified: 2022-05-13 01:55:26.032000             🧑  Author: Mango

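The difflib module in Python provides classes and functions for comparing sequences. It can be used to compare files, and it can report the differences between them in several formats, including HTML, context diffs, and unified diffs.

It contains several classes that perform different kinds of comparison between sequences:

Class SequenceMatcher

This is a flexible class for comparing pairs of sequences of any type, as long as the sequence elements are hashable. It provides the methods discussed below: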

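  • The ratio() method of this class returns the similarity ratio of the two sequences as a float between 0 and 1. The ratio is computed as 2.0 * M / T, where T is the total number of elements in both sequences and M is the number of matching elements.

Example 1: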

Python3
# import required module
import difflib
  
# assign parameters
par1 = ['g', 'f', 'g']
par2 = 'gfg'
  
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())


Output:

1.0

Example 2:

Python3

# import required module
import difflib
  
# assign parameters
par1 = 'Geeks for geeks!'
par2 = 'geeks'
  
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())

Output:

0.47619047619047616

Example 3:

Python3

# import required module
import difflib
  
# assign parameters
par1 = 'gfg'
par2 = 'GFG'
  
# compare
print(difflib.SequenceMatcher(None, par1, par2).ratio())

Output:

0.0
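
The ratio values above can be reproduced by hand from the formula 2.0 * M / T. The following is a minimal sketch, not part of the original examples: the helper name ratio_by_hand is hypothetical, and it assumes that M can be obtained by summing the sizes of the blocks returned by get_matching_blocks().

```python
import difflib


def ratio_by_hand(a, b):
    """Recompute the similarity ratio as 2.0 * M / T (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # M: number of matching elements, summed over all matching blocks
    matches = sum(block.size for block in matcher.get_matching_blocks())
    # T: total number of elements in both sequences
    total = len(a) + len(b)
    # Two empty sequences are treated as identical
    return 2.0 * matches / total if total else 1.0


a = 'Geeks for geeks!'
b = 'geeks'
by_hand = ratio_by_hand(a, b)
library = difflib.SequenceMatcher(None, a, b).ratio()
print(by_hand, library)
```

Here M is 5 (the block 'geeks') and T is 16 + 5 = 21, so both values come out to 10 / 21, which matches the output of Example 2.
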
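  • The get_matching_blocks() method of this class returns a list of triples describing the matching subsequences. Each triple has the form (i, j, n) and means that a[i:i+n] == b[j:j+n]. The last triple is always a dummy of the form (len(a), len(b), 0).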

Example 1:

Python3

# import required module
import difflib
  
# assign parameters
par1 = 'Geeks for geeks!'
par2 = 'geeks'
  
# compare
matches = difflib.SequenceMatcher(
    None, par1, par2).get_matching_blocks()
  
for ele in matches:
    print(par1[ele.a:ele.a + ele.size])

Output:

geeks

Example 2:

Python3

# import required module
import difflib
  
# assign parameters
par1 = 'GFG'
par2 = 'gfg'
  
# compare
matches = difflib.SequenceMatcher(
    None, par1, par2).get_matching_blocks()
  
for ele in matches:
    print(par1[ele.a:ele.a + ele.size])

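Output:

'GFG' and 'gfg' have no matching subsequence, because the comparison is case-sensitive, so nothing visible is printed. The only block returned is the terminating dummy block of size 0, for which the loop prints an empty line.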
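
To make the (i, j, n) triples easier to see, the sketch below prints each block together with the slice it describes, then filters out the zero-size dummy block that ends the list. The variable names are illustrative only.

```python
import difflib

a = 'Geeks for geeks!'
b = 'geeks'

blocks = difflib.SequenceMatcher(None, a, b).get_matching_blocks()

for block in blocks:
    # Each block is a named tuple with fields a (i), b (j) and size (n)
    print(block, repr(a[block.a:block.a + block.size]))

# Keep only the blocks that actually match something,
# dropping the dummy block that terminates the list
real_blocks = [blk for blk in blocks if blk.size > 0]
print(real_blocks)
```

Filtering on size is one simple way to avoid the trailing empty line that the loops in the examples above print.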

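  • get_close_matches() function: this module-level function returns a list of the best "good enough" matches. Its first argument, word, is the sequence for which close matches are wanted (usually a string), and its second argument, possibilities, is a list of sequences to match word against (usually a list of strings).

Example:

Python3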

# import required module
import difflib
  
# assign parameters
string = "Geeks4geeks"
listOfStrings = ["for", "Gks", "G4g", "geeks"]
  
# find common strings
print(difflib.get_close_matches(string, listOfStrings))

Output:

['geeks']
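
get_close_matches() also accepts two optional arguments: n, the maximum number of matches to return (default 3), and cutoff, the minimum similarity ratio a candidate must reach (default 0.6). The sketch below reuses the data from the example; the value 0.4 is chosen only for illustration.

```python
import difflib

word = 'Geeks4geeks'
possibilities = ['for', 'Gks', 'G4g', 'geeks']

# Default settings: n=3, cutoff=0.6
default = difflib.get_close_matches(word, possibilities)

# A lower cutoff admits weaker candidates; n caps how many are returned
relaxed = difflib.get_close_matches(word, possibilities, n=2, cutoff=0.4)

print(default)
print(relaxed)
```

Results are ordered by similarity, most similar first, so 'geeks' stays at the front of the relaxed list.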

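Class Differ

This class compares sequences of lines of text and produces human-readable differences, or deltas. Each line of a Differ delta begins with a two-letter code: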

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence

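The following method is provided by this class:

  • compare() method: compares two sequences of lines and generates the delta (a sequence of lines).

Example 1:

Python3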

# import required module
from difflib import Differ
  
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
  
# compare parameters
for ele in Differ().compare(par1, par2):
    print(ele)

Output:

- G
+ g
  e
  e
  k
  s
+ !

Example 2:

Python3

# import required module
from difflib import Differ
  
# assign parameters
par1 = ['Geeks','for','geeks!']
par2 = 'geeks!'
  
# compare parameters
for ele in Differ().compare(par1, par2):
    print(ele)

Output:

- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
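  • ndiff() function: the same kind of comparison can also be performed with this module-level function, which returns a Differ-style delta. If a list is passed, its elements are compared as whole items, whereas a string is compared character by character.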

Example 1:

Python3

# import required module
import difflib
  
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
  
# compare parameters
for ele in difflib.ndiff(par1, par2):
    print(ele)

Output:

- G
+ g
  e
  e
  k
  s
+ !

Example 2:

Python3

# import required module
import difflib
  
# assign parameters
par1 = ['Geeks','for','geeks!']
par2 = 'geeks!'
  
# compare parameters
for ele in difflib.ndiff(par1, par2):
    print(ele)

Output:

- Geeks
- for
- geeks!
+ g
+ e
+ e
+ k
+ s
+ !
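
The examples above pass bare strings, so every character is treated as a "line". In practice these tools are usually given lists of text lines. The sketch below, which uses made-up sample lines, compares two such lists with ndiff(); because 'for' and 'four' are similar, the delta is expected to include a '? ' hint line marking where they differ.

```python
import difflib

# Sample data for illustration; each element is one line ending in a newline
old_lines = ['Geeks\n', 'for\n', 'geeks!\n']
new_lines = ['Geeks\n', 'four\n', 'geeks!\n']

delta = list(difflib.ndiff(old_lines, new_lines))

# The delta lines already end in newlines, so join them without a separator
print(''.join(delta), end='')
```

Unchanged lines are prefixed with two spaces, while the changed line appears once with '- ' and once with '+ '.
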
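  • context_diff() function: a context diff is a compact way of showing only the lines that have changed, plus a few lines of context. The changes are shown in a before/after style. The number of context lines is set by n, which defaults to 3.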
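
As a rough sketch of typical usage, the snippet below compares two lists of lines and labels them with file names. The names before.txt and after.txt and the sample lines are hypothetical. Passing lineterm='' is assumed here because the input lines carry no trailing newlines, which keeps the header lines consistent with them.

```python
import difflib

# Sample data for illustration; lines have no trailing newlines
before = ['Geeks', 'for', 'geeks!']
after = ['Geeks', 'four', 'geeks!']

diff = list(difflib.context_diff(
    before, after,
    fromfile='before.txt',  # hypothetical label for the first sequence
    tofile='after.txt',     # hypothetical label for the second sequence
    n=1,                    # show one line of context around each change
    lineterm=''))           # do not append newlines to the header lines

print('\n'.join(diff))
```

Lines that changed between the two sequences are marked with '! ', and the surrounding context lines are prefixed with two spaces.
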

Example 1:

Python3

# import required module
import difflib
  
# assign parameters
par1 = 'Geeks'
par2 = 'geeks!'
  
# compare parameters
for ele in difflib.context_diff(par1, par2):
    print(ele)

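Output (the header lines already end with a newline, so print() adds a blank line after each of them):

*** 

--- 

***************

*** 1,5 ****

! G
  e
  e
  k
  s
--- 1,6 ----

! g
  e
  e
  k
  s
+ !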

Example 2:

Python3

# import required module
import difflib
  
# assign parameters
par1 = ['Geeks', 'for', 'geeks!']
par2 = 'geeks!'
  
# compare parameters
for ele in difflib.context_diff(par1, par2):
    print(ele)

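Output:

*** 

--- 

***************

*** 1,3 ****

! Geeks
! for
! geeks!
--- 1,6 ----

! g
! e
! e
! k
! s
! !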