NLP | WuPalmer – WordNet Similarity

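How does Wu & Palmer similarity work?
It calculates relatedness by considering the depths of the two synsets in the WordNet taxonomy, along with the depth of their LCS (Least Common Subsumer), the deepest ancestor the two synsets share.

The score lies in the range 0 < score <= 1. It can never be zero because the depth of the LCS is never zero (the root of the taxonomy has depth 1).
It measures similarity from how close the word senses are and where the synsets sit relative to each other in the hypernym tree.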
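
As a rough illustration of the description above, the score can be written as 2 * depth(LCS) / (depth(synset 1) + depth(synset 2)). The sketch below is plain Python and does not call NLTK; `wu_palmer` and its parameter names are hypothetical helpers made up for this example. In the second call, the split of 5 and 6 steps is an assumption chosen only so that the two synsets end up 11 steps apart.

```python
def wu_palmer(lcs_depth, steps_from_first, steps_from_second):
    """Wu-Palmer score from the LCS depth and each synset's distance to the LCS.

    lcs_depth counts the taxonomy root as depth 1, so it is always >= 1.
    steps_from_* is the number of edges from a synset up to the LCS.
    """
    if lcs_depth < 1 or steps_from_first < 0 or steps_from_second < 0:
        raise ValueError("lcs_depth must be >= 1 and step counts must be >= 0")
    # Depth of each synset, measured along the path that passes through the LCS
    depth_first = lcs_depth + steps_from_first
    depth_second = lcs_depth + steps_from_second
    return 2 * lcs_depth / (depth_first + depth_second)


# A synset compared with itself is its own LCS, so the score is 1.0
print(wu_palmer(3, 0, 0))

# Assumed illustration: LCS at depth 2, synsets 5 and 6 edges below it
print(wu_palmer(2, 5, 6))
```

Because the numerator is at least 2 and the denominator is finite, the result stays strictly above zero, and it reaches 1 only when both synsets coincide with the LCS.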
Code #1: Introducing Synsets

Python3

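# Requires the WordNet corpus; if it is missing, run nltk.download('wordnet') once.
from nltk.corpus import wordnet

# Take the first sense listed for each word
syn1 = wordnet.synsets('hello')[0]
syn2 = wordnet.synsets('selling')[0]

print("hello name :  ", syn1.name())
print("selling name :  ", syn2.name())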

Output:

hello name :   hello.n.01
selling name :   selling.n.01

Code #2: Wu-Palmer similarity

Python3

syn1.wup_similarity(syn2)

Output:

0.26666666666666666

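hello and selling are apparently about 27% similar! This is because they share common hypernyms further up the tree.

Code #3: Let's check the hypernyms they have in common.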

Python3

sorted(syn1.common_hypernyms(syn2))

Output:

[Synset('abstraction.n.06'), Synset('entity.n.01')]
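
To see what "common hypernyms" means without loading WordNet, here is a minimal sketch on a hand-made toy taxonomy. The `TOY_PARENT` dictionary and both helper functions are assumptions invented for this example; the tree is much smaller than the real WordNet graph and only imitates its shape.

```python
# Hand-made toy taxonomy (child -> parent); NOT the real WordNet graph.
TOY_PARENT = {
    "entity": None,
    "abstraction": "entity",
    "communication": "abstraction",
    "message": "communication",
    "greeting": "message",
    "hello": "greeting",
    "event": "abstraction",
    "transaction": "event",
    "selling": "transaction",
}


def hypernym_chain(node):
    """Return the ancestors of node, nearest first, ending at the root."""
    chain = []
    parent = TOY_PARENT[node]
    while parent is not None:
        chain.append(parent)
        parent = TOY_PARENT[parent]
    return chain


def common_hypernyms(a, b):
    """Ancestors shared by a and b, ordered from nearest to farthest from a."""
    ancestors_of_b = set(hypernym_chain(b))
    return [h for h in hypernym_chain(a) if h in ancestors_of_b]


shared = common_hypernyms("hello", "selling")
lcs = shared[0]  # in a tree, the nearest shared ancestor is also the deepest

print(sorted(shared))
print("LCS:", lcs)
```

In this toy tree the shared ancestors are `abstraction` and `entity`, and the deeper of the two, `abstraction`, plays the role of the LCS.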

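One of the core metrics used to calculate similarity is the shortest path distance between the two Synsets and their common hypernym.

Code #4: Let's look at how hypernyms are used.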

Python3

# ref is the first direct hypernym of hello
ref = syn1.hypernyms()[0]
print("Distance of hello from its hypernym : ",
      syn1.shortest_path_distance(ref))

# syn2 is selling, so these two calls measure hello <-> selling
print("Distance of hello from selling : ",
      syn1.shortest_path_distance(syn2))

print("Distance of selling from hello : ",
      syn2.shortest_path_distance(syn1))

Output:

Distance of hello from its hypernym :  1
Distance of hello from selling :  11
Distance of selling from hello :  11
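
The path distance can also be sketched on a toy tree: climb from each node to the root, record how many steps each ancestor is away, and take the cheapest meeting point. Everything below (`TOY_PARENT`, `distances_to_ancestors`, and this stand-alone `shortest_path_distance`) is a hypothetical illustration in plain Python, not NLTK's implementation, and the toy tree is shallower than WordNet, so the numbers differ from the output above.

```python
# Hand-made toy taxonomy (child -> parent); NOT the real WordNet graph.
TOY_PARENT = {
    "entity": None,
    "abstraction": "entity",
    "communication": "abstraction",
    "message": "communication",
    "greeting": "message",
    "hello": "greeting",
    "event": "abstraction",
    "transaction": "event",
    "selling": "transaction",
}


def distances_to_ancestors(node):
    """Map node and each of its ancestors to the number of edges from node."""
    distances = {}
    steps = 0
    while node is not None:
        distances[node] = steps
        node = TOY_PARENT[node]
        steps += 1
    return distances


def shortest_path_distance(a, b):
    """Fewest edges between a and b, going up to a shared ancestor and down."""
    from_a = distances_to_ancestors(a)
    from_b = distances_to_ancestors(b)
    return min(from_a[n] + from_b[n] for n in from_a if n in from_b)


print(shortest_path_distance("hello", "greeting"))
print(shortest_path_distance("hello", "selling"))
print(shortest_path_distance("selling", "hello"))
```

The measure is symmetric, which is why swapping the two arguments gives the same value, and a node's distance to its direct parent is always 1.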

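Note: The distance is very high, i.e. the two synsets are many steps apart, because they are not very similar. The code shown here uses nouns, but any part of speech (POS) can be used.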