NLP | Brill Tagger
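- The BrillTagger class is a transformation-based tagger. It is not a subclass of SequentialBackoffTagger.
- Instead, it applies a series of rules to correct the output of an initial tagger.
- The rules it keeps are chosen by score: a rule's score is the number of errors it corrects minus the number of new errors it introduces.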
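The scoring idea can be sketched in plain Python. This is a toy illustration and not NLTK's implementation: `apply_rule`, `rule_score`, the tag sequences, and the example rule ("change NN to VB when the previous tag is TO") are all hypothetical and chosen only for the example.

```python
def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Return a copy of `tags` where `from_tag` becomes `to_tag`
    whenever the preceding tag (in the input) is `prev_tag`."""
    new_tags = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            new_tags[i] = to_tag
    return new_tags


def rule_score(initial_tags, gold_tags, corrected_tags):
    """Score = errors fixed by the rule minus new errors it introduces."""
    triples = list(zip(initial_tags, gold_tags, corrected_tags))
    fixed = sum(1 for init, gold, new in triples
                if init != gold and new == gold)
    broken = sum(1 for init, gold, new in triples
                 if init == gold and new != gold)
    return fixed - broken


# Hypothetical output of an initial tagger and the matching gold tags
initial = ["TO", "NN", "DT", "NN", "TO", "NN", "TO", "NN"]
gold = ["TO", "VB", "DT", "NN", "TO", "VB", "TO", "NN"]

# Candidate rule: NN -> VB if the previous tag is TO
corrected = apply_rule(initial, "NN", "VB", "TO")
score = rule_score(initial, gold, corrected)
print("Rule score:", score)
```

Here the rule fixes two tags and breaks one that was already correct, so its score is 1. A trainer would prefer rules with higher scores.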
Code #1 : Training a BrillTagger class
# Loading Libraries
from nltk.tag import brill, brill_trainer

def train_brill_tagger(initial_tagger, train_sents, **kwargs):
    templates = [
        brill.Template(brill.Pos([-1])),
        brill.Template(brill.Pos([1])),
        brill.Template(brill.Pos([-2])),
        brill.Template(brill.Pos([2])),
        brill.Template(brill.Pos([-2, -1])),
        brill.Template(brill.Pos([1, 2])),
        brill.Template(brill.Pos([-3, -2, -1])),
        brill.Template(brill.Pos([1, 2, 3])),
        brill.Template(brill.Pos([-1]), brill.Pos([1])),
        brill.Template(brill.Word([-1])),
        brill.Template(brill.Word([1])),
        brill.Template(brill.Word([-2])),
        brill.Template(brill.Word([2])),
        brill.Template(brill.Word([-2, -1])),
        brill.Template(brill.Word([1, 2])),
        brill.Template(brill.Word([-3, -2, -1])),
        brill.Template(brill.Word([1, 2, 3])),
        brill.Template(brill.Word([-1]), brill.Word([1])),
    ]

    # Using BrillTaggerTrainer to train
    trainer = brill_trainer.BrillTaggerTrainer(
        initial_tagger, templates, deterministic=True)

    return trainer.train(train_sents, **kwargs)
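Each template names the positions, relative to the token being retagged, that a learned rule is allowed to inspect. The sketch below illustrates how such a condition could be checked, assuming that a feature listing several positions matches when any one of them holds the expected value. `condition_matches` is a hypothetical helper written for this illustration and is not part of NLTK.

```python
def condition_matches(tags, index, positions, value):
    """Return True if any in-range position relative to `index`
    carries the tag `value` (illustrative assumption, see lead-in)."""
    for offset in positions:
        j = index + offset
        if 0 <= j < len(tags) and tags[j] == value:
            return True
    return False


tags = ["DT", "JJ", "NN", "VBD"]

# Like Pos([-1]): is the tag right before index 2 equal to "JJ"?
prev_is_jj = condition_matches(tags, 2, [-1], "JJ")

# Like Pos([-2, -1]): is "DT" one of the two preceding tags?
dt_in_prev_two = condition_matches(tags, 2, [-2, -1], "DT")

# Like Pos([1, 2]): nothing follows the last token, so no match
nn_after_last = condition_matches(tags, 3, [1, 2], "NN")

print(prev_is_jj, dt_in_prev_two, nn_after_last)
```

The `Word` templates can be read the same way, with the neighbouring words taking the place of the neighbouring tags.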
Code #2 : Building and evaluating the initial tagger for the BrillTagger
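# Loading Libraries
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger
from nltk.corpus import treebank
# Assumption: backoff_tagger lives in the same tag_util helper module
# as train_brill_tagger from Code #1
from tag_util import train_brill_tagger, backoff_tagger

# Initializing the default tagger used as the last backoff
default_tag = DefaultTagger('NN')

# Initializing training and testing sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Chaining n-gram taggers, falling back to the default tagger
initial_tag = backoff_tagger(
    train_data, [UnigramTagger, BigramTagger, TrigramTagger],
    backoff=default_tag)

# Note: newer NLTK releases deprecate evaluate() in favor of accuracy()
a = initial_tag.evaluate(test_data)
print("Accuracy of Initial Tag : ", a)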
Output :
Accuracy of Initial Tag : 0.8806820634578028
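Code #2 relies on a `backoff_tagger` helper that this article does not define. A minimal sketch of what such a helper might look like follows, assuming each tagger class accepts `(train_sents, backoff=...)` the way NLTK's `UnigramTagger`, `BigramTagger`, and `TrigramTagger` do. The actual helper in `tag_util` may differ.

```python
def backoff_tagger(train_sents, tagger_classes, backoff=None):
    """Train each tagger class in order, using the previously built
    tagger as its backoff, and return the last tagger in the chain.

    Sketch only: assumes every class takes (train_sents, backoff=...).
    """
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff=backoff)
    return backoff
```

With the call in Code #2, this would produce a TrigramTagger that backs off to a BigramTagger, then a UnigramTagger, and finally the DefaultTagger.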
Code #3 : Training and evaluating the BrillTagger
# Training the BrillTagger on top of the initial tagger
brill_tag = train_brill_tagger(initial_tag, train_data)

b = brill_tag.evaluate(test_data)
print("Accuracy of brill_tag : ", b)
Output :
Accuracy of brill_tag : 0.8827541549751781
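To put the two accuracy figures in perspective, the short calculation below derives the absolute gain and the relative reduction in error rate. The variable names are illustrative; the numbers are the ones printed in the two outputs above.

```python
# Accuracies reported by the initial tagger and the Brill tagger
initial_acc = 0.8806820634578028
brill_acc = 0.8827541549751781

# Absolute improvement in accuracy
absolute_gain = brill_acc - initial_acc

# Share of the initial tagger's errors that the Brill rules removed
error_reduction = absolute_gain / (1.0 - initial_acc)

print(f"Absolute gain: {absolute_gain:.4f}")
print(f"Relative error reduction: {error_reduction:.2%}")
```

The gain is small in absolute terms, roughly 0.2 percentage points, which is plausible given that the initial tagger is already a trained trigram backoff chain.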