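Spell Checker in Python
Checking the spelling of words is a basic requirement for almost any kind of text processing or analysis. This article covers several ways to check spelling and to correct individual misspelled words.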
Using the textblob library
First, install the textblob library with pip from the command prompt.
pip install textblob
You can also install the library from inside a Jupyter Notebook:
Python3
import sys
!{sys.executable} -m pip install textblob
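The `!` prefix only works inside Jupyter/IPython. In a plain Python script, roughly the same effect can be had with the standard `subprocess` module. The sketch below is one possible way to do it; the helper names `pip_install_command` and `pip_install` are made up for illustration and are not part of pip or textblob.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the pip command for the interpreter that is currently running."""
    # sys.executable points at the active interpreter, so the package lands
    # in the same environment the script (or notebook kernel) is using.
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run pip in a subprocess; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Hypothetical usage (not executed here, since it would install a package):
# pip_install("textblob")
```

Using `sys.executable` rather than a bare `pip` avoids installing into a different Python than the one that will import the library.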
Spell-checking program:
Python3
from textblob import TextBlob
a = "cmputr" # incorrect spelling
print("original text: "+str(a))
b = TextBlob(a)
# prints the corrected spelling
print("corrected text: "+str(b.correct()))
Output:
original text: cmputr
corrected text: computer
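TextBlob's documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". To give a feel for that idea, here is a minimal pure-Python sketch of the approach. It is not TextBlob's actual code: the `WORD_COUNTS` table is a tiny hypothetical stand-in for a real word-frequency corpus, and the function names are chosen for illustration only.

```python
import string

# Hypothetical toy frequency table; a real corrector loads counts from a large corpus.
WORD_COUNTS = {"computer": 120, "compute": 40, "water": 300, "write": 150, "study": 90}


def edits1(word):
    """Return every string one edit (delete, transpose, replace, insert) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """Return every string two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """Keep only the words that appear in the frequency table."""
    return {w for w in words if w in WORD_COUNTS}


def correct(word):
    """Prefer the word itself, then 1-edit, then 2-edit candidates; pick the most frequent."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: WORD_COUNTS.get(w, 0))


print(correct("cmputr"))  # computer
print(correct("watr"))    # water
```

With this toy table, "cmputr" is two insertions away from both "computer" and one insertion plus one replacement away from "compute"; the higher count decides in favor of "computer".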
Using the pyspellchecker library
You can install this library as follows:
Using pip:
pip install pyspellchecker
In a Jupyter Notebook:
Python3
import sys
!{sys.executable} -m pip install pyspellchecker
Spell-checking program using pyspellchecker:
Python3
from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(["cmputr", "watr", "study", "wrte"])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))
    # Get a list of `likely` options
    print(spell.candidates(word))
Output:
computer
{'caput', 'caputs', 'compute', 'computor', 'impute', 'computer'}
water
{'water', 'watt', 'warr', 'wart', 'war', 'wath', 'wat'}
write
{'wroe', 'arte', 'wre', 'rte', 'wrote', 'write'}
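The program above corrects isolated words. To apply a word-level corrector to running text, one option is a small wrapper that finds alphabetic tokens and leaves punctuation and spacing alone. The sketch below is an illustration, not part of pyspellchecker: `correct_sentence` is a hypothetical helper that accepts any function mapping a word to a correction. Recent pyspellchecker releases return `None` from `correction` when no candidate is found, so the wrapper keeps the original word in that case. A dictionary lookup stands in for `SpellChecker().correction` so the example runs without the library.

```python
import re


def correct_sentence(text, correction_fn):
    """Replace each alphabetic token with correction_fn(token); keep everything else."""
    def fix(match):
        word = match.group(0)
        suggestion = correction_fn(word.lower())
        if suggestion is None:
            # No candidate found: keep the original token unchanged.
            return word
        # Restore a leading capital if the original token had one.
        return suggestion.capitalize() if word[0].isupper() else suggestion

    return re.sub(r"[A-Za-z]+", fix, text)


# Stand-in corrector (an assumption for this demo). With the library installed,
# SpellChecker().correction could be passed instead.
toy_correction = {"cmputr": "computer", "watr": "water"}.get

print(correct_sentence("My cmputr needs watr!", toy_correction))
# My computer needs water!
```

Passing the corrector in as a function keeps the wrapper independent of any one library, so the same code can be tried with a different word-level corrector.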
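Using JamSpell
Dictionary-based methods alone are not enough to get the best correction quality; you also need to consider the context in which a word appears. JamSpell is a spell-checking library for Python that is based on a language model, so it can correct the same misspelling differently depending on the surrounding words.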
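To make the idea of context-dependent correction concrete, here is a toy sketch that chooses among candidate corrections using the preceding word. It is only an illustration under stated assumptions and not JamSpell's actual algorithm: the `BIGRAM_COUNTS` values are invented, and `pick_by_context` is a hypothetical helper.

```python
# Hypothetical bigram counts: how often the second word follows the first.
BIGRAM_COUNTS = {
    ("the", "best"): 50,
    ("the", "beat"): 8,
    ("the", "belt"): 5,
    ("to", "beat"): 40,
    ("to", "best"): 2,
}


def pick_by_context(previous_word, candidates):
    """Return the candidate seen most often after previous_word."""
    return max(candidates, key=lambda c: BIGRAM_COUNTS.get((previous_word, c), 0))


candidates = ["best", "beat", "belt", "bet", "bent"]
print(pick_by_context("the", candidates))  # best
print(pick_by_context("to", candidates))   # beat
```

The same candidate list yields different corrections after "the" and after "to", which is something a dictionary-only checker cannot do.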
1) Install swig3
apt-get install swig3.0 # for linux
brew install swig@3 # for mac
2) Install jamspell
pip install jamspell
3) Download a language model for your language
Python3
import jamspell

# Create a corrector
corrector = jamspell.TSpellCorrector()

# Load the language model;
# the argument is the path to a downloaded model file
corrector.LoadLangModel('Downloads/en_model.bin')

# To fix text automatically, run FixFragment:
print(corrector.FixFragment('I am the begt spell cherken!'))

# To get a list of possible candidates,
# pass a sentence split into words and a word position
print(corrector.GetCandidates(['i', 'am', 'the', 'begt', 'spell', 'cherken'], 3))
print(corrector.GetCandidates(['i', 'am', 'the', 'begt', 'spell', 'cherken'], 5))
Output:
u'I am the best spell checker!'
(u'best', u'beat', u'belt', u'bet', u'bent')
(u'checker', u'chicken', u'checked', u'wherein', u'coherent', ...)