Python | Converting numbers to words with num2words
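The num2words module for Python converts numbers such as 34 into words such as thirty-four. The library also supports multiple languages. This article shows how to use the num2words module to convert numbers to words.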
Installation
num2words can be installed with pip:
pip install num2words
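Consider the following two excerpts, taken from different files in 20 Newsgroups, a popular NLP dataset. Preprocessing 20 Newsgroups effectively remains an interesting problem.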
In article, Martin Preston writes: Why not use the PD C library for reading/writing TIFF files? It took me a good 20 minutes to start using them in your own app.
ISCIS VIII is the eighth of a series of meetings which have brought together computer scientists and engineers from about twenty countries. This year’s conference will be held in the beautiful Mediterranean resort city of Antalya, in a region rich in natural as well as historical sites.
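In the two excerpts above, the number 20 appears once as digits ("20") and once as a word ("twenty"). A standard preprocessing pipeline of tokenization, lemmatization, and so on will not map "20" and "twenty" to the same stem, even though the two are equivalent in context. Fortunately, the third-party library num2words can solve this in a single line.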
The following is an example of its usage.
from num2words import num2words
# Most common usage.
print(num2words(36))
# Other variants, selected with the `to` argument.
print(num2words(36, to='ordinal'))
print(num2words(36, to='ordinal_num'))
print(num2words(36, to='year'))
print(num2words(36, to='currency'))
# Language support, selected with the `lang` argument.
print(num2words(36, lang='es'))
Output:
thirty-six
thirty-sixth
36th
zero euro, thirty-six cents
treinta y seis
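Thus, during preprocessing, all numeric values can be converted to words so that the digit form and the word form of a number end up as the same token, which can improve accuracy in later stages.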
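As a rough illustration of that preprocessing step, the sketch below replaces every standalone integer in a string with its spelled-out form. The helper name spell_out_numbers, its to_words parameter, and the regular expression are assumptions made for this example and are not part of num2words. Only the plain num2words(n) call shown earlier is taken from the library.

```python
import re
from typing import Callable, Optional

# Matches standalone runs of digits such as "20". Digits glued to letters
# (e.g. "mp3") are left alone; decimals such as "3.14" are not treated
# specially, so this pattern is only meant for plain integers.
_INTEGER = re.compile(r"\b\d+\b")


def spell_out_numbers(
    text: str,
    to_words: Optional[Callable[[int], str]] = None,
) -> str:
    """Replace each standalone integer in `text` with its word form.

    `to_words` is a hypothetical hook for this sketch: any function that
    maps an int to a string. If omitted, num2words is used.
    """
    if to_words is None:
        # Imported lazily so that a different converter can be injected.
        from num2words import num2words
        to_words = num2words
    return _INTEGER.sub(lambda m: to_words(int(m.group())), text)
```

With num2words installed, spell_out_numbers("It took me a good 20 minutes") should return "It took me a good twenty minutes", so the first excerpt uses the same word as the second. Passing a converter such as lambda n: num2words(n, lang='es') would switch the output language.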
Reference: https://pypi.org/project/num2words/