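Natural Language Processing using Polyglot - Introduction
This article introduces Polyglot, a Python NLP package that supports a wide range of multilingual applications, with a broad set of analyses and extensive language coverage. It was developed by Rami Al-Rfou. Its features include: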
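- Language detection (196 languages)
- Tokenization (165 languages)
- Named entity recognition (40 languages)
- Part-of-speech tagging (16 languages)
- Sentiment analysis (136 languages), and more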
First, let's install the required packages:
Installation is quick and smooth on Google Colab.
pip install polyglot
# dependency packages
pip install pyicu
pip install Morfessor
pip install pycld2
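Before moving on, it can help to confirm that the packages are importable. The sketch below uses only the standard library. The import names it checks are our assumption about each package's top-level module: `polyglot`, `icu` for PyICU, `morfessor`, and `pycld2`. `check_dependencies` is a hypothetical helper, not part of polyglot.

```python
import importlib.util

# pip package name -> top-level module it is assumed to install (assumption)
REQUIRED_MODULES = {
    "polyglot": "polyglot",
    "pyicu": "icu",
    "Morfessor": "morfessor",
    "pycld2": "pycld2",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {package name: bool} telling whether each module can be found."""
    status = {}
    for package, module in modules.items():
        # find_spec returns None when the module is not installed;
        # it locates the module without importing it
        status[package] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for package, ok in check_dependencies().items():
        print(f"{package:10} {'installed' if ok else 'MISSING'}")
```

A `MISSING` entry usually means the corresponding `pip install` line failed. PyICU in particular needs the ICU system libraries to build.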
Downloading the necessary models
Google Colab makes installing the models easy:
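%%bash
# run as one Colab cell; %%bash must be the first line of the cell
polyglot download embeddings2.en   # word embeddings used by the NER and POS models
polyglot download ner2.en          # named entity recognition model
polyglot download pos2.en          # part-of-speech tagging model
polyglot download sentiment2.en    # sentiment lexicon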
Code: Language Detection
python3
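from polyglot.detect import Detector
# sample text written in Spanish
spanish_text = u"""¡Hola ! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"""
# Detector identifies the language of the text (polyglot uses CLD2 via pycld2)
detector = Detector(spanish_text)
# prints the detected language's name, code and confidence
print(detector.language)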
Output:
The text is detected as Spanish, with a confidence of 98.
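Polyglot delegates the actual detection to CLD2. To show the general idea of scoring candidate languages and reporting a confidence, here is a much simpler technique: counting stopwords. It is not CLD2's algorithm. The tiny `STOPWORDS` lists and the `guess_language` helper are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Hypothetical, hand-picked stopword lists (illustration only)
STOPWORDS = {
    "es": {"el", "la", "los", "las", "de", "y", "que", "en", "es", "mi", "un", "una"},
    "en": {"the", "a", "an", "of", "and", "that", "in", "is", "my", "to", "it"},
    "fr": {"le", "la", "les", "de", "et", "que", "en", "est", "mon", "un", "une"},
}


def guess_language(text):
    """Return (language code, confidence %) based on stopword hits per language."""
    words = re.findall(r"\w+", text.lower())  # \w also matches letters such as ñ
    hits = Counter()
    for code, stopwords in STOPWORDS.items():
        hits[code] = sum(1 for w in words if w in stopwords)
    total = sum(hits.values())
    if total == 0:
        return None, 0.0  # no evidence for any language
    code, count = hits.most_common(1)[0]
    # confidence = share of all stopword hits that belong to the winning language
    return code, round(100 * count / total, 1)


print(guess_language("¡Hola ! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"))
# ('es', 75.0)
```

The confidence here is low compared with polyglot's 98 because the word "en" is also French. A real detector uses far richer statistics than a handful of stopwords.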
Code: Tokenization
Tokenization is the process of splitting a sentence into words, or a paragraph into sentences.
python3
# importing Text from polyglot library
from polyglot.text import Text
sentences = u"""Suggest a platform for placement preparation?. GFG is a very good platform for placement
preparation."""
# passing sentences through imported Text
text = Text(sentences)
# dividing sentences into words
print(text.words)
print('\n')
# separating sentences
print(text.sentences)
Output:
The text is split into words, and the two sentences are separated from each other.
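Polyglot finds word and sentence boundaries with ICU, which is installed above as pyicu. To make the two operations concrete, here is a deliberately simplified regex-based sketch. It does not reproduce ICU's rules, and the names `split_sentences` and `split_words` are hypothetical.

```python
import re

# Assumption: a sentence ends with '.', '!' or '?' followed by whitespace
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A token is a run of word characters or a single punctuation mark
TOKEN = re.compile(r"\w+|[^\w\s]")


def split_sentences(text):
    """Split text into sentences at terminal punctuation followed by whitespace."""
    return [s.strip() for s in SENTENCE_END.split(text.strip()) if s.strip()]


def split_words(text):
    """Split text into word and punctuation tokens."""
    return TOKEN.findall(text)


text = """Suggest a platform for placement preparation?. GFG is a very good platform for placement
preparation."""
for sentence in split_sentences(text):
    print(split_words(sentence))
```

A rule this simple mis-splits abbreviations such as "e.g. this", which is one reason real tokenizers rely on more elaborate boundary rules.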
Code: Named Entity Recognition
Polyglot recognizes three categories of entities:
- Location
- Organization
- Person
python3
from polyglot.text import Text
sentence = """Google is an American multinational technology company and Sundar Pichai is the CEO of Google"""
# hint_language_code tells polyglot the text is English instead of auto-detecting it
text = Text(sentence, hint_language_code='en')
print(text.entities)
Output:
I-ORG denotes an organization
I-LOC denotes a location
I-PER denotes a person
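Polyglot returns each entity as a chunk of consecutive tokens that share the same tag. The sketch below shows that grouping step on hand-written tags. The `TAGGED` list is hypothetical and is not output from polyglot's model, and `group_entities` is our own helper.

```python
from itertools import groupby

# Hypothetical per-token tags written by hand; "O" marks tokens outside any entity
TAGGED = [
    ("Google", "I-ORG"), ("is", "O"), ("an", "O"), ("American", "O"),
    ("multinational", "O"), ("technology", "O"), ("company", "O"), ("and", "O"),
    ("Sundar", "I-PER"), ("Pichai", "I-PER"), ("is", "O"), ("the", "O"),
    ("CEO", "O"), ("of", "O"), ("Google", "I-ORG"),
]

LABELS = {"I-ORG": "organization", "I-LOC": "location", "I-PER": "person"}


def group_entities(tagged):
    """Merge runs of consecutive tokens with the same I-* tag into (tag, tokens) chunks."""
    chunks = []
    for tag, run in groupby(tagged, key=lambda pair: pair[1]):
        if tag != "O":
            chunks.append((tag, [token for token, _ in run]))
    return chunks


for tag, tokens in group_entities(TAGGED):
    print(f"{' '.join(tokens):15} {tag}  ({LABELS[tag]})")
```

Because only I- tags are used, two different entities of the same type sitting next to each other would be merged into one chunk. Tagging schemes that add a B- (begin) tag avoid this.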
Code: Part-of-Speech Tagging
python3
from polyglot.text import Text
sentence = """GeeksforGeeks is the best place for learning things in simple manner."""
text = Text(sentence)
print(text.pos_tags)
Output:
Here ADP denotes an adposition (a preposition such as "for" or "in"), ADJ an adjective, and DET a determiner.
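These abbreviations appear to come from the Universal part-of-speech tagset; treat that as an assumption based on polyglot's documentation. The lookup table below is a sketch, not part of polyglot. It maps each abbreviation to a readable name. The sample (word, tag) pairs are hand-written and are not tagger output.

```python
from collections import Counter

# Universal POS tags (assumed tagset); CONJ is the older name of CCONJ
UPOS = {
    "ADJ": "adjective", "ADP": "adposition (preposition/postposition)",
    "ADV": "adverb", "AUX": "auxiliary verb", "CONJ": "coordinating conjunction",
    "CCONJ": "coordinating conjunction", "DET": "determiner", "INTJ": "interjection",
    "NOUN": "noun", "NUM": "numeral", "PART": "particle", "PRON": "pronoun",
    "PROPN": "proper noun", "PUNCT": "punctuation", "SCONJ": "subordinating conjunction",
    "SYM": "symbol", "VERB": "verb", "X": "other",
}


def describe(pos_tags):
    """Turn (word, tag) pairs into (word, tag, readable name) triples."""
    return [(word, tag, UPOS.get(tag, "unknown tag")) for word, tag in pos_tags]


def tag_counts(pos_tags):
    """Count how often each tag occurs."""
    return Counter(tag for _, tag in pos_tags)


# Hand-written pairs for illustration only (not real tagger output)
sample = [("the", "DET"), ("best", "ADJ"), ("place", "NOUN"), ("for", "ADP"), ("learning", "VERB")]
for word, tag, name in describe(sample):
    print(f"{word:10} {tag:5} {name}")
```

The same `describe` call could be applied to the pairs in `text.pos_tags` to read polyglot's output more easily.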
Code: Sentiment Analysis
python3
from polyglot.text import Text
sentence1 = """ABC is one of the best universities in the world."""
sentence2 = """ABC is one of the worst universities in the world."""
text1 = Text(sentence1)
text2 = Text(sentence2)
print(text1.polarity)
print(text2.polarity)
Output:
1 means the sentence expresses a positive sentiment
-1 means the sentence expresses a negative sentiment
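Polyglot's sentiment analysis is lexicon-based: each word has a polarity of -1 (negative), 0 (neutral) or +1 (positive). The sketch below assumes that the sentence score is the mean of the non-neutral word polarities. This is consistent with the ±1 outputs above, but treat it as an assumption about how polyglot combines the scores. The tiny `LEXICON` is hypothetical.

```python
import re

# Hypothetical mini lexicon: +1 positive, -1 negative; unlisted words count as 0
LEXICON = {"best": 1, "good": 1, "great": 1, "worst": -1, "bad": -1, "poor": -1}


def word_polarity(word):
    """Look up a word's polarity, treating unknown words as neutral."""
    return LEXICON.get(word.lower(), 0)


def text_polarity(text):
    """Average the polarity of the non-neutral words; 0.0 if there are none."""
    scores = [word_polarity(w) for w in re.findall(r"\w+", text)]
    scores = [s for s in scores if s != 0]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


print(text_polarity("ABC is one of the best universities in the world."))   # 1.0
print(text_polarity("ABC is one of the worst universities in the world."))  # -1.0
```

Averaging only the non-neutral words means a single "best" in a long sentence still yields 1.0. Mixed sentences land between -1 and 1.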