📅  Last modified: 2023-12-03 14:44:54.713000             🧑  Author: Mango
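OpenNLP here refers to Apache OpenNLP, a natural language processing (NLP) toolkit that provides a set of Java libraries for processing natural-language text.
It can be installed in two ways: through Maven, or manually.
To install it with Maven, add the following dependency to the pom.xml file: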
<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-tools</artifactId>
    <version>1.9.3</version>
</dependency>
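To install it manually, download the opennlp-tools distribution, unpack it, and add the opennlp-tools JAR from the unpacked directory to your project's classpath.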
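Below are some usage examples of the OpenNLP library. Each example loads a pre-trained model file (for example en-sent.bin) from the working directory; these model files are downloaded separately from the library itself.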
Sentence detection uses the SentenceDetectorME class from the OpenNLP library:
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import java.io.FileInputStream;
import java.io.InputStream;

public class SentenceDetectionExample {
    public static void main(String[] args) {
        // Load the pre-trained English sentence model; the stream is closed automatically.
        try (InputStream modelIn = new FileInputStream("en-sent.bin")) {
            SentenceModel model = new SentenceModel(modelIn);
            SentenceDetectorME sentenceDetector = new SentenceDetectorME(model);
            String sentence = "This is a sentence. This is another sentence. Now is the time for all good men to come to the aid of their country.";
            // Split the raw text into one string per detected sentence.
            String[] sentences = sentenceDetector.sentDetect(sentence);
            for (String s : sentences) {
                System.out.println(s);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Output:
This is a sentence.
This is another sentence.
Now is the time for all good men to come to the aid of their country.
Tokenization uses the TokenizerME class from the OpenNLP library:
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import java.io.FileInputStream;
import java.io.InputStream;

public class TokenizerExample {
    public static void main(String[] args) {
        // Load the pre-trained English tokenizer model; the stream is closed automatically.
        try (InputStream modelIn = new FileInputStream("en-token.bin")) {
            TokenizerModel model = new TokenizerModel(modelIn);
            TokenizerME tokenizer = new TokenizerME(model);
            String sentence = "This is a sentence.";
            // Split the sentence into word and punctuation tokens.
            String[] tokens = tokenizer.tokenize(sentence);
            for (String token : tokens) {
                System.out.println(token);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Output:
This
is
a
sentence
.
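To see what the trained tokenizer contributes, it can help to compare its result with a plain whitespace split. The sketch below uses only the JDK; the class name WhitespaceSplitDemo and the method splitOnWhitespace are made up for this illustration and are not part of OpenNLP.

```java
import java.util.Arrays;

// Illustration only: a naive whitespace split using the JDK, no OpenNLP involved.
class WhitespaceSplitDemo {

    // Splits on runs of whitespace; punctuation stays attached to the neighbouring word.
    static String[] splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = splitOnWhitespace("This is a sentence.");
        // Prints [This, is, a, sentence.] with the period still attached to the last word.
        System.out.println(Arrays.toString(tokens));
    }
}
```

The whitespace split yields four tokens and leaves the period glued to "sentence.", whereas a tokenizer that treats punctuation separately produces "sentence" and "." as two tokens.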
Named entity recognition uses the NameFinderME class from the OpenNLP library:
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;
import java.io.FileInputStream;
import java.io.InputStream;

public class NerExample {
    public static void main(String[] args) {
        // Load the pre-trained English person-name model; the stream is closed automatically.
        try (InputStream modelIn = new FileInputStream("en-ner-person.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
            NameFinderME nameFinder = new NameFinderME(model);
            // find() expects a sentence that has already been split into tokens.
            String[] sentence = new String[]{"Pierre", "Vinken", "is", "61", "years", "old"};
            Span[] nameSpans = nameFinder.find(sentence);
            for (Span s : nameSpans) {
                System.out.println(s.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
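Output (the exact result depends on the model; with the pre-trained person model, both name tokens are typically covered by a single span):
[0..2) person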
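A name span holds token indices rather than text: the start index is inclusive and the end index is exclusive. The sketch below shows how such a span maps back to the words it covers. To stay runnable without the OpenNLP JAR or a model file, it uses a small stand-in class, TokenSpan, which is hypothetical and only mimics the start, end, and type fields of opennlp.tools.util.Span. With the real library, Span.spansToStrings(nameSpans, sentence) serves the same purpose.

```java
// Illustration only: TokenSpan is a hypothetical stand-in for opennlp.tools.util.Span.
class SpanTextDemo {

    // Half-open token range [start, end) with a type label.
    static final class TokenSpan {
        final int start;   // index of the first covered token (inclusive)
        final int end;     // index just past the last covered token (exclusive)
        final String type; // entity type, e.g. "person"

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Joins the tokens covered by the span with single spaces.
    static String coveredText(TokenSpan span, String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = span.start; i < span.end; i++) {
            if (i > span.start) {
                sb.append(' ');
            }
            sb.append(tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Pierre", "Vinken", "is", "61", "years", "old"};
        // Assumed span for illustration; it is not produced by a model here.
        TokenSpan span = new TokenSpan(0, 2, "person");
        // Prints: person: Pierre Vinken
        System.out.println(span.type + ": " + coveredText(span, tokens));
    }
}
```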
Phrase chunking uses the ChunkerME class from the OpenNLP library:
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.util.Span;
import java.io.FileInputStream;
import java.io.InputStream;

public class ChunkingExample {
    public static void main(String[] args) {
        // Load the pre-trained English chunker model; the stream is closed automatically.
        try (InputStream modelIn = new FileInputStream("en-chunker.bin")) {
            ChunkerModel model = new ChunkerModel(modelIn);
            ChunkerME chunker = new ChunkerME(model);
            // The chunker needs the tokens and one part-of-speech tag per token.
            String[] sentence = new String[]{"Pierre", "Vinken", "is", "61", "years", "old"};
            String[] tags = new String[]{"NNP", "NNP", "VBZ", "CD", "NNS", "JJ"};
            Span[] spans = chunker.chunkAsSpans(sentence, tags);
            for (Span s : spans) {
                System.out.println(s.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
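Output (the spans carry phrase types such as NP or VP rather than part-of-speech tags; the exact chunks depend on the model, and a typical result looks like this):
[0..2) NP
[2..3) VP
[3..5) NP
[5..6) ADJP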
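ChunkerME also offers chunk(sentence, tags), which returns one chunk tag per token in a begin/inside/outside style such as B-NP, I-NP, or O, while chunkAsSpans reports consecutive tokens grouped into spans. The sketch below illustrates that grouping idea with plain JDK code. It is not OpenNLP's implementation: the class BioGrouping, the method toSpans, and the tag sequence in main are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: groups B-/I-/O chunk tags into half-open token ranges.
class BioGrouping {

    // Returns one string per chunk in the form "[start..end) TYPE".
    static List<String> toSpans(String[] tags) {
        List<String> spans = new ArrayList<>();
        int start = -1;      // start index of the open chunk, or -1 if none is open
        String type = null;  // type of the open chunk
        for (int i = 0; i <= tags.length; i++) {
            // A sentinel "O" after the last tag closes any chunk that is still open.
            String tag = i < tags.length ? tags[i] : "O";
            boolean continues = start >= 0 && tag.equals("I-" + type);
            if (!continues) {
                if (start >= 0) {
                    spans.add("[" + start + ".." + i + ") " + type);
                    start = -1;
                    type = null;
                }
                if (tag.startsWith("B-") || tag.startsWith("I-")) {
                    // A stray I- tag without a matching open chunk starts a new chunk.
                    start = i;
                    type = tag.substring(2);
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Assumed tag sequence for "Pierre Vinken is 61 years old"; not produced by a model here.
        String[] tags = {"B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "B-ADJP"};
        toSpans(tags).forEach(System.out::println);
    }
}
```

For the assumed tag sequence, this prints four chunks: [0..2) NP, [2..3) VP, [3..5) NP, and [5..6) ADJP.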
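This tutorial covers some of the most important OpenNLP use cases rather than a complete list of its features. With this knowledge, programmers can quickly integrate natural language processing into their applications and process text data effectively.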