📅  Last modified: 2023-12-03 14:44:36.822000             🧑  Author: Mango
NLTK (Natural Language Toolkit) is a Python library that provides a wide range of tools and resources for working with human language data such as text. It provides modules for tokenizing, stemming, tagging, parsing, semantic reasoning, and more. NLTK is widely used in research and industry for processing natural language data.
You can install NLTK using pip:
pip install nltk
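After installation, you will also need to download the NLTK data. Calling nltk.download() with no arguments opens an interactive downloader; passing a package name such as 'punkt' fetches only that resource:
import nltk
nltk.download()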
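Here is an example of tokenizing a sentence using NLTK. The word tokenizer relies on the Punkt models, so download them first:
import nltk
# Tokenizer models; recent NLTK releases may ask for 'punkt_tab' instead
nltk.download('punkt')
sentence = "This is an example sentence."
# Split the sentence into word and punctuation tokens
tokens = nltk.word_tokenize(sentence)
print(tokens)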
This will output:
['This', 'is', 'an', 'example', 'sentence', '.']
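Stemming, one of the other modules mentioned above, can be tried the same way. The following is a minimal sketch using NLTK's Porter stemmer, which needs no extra data download; the word list is made up for illustration.

```python
from nltk.stem import PorterStemmer

# The Porter stemmer is rule-based, so no corpus or model download is required
stemmer = PorterStemmer()

# Illustrative sample words (not from the original text)
words = ["running", "runs", "cats"]

# Reduce each word to its stem
stems = [stemmer.stem(w) for w in words]
print(stems)  # ['run', 'run', 'cat']
```

Note that a stem is not always a dictionary word; the stemmer only strips suffixes according to fixed rules.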
NLTK provides access to many resources such as corpora, lexicons, and trained models. Here is an example of using the Brown corpus:
import nltk
nltk.download('brown')
from nltk.corpus import brown
# Print the categories in the corpus
print(brown.categories())
# Print the words in the news category (displayed as a truncated preview)
print(brown.words(categories='news'))
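Once you have a list of words from a corpus, a common next step is counting them. The sketch below uses nltk.FreqDist on a small hand-written word list so that it runs without any download; the helper name top_words and the sample data are assumptions made for illustration.

```python
from nltk import FreqDist

def top_words(words, n=3):
    """Return the n most common lowercase alphabetic tokens (hypothetical helper)."""
    # Skip punctuation and numbers, and fold case before counting
    freq = FreqDist(w.lower() for w in words if w.isalpha())
    return freq.most_common(n)

# Illustrative sample; any iterable of word strings works here
sample = ["The", "cat", "saw", "the", "dog", ".", "The", "dog", "ran"]
print(top_words(sample, 2))  # [('the', 3), ('dog', 2)]
```

With the Brown corpus downloaded, the same function should accept brown.words(categories='news') in place of the sample list.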
In summary, NLTK gives you tools for tokenizing, stemming, tagging, parsing, and reasoning about human language data, along with corpora, lexicons, and trained models to use them on.