Python NLTK | nltk.tokenize.LineTokenizer
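With the help of the nltk.tokenize.LineTokenizer class and its tokenize() method, we can split a string into tokens, one token per line. Blank lines are discarded by default, and leading or trailing spaces within a line are preserved.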
Syntax : tokenize.LineTokenizer()
Return : Return a list of tokens, one for each line of the input string.
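To make the behavior concrete, the following plain-Python sketch approximates what the tokenizer does: split the string on line boundaries, then optionally drop blank lines. The helper name line_tokenize is hypothetical and this is not NLTK's own code; the three mode names mirror the values accepted by the blanklines parameter ('discard', 'keep' and 'discard-eof').

```python
def line_tokenize(text, blanklines="discard"):
    """Approximate LineTokenizer: return one token per line of `text`."""
    if blanklines not in ("discard", "keep", "discard-eof"):
        raise ValueError("blanklines must be 'discard', 'keep' or 'discard-eof'")
    # splitlines() splits on line boundaries and drops the line endings
    lines = text.splitlines()
    if blanklines == "discard":
        # Drop every line that is empty or contains only whitespace
        lines = [line for line in lines if line.strip()]
    elif blanklines == "discard-eof":
        # Drop only a blank line at the very end of the input
        if lines and not lines[-1].strip():
            lines.pop()
    return lines


sample = "first\n\nsecond\n\n"
print(line_tokenize(sample))                 # ['first', 'second']
print(line_tokenize(sample, "keep"))         # ['first', '', 'second', '']
print(line_tokenize(sample, "discard-eof"))  # ['first', '', 'second']
```

Only the blank lines differ between the modes; spaces inside a non-blank line are never stripped.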
Example #1:
In this example we can see that by using the tokenize.LineTokenizer() method, we are able to split a multi-line string into tokens, one per line.
# Import the LineTokenizer class from nltk
from nltk.tokenize import LineTokenizer
# Create a LineTokenizer instance
tk = LineTokenizer()
# Create a string input
gfg = "GeeksforGeeks...$$&* \nis\n for geeks"
# Use tokenize method
geek = tk.tokenize(gfg)
print(geek)
Output :
['GeeksforGeeks...$$&* ', 'is', ' for geeks']
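Example #2:
In this example, passing blanklines='keep' makes the tokenizer keep empty lines, so each one appears as an empty string in the result.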
# Import the LineTokenizer class from nltk
from nltk.tokenize import LineTokenizer
# Create a LineTokenizer instance that keeps blank lines
tk = LineTokenizer(blanklines='keep')
# Create a string input
gfg = "The price\n\n of burger \nin BurgerKing is Rs.36.\n"
# Use tokenize method
geek = tk.tokenize(gfg)
print(geek)
Output :
['The price', '', ' of burger ', 'in BurgerKing is Rs.36.']
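For comparison, the short sketch below runs the same string through the default tokenizer and through one created with blanklines='keep', so the effect of the parameter is visible side by side. It assumes nltk is installed; the variable names are only illustrative.

```python
from nltk.tokenize import LineTokenizer

text = "The price\n\n of burger \nin BurgerKing is Rs.36.\n"

# Default mode (blanklines='discard'): the empty line is dropped
discarded = LineTokenizer().tokenize(text)
# blanklines='keep': the empty line becomes an empty-string token
kept = LineTokenizer(blanklines='keep').tokenize(text)

print(discarded)  # ['The price', ' of burger ', 'in BurgerKing is Rs.36.']
print(kept)       # ['The price', '', ' of burger ', 'in BurgerKing is Rs.36.']
```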