Python NLTK | nltk.tokenize.TabTokenizer()
With the help of the nltk.tokenize.TabTokenizer() method, we can extract tokens from a string by splitting it on the tab characters between them.
Syntax : tokenize.TabTokenizer()
Return : Returns a list of the substrings found between tab characters.
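To make the splitting rule concrete, here is a minimal sketch that models it in plain Python. It assumes the tokenizer behaves like str.split("\t"), which matches the outputs in the examples on this page; tab_tokenize is a hypothetical helper used only for illustration and is not part of NLTK.

```python
# Hypothetical stand-in for TabTokenizer().tokenize().
# Assumption: the tokenizer has plain str.split("\t") semantics.
def tab_tokenize(text):
    return text.split("\t")

# Two consecutive tabs and a trailing tab each leave an empty token behind.
sample = "a\t\tb c\t"
tokens = tab_tokenize(sample)
print(tokens)  # ['a', '', 'b c', '']
```

Under that assumption, spaces never split a token, and empty strings can appear in the result when tabs are adjacent or sit at the end of the input.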
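Example #1:
In this example, we can see that by using the tokenize.TabTokenizer() method, we are able to extract tokens from a stream of words that have tabs between them. Only tabs act as separators, so spaces and the newline stay inside the tokens.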
# Import the TabTokenizer class from nltk
from nltk.tokenize import TabTokenizer
# Create an instance of the TabTokenizer class
tk = TabTokenizer()
# Create a string input
gfg = "Geeksfor\tGeeks..\t.$$&* \nis\t for geeks"
# Use tokenize method
geek = tk.tokenize(gfg)
print(geek)
Output :
['Geeksfor', 'Geeks..', '.$$&* \nis', ' for geeks']
Example #2:
# Import the TabTokenizer class from nltk
from nltk.tokenize import TabTokenizer
# Create an instance of the TabTokenizer class
tk = TabTokenizer()
# Create a string input
gfg = "The price\t of burger \tin BurgerKing is Rs.36.\n"
# Use tokenize method
geek = tk.tokenize(gfg)
print(geek)
Output :
['The price', ' of burger ', 'in BurgerKing is Rs.36.\n']
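The tokens in Example #2 keep their leading and trailing spaces and the final newline. If that whitespace is not wanted, one possible post-processing step is sketched below. It starts from the token list as printed for Example #2 and uses only built-in string methods; the cleaned variable name is an assumption for illustration.

```python
# Token list as printed for Example #2.
tokens = ['The price', ' of burger ', 'in BurgerKing is Rs.36.\n']

# Strip surrounding whitespace and drop tokens that end up empty.
cleaned = [t.strip() for t in tokens if t.strip()]
print(cleaned)  # ['The price', 'of burger', 'in BurgerKing is Rs.36.']
```

Spaces inside a token, such as the one in 'The price', are left alone because strip() only touches the ends of each string.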