📜  Python NLTK | tokenize.regexp()

📅  Last modified: 2022-05-13 01:54:51.309000             🧑  Author: Mango


With the help of the NLTK tokenize.regexp() module, we can extract tokens from a string using a regular expression via the RegexpTokenizer() method.

Example #1:
In this example, we use the RegexpTokenizer() method to extract a stream of tokens with the help of a regular expression.

# import RegexpTokenizer() method from nltk
from nltk.tokenize import RegexpTokenizer
    
# Create a reference variable for Class RegexpTokenizer
tk = RegexpTokenizer(r'\s+', gaps = True)
    
# Create a string input
gfg = "I love Python"
    
# Use tokenize method
geek = tk.tokenize(gfg)
    
print(geek)

Output:

['I', 'love', 'Python']

Example #2:

# import RegexpTokenizer() method from nltk
from nltk.tokenize import RegexpTokenizer
    
# Create a reference variable for Class RegexpTokenizer
tk = RegexpTokenizer(r'\s+', gaps = True)
    
# Create a string input
gfg = "Geeks for Geeks"
    
# Use tokenize method
geek = tk.tokenize(gfg)
    
print(geek)

Output:

['Geeks', 'for', 'Geeks']
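Both examples above pass gaps = True, which tells RegexpTokenizer that the pattern describes the separators between tokens. When gaps is left at its default of False, the pattern instead describes the tokens themselves. A minimal sketch of this default mode, using the pattern r'\w+' to keep only word characters and drop punctuation:

```python
# import RegexpTokenizer() method from nltk
from nltk.tokenize import RegexpTokenizer

# With gaps = False (the default), the regular expression
# matches the tokens themselves, not the gaps between them
tk = RegexpTokenizer(r'\w+')

# Create a string input containing punctuation
gfg = "Geeks, for Geeks!"

# Use tokenize method; punctuation does not match \w+ and is dropped
geek = tk.tokenize(gfg)

print(geek)
```

Output:

['Geeks', 'for', 'Geeks']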