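Python – Extract hashtags from text
A hashtag is a keyword or phrase preceded by a hash sign (#), written within a post or comment to highlight it and make it easier to search for. Some examples are: #like, #gfg, #selfie
We are given a string containing hashtags, and we have to extract these hashtags into a list and print them.
Examples: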
Input : GeeksforGeeks is a wonderful #website for #ComputerScience
Output : website, ComputerScience
Input : This day is beautiful! #instagood #photooftheday #cute
Output : instagood, photooftheday, cute
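Method 1:
- Split the text into words using the split() method.
- For every word, check whether the first character is a hash sign (#).
- If it is, add the word to the list of hashtags without the hash sign.
- Print the list of hashtags.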
Python3
# function to print all the hashtags in a text
def extract_hashtags(text):
    # initializing hashtag_list variable
    hashtag_list = []
    # splitting the text into words
    for word in text.split():
        # checking the first character of every word
        if word[0] == '#':
            # adding the word, without the leading '#', to the hashtag_list
            hashtag_list.append(word[1:])
    # printing the hashtag_list
    print("The hashtags in \"" + text + "\" are :")
    for hashtag in hashtag_list:
        print(hashtag)

if __name__ == "__main__":
    text1 = "GeeksforGeeks is a wonderful #website for #ComputerScience"
    text2 = "This day is beautiful! #instagood #photooftheday #cute"
    extract_hashtags(text1)
    extract_hashtags(text2)
Output:
The hashtags in "GeeksforGeeks is a wonderful #website for #ComputerScience" are :
website
ComputerScience
The hashtags in "This day is beautiful! #instagood #photooftheday #cute" are :
instagood
photooftheday
cute
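One limitation of the split-based approach is that punctuation attached to a tag stays in the result, so a word such as "#cute!" would yield "cute!". The following is a minimal sketch of a variant that returns the list instead of printing it and trims trailing punctuation. The function name extract_hashtags_split and the choice to strip every character in string.punctuation are assumptions made for illustration, not part of the method above.

```python
import string

def extract_hashtags_split(text):
    """Return hashtags found by splitting on whitespace (hypothetical helper)."""
    hashtags = []
    for word in text.split():
        if word.startswith('#'):
            # drop leading '#' characters and any trailing punctuation (assumed policy)
            tag = word.lstrip('#').rstrip(string.punctuation)
            # skip words that were only '#' or punctuation
            if tag:
                hashtags.append(tag)
    return hashtags

if __name__ == "__main__":
    print(extract_hashtags_split("This day is beautiful! #instagood, #photooftheday #cute!"))
```

Returning the list rather than printing it keeps the extraction step reusable; the caller can still print each tag in a loop as in the original function.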
Method 2: Using regular expressions.
Python3
# import the regex module
import re

# function to print all the hashtags in a text
def extract_hashtags(text):
    # the regular expression: '#' followed by one or more word characters
    # (raw string so that \w is not treated as a string escape)
    regex = r"#(\w+)"
    # extracting the hashtags; findall returns only the captured group
    hashtag_list = re.findall(regex, text)
    # printing the hashtag_list
    print("The hashtags in \"" + text + "\" are :")
    for hashtag in hashtag_list:
        print(hashtag)

if __name__ == "__main__":
    text1 = "GeeksforGeeks is a wonderful #website for #ComputerScience"
    text2 = "This day is beautiful! #instagood #photooftheday #cute"
    extract_hashtags(text1)
    extract_hashtags(text2)
Output:
The hashtags in "GeeksforGeeks is a wonderful #website for #ComputerScience" are :
website
ComputerScience
The hashtags in "This day is beautiful! #instagood #photooftheday #cute" are :
instagood
photooftheday
cute
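The pattern #(\w+) matches a hash sign anywhere in the text, including one in the middle of a word or a URL fragment. As a hedged sketch of one way to tighten it, the variant below adds a negative lookbehind so that the '#' is only accepted when it does not directly follow a word character. The name find_hashtags, the compiled HASHTAG_PATTERN constant, and the example.com URL are hypothetical and used only for illustration.

```python
import re

# (?<!\w) is a negative lookbehind: the '#' must not directly follow a word character
HASHTAG_PATTERN = re.compile(r"(?<!\w)#(\w+)")

def find_hashtags(text):
    """Return the hashtags in text, without the leading '#' (hypothetical helper)."""
    return HASHTAG_PATTERN.findall(text)

if __name__ == "__main__":
    sample = "Read docs.example.com/page#intro then tag it #python #regex!"
    # '#intro' follows the letter 'e', so the lookbehind rejects it
    print(find_hashtags(sample))
```

Whether a fragment like "page#intro" should count as a hashtag depends on the application, so this stricter rule is a design choice rather than a correction of the original pattern.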