📜  Python – Extract hashtags from text

📅  Last modified: 2022-05-13 01:55:32.393000             🧑  Author: Mango

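A hashtag is a keyword or phrase preceded by the hash symbol (#), written within a post or comment to highlight it and make it easier to search for. Some examples are: #like, #gfg, #selfie.
Given a string that contains hashtags, the task is to extract those hashtags into a list and print them.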

Examples:

Input : GeeksforGeeks is a wonderful #website for #ComputerScience
Output :  website, ComputerScience

Input : This day is beautiful! #instagood #photooftheday #cute
Output :  instagood, photooftheday, cute

Method 1: Using split()

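  • Split the text into words using the split() method.
  • For each word, check whether its first character is a hash symbol (#).
  • If it is, add the word, without the hash symbol, to the list of hashtags.
  • Print the list of hashtags.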
Python3
# function to print all the hashtags in a text
def extract_hashtags(text):
     
    # initializing hashtag_list variable
    hashtag_list = []
     
    # splitting the text into words
    for word in text.split():
         
        # checking the first character of every word
        if word[0] == '#':
             
            # adding the word to the hashtag_list
            hashtag_list.append(word[1:])
     
    # printing the hashtag_list
    print("The hashtags in \"" + text + "\" are :")
    for hashtag in hashtag_list:
        print(hashtag)
 
if __name__=="__main__":
    text1 = "GeeksforGeeks is a wonderful #website for #ComputerScience"
    text2 = "This day is beautiful! #instagood #photooftheday #cute"
    extract_hashtags(text1)
    extract_hashtags(text2)


Output:

The hashtags in "GeeksforGeeks is a wonderful #website for #ComputerScience" are :
website
ComputerScience
The hashtags in "This day is beautiful! #instagood #photooftheday #cute" are :
instagood
photooftheday
cute
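
One limitation of the split() approach is that punctuation attached to a hashtag stays attached: "#cute!" would be reported as "cute!". The variant below is a minimal sketch, not part of the original article. It assumes a hypothetical function name, get_hashtags_split, assumes that trailing punctuation should be dropped, and returns the list instead of printing it so the result can be reused.

```python
import string


def get_hashtags_split(text):
    """Return the hashtags found in text, without the leading '#'."""
    hashtags = []
    for word in text.split():
        if word.startswith("#"):
            # Drop the '#' and any trailing punctuation such as '!' or ','
            # (assumption: such characters are not part of the hashtag).
            tag = word[1:].rstrip(string.punctuation)
            if tag:  # skip a lone '#' with nothing after it
                hashtags.append(tag)
    return hashtags


print(get_hashtags_split("This day is beautiful! #instagood #photooftheday #cute"))
# ['instagood', 'photooftheday', 'cute']
```

Returning the list keeps extraction separate from printing, which makes the function easier to test.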

Method 2: Using regular expressions.

Python3

# import the regex module
import re
 
# function to print all the hashtags in a text
def extract_hashtags(text):
     
    # the regular expression
    regex = r"#(\w+)"
     
    # extracting the hashtags
    hashtag_list = re.findall(regex, text)
     
    # printing the hashtag_list
    print("The hashtags in \"" + text + "\" are :")
    for hashtag in hashtag_list:
        print(hashtag)
 
if __name__=="__main__":
    text1 = "GeeksforGeeks is a wonderful #website for #ComputerScience"
    text2 = "This day is beautiful! #instagood #photooftheday #cute"
    extract_hashtags(text1)
    extract_hashtags(text2)

Output:

The hashtags in "GeeksforGeeks is a wonderful #website for #ComputerScience" are :
website
ComputerScience
The hashtags in "This day is beautiful! #instagood #photooftheday #cute" are :
instagood
photooftheday
cute
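
Because the pattern contains one capture group, re.findall returns only the text matched by \w+, that is, the hashtag without the '#'. Matching also stops at the first non-word character, so trailing punctuation is excluded and hashtags separated only by a comma are still found. The following is a minimal sketch illustrating this. The names HASHTAG_RE and get_hashtags_regex are hypothetical and do not come from the original article.

```python
import re

# Precompiled pattern: '#' followed by one or more word characters.
# The parentheses make findall return only the part after the '#'.
HASHTAG_RE = re.compile(r"#(\w+)")


def get_hashtags_regex(text):
    """Return the hashtags found in text, without the leading '#'."""
    return HASHTAG_RE.findall(text)


print(get_hashtags_regex("Nice #view!! #sunset,#beach"))
# ['view', 'sunset', 'beach']
```

Note that this pattern also matches a '#' in the middle of a word, which the split() method would not report as a hashtag.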