Extracting Tweets Containing a Specific Hashtag Using Python

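Twitter is one of the most popular social media platforms. The Twitter API gives you the tools you need to contribute to, engage with, and analyze the conversations happening on Twitter, and it has many applications in fields such as data analysis and artificial intelligence. This article focuses on how to extract tweets that contain a specific hashtag, starting from a given date.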
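To make the goal concrete without calling the API, the sketch below applies the same two conditions (the tweet carries a given hashtag, and it was posted on or after a given date) to tweets that are already in memory. It assumes tweets in the Twitter API v1.1 JSON shape, with a `created_at` string such as `"Wed Oct 10 20:19:24 +0000 2018"` and hashtags under `entities["hashtags"][i]["text"]`; the function names and sample data are hypothetical.

```python
from datetime import date, datetime

# Format of the "created_at" field in Twitter API v1.1 tweet objects,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def has_hashtag(tweet: dict, hashtag: str) -> bool:
    """Return True if the tweet's hashtag entities include `hashtag` (case-insensitive)."""
    wanted = hashtag.lstrip("#").lower()
    tags = tweet.get("entities", {}).get("hashtags", [])
    return any(tag["text"].lower() == wanted for tag in tags)


def posted_since(tweet: dict, since: date) -> bool:
    """Return True if the tweet was created on or after `since` (UTC date)."""
    created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
    return created.date() >= since


def filter_tweets(tweets, hashtag, since):
    """Keep only tweets that carry the hashtag and were posted on or after `since`."""
    return [t for t in tweets if has_hashtag(t, hashtag) and posted_since(t, since)]


# Hypothetical sample tweets, reduced to the fields used above.
sample = [
    {"id": 1, "created_at": "Mon Jan 04 10:00:00 +0000 2021",
     "entities": {"hashtags": [{"text": "Python", "indices": [0, 7]}]}},
    {"id": 2, "created_at": "Thu Dec 31 23:59:59 +0000 2020",
     "entities": {"hashtags": [{"text": "python", "indices": [0, 7]}]}},
    {"id": 3, "created_at": "Tue Jan 05 08:30:00 +0000 2021",
     "entities": {"hashtags": []}},
]

kept = filter_tweets(sample, "#python", date(2021, 1, 1))
print([t["id"] for t in kept])  # → [1]
```

Tweet 2 has the hashtag but predates the start date, and tweet 3 is recent but has no hashtag, so only tweet 1 survives.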

Requirements:

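Tweepy is a Python package that provides easy access to the Twitter API. Almost all of the functionality offered by the Twitter API can be used through Tweepy. To install it, run the following command in your terminal: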

pip install Tweepy
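Because `pip install` fetches the latest release, the installed Tweepy version matters: Tweepy 4.0 renamed `API.search` to `API.search_tweets`, so code written for Tweepy 3.x fails with an `AttributeError` on 4.x. The stdlib-only sketch below (both function names are hypothetical) checks which name applies to the installed version.

```python
from importlib import metadata


def search_method_name(tweepy_version: str) -> str:
    """Return the name of Tweepy's v1.1 search method for a given Tweepy version.

    Tweepy 4.0 renamed API.search to API.search_tweets; older releases
    only provide API.search.
    """
    major = int(tweepy_version.split(".")[0])
    return "search_tweets" if major >= 4 else "search"


def installed_tweepy_version():
    """Return the installed Tweepy version string, or None if Tweepy is not installed."""
    try:
        return metadata.version("tweepy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_tweepy_version()
    if version is None:
        print("Tweepy is not installed; run: pip install tweepy")
    else:
        print(f"Tweepy {version}: use api.{search_method_name(version)}")
```

With Tweepy installed, `getattr(api, search_method_name(tweepy.__version__))` would pick the matching method to pass to `tweepy.Cursor`.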

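Pandas is a powerful Python framework for data analysis. It is built on top of the NumPy package, and its key data structure is the DataFrame, which lets you manipulate tabular data. To install it, run the following command in your terminal: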

pip install pandas
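As a quick illustration of the DataFrame in this setting, the sketch below builds a small table of tweet-like rows and saves it as CSV, the same way the scraper's results end up on disk. The rows are made up, and collecting plain lists first and calling `pd.DataFrame` once is a design choice for simplicity, not something the original program requires.

```python
import pandas as pd

columns = ["username", "text", "hashtags"]

# Hypothetical rows standing in for scraped tweets: collect plain lists first...
rows = [
    ["alice", "Learning #Python today", ["Python"]],
    ["bob", "#Python and #pandas tips", ["Python", "pandas"]],
]

# ...then build the DataFrame in a single call instead of growing it row by row.
df = pd.DataFrame(rows, columns=columns)

# Tabular operations: count the hashtags in each tweet and filter rows on that count.
df["num_hashtags"] = df["hashtags"].apply(len)
multi = df[df["num_hashtags"] > 1]
print(multi[["username", "num_hashtags"]])

# With no path argument, to_csv returns the CSV as a string;
# index=False leaves out the integer row index.
csv_text = df.to_csv(index=False)
print(csv_text)
```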

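Prerequisites:

Create a Twitter developer account and obtain your consumer key, consumer secret, access token, and access token secret.
Install the Tweepy and Pandas modules on your system by running the pip commands shown above in your command prompt or terminal.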
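The program below hardcodes the four credentials as placeholders. One common alternative, sketched here as an assumption rather than part of the original approach, is to read them from environment variables so they never end up in the source file. The variable names in `CREDENTIAL_VARS` and the `load_credentials` helper are hypothetical.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable names; choose whatever names you prefer.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_key": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the four OAuth 1.0a credentials from `env` (defaults to os.environ).

    Raises KeyError listing every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing credentials: {', '.join(missing)}")
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

The returned dictionary's keys match the variable names used in the driver code, so `creds["consumer_key"]` and the other entries can be passed to `tweepy.OAuthHandler` and `set_access_token` in place of the placeholder strings.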

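Step-by-step approach:

Import the required modules.
Create a helper function that displays the data of each tweet.
Create another function that uses the tweepy module to scrape tweets for a given hashtag.
In the driver code, assign your Twitter developer account credentials, along with the hashtag, the start date, and the number of tweets.
Finally, call the scraping function with the hashtag, the start date, and the number of tweets as arguments.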
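The driver code reads the hashtag and start date as free text. A small sketch of turning them into one search query, while rejecting malformed dates before any API call is made, might look like this. It assumes the standard search `since:YYYY-MM-DD` operator, and `build_search_query` is a hypothetical helper.

```python
from datetime import date


def build_search_query(hashtag: str, date_since: str) -> str:
    """Combine a hashtag and a YYYY-MM-DD start date into one search query string.

    Adds a leading '#' if it is missing. Raises ValueError for an empty
    hashtag or a date that date.fromisoformat cannot parse.
    """
    tag = hashtag.strip()
    if not tag:
        raise ValueError("hashtag must not be empty")
    if not tag.startswith("#"):
        tag = "#" + tag
    start = date.fromisoformat(date_since.strip())  # validates the date
    return f"{tag} since:{start.isoformat()}"


print(build_search_query("python", "2021-01-01"))  # → #python since:2021-01-01
```

Validating up front means a typo such as `2021-13-01` fails immediately with a `ValueError` instead of silently returning no tweets.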

Below is the complete program based on the approach above:

Python

# Python script to extract tweets containing a
# particular hashtag using Tweepy and Pandas


# import modules
import pandas as pd
import tweepy


# function to display the data of each tweet
def printtweetdata(n, ith_tweet):
    print()
    print(f"Tweet {n}:")
    print(f"Username: {ith_tweet[0]}")
    print(f"Description: {ith_tweet[1]}")
    print(f"Location: {ith_tweet[2]}")
    print(f"Following Count: {ith_tweet[3]}")
    print(f"Follower Count: {ith_tweet[4]}")
    print(f"Total Tweets: {ith_tweet[5]}")
    print(f"Retweet Count: {ith_tweet[6]}")
    print(f"Tweet Text: {ith_tweet[7]}")
    print(f"Hashtags Used: {ith_tweet[8]}")

# function to perform data extraction
def scrape(words, date_since, numtweet):

    # Columns of the resulting DataFrame
    columns = ['username', 'description', 'location', 'following',
               'followers', 'totaltweets', 'retweetcount', 'text', 'hashtags']

    # Rows are collected in a list and turned into a DataFrame once at the
    # end, which is simpler and faster than growing the DataFrame row by row
    rows = []

    # We use tweepy.Cursor() to page through Twitter search results.
    # The number of tweets is limited with .items(number of tweets).
    # Note: Tweepy 4.x renamed API.search to API.search_tweets and has no
    # `since` argument, so the start date is put into the query with the
    # search operator "since:YYYY-MM-DD". The standard v1.1 search API
    # only returns recent tweets (roughly the last 7 days).
    tweets = tweepy.Cursor(api.search_tweets, q=f"{words} since:{date_since}",
                           lang="en", tweet_mode='extended').items(numtweet)

    # The Cursor yields Status objects; each one has attributes that
    # describe the tweet and its author
    for i, tweet in enumerate(tweets, start=1):
        username = tweet.user.screen_name
        description = tweet.user.description
        location = tweet.user.location
        following = tweet.user.friends_count
        followers = tweet.user.followers_count
        totaltweets = tweet.user.statuses_count
        retweetcount = tweet.retweet_count

        # Retweets carry a retweeted_status attribute holding the original
        # tweet; ordinary tweets lack it, so the except block falls back
        # to the tweet's own text
        try:
            text = tweet.retweeted_status.full_text
        except AttributeError:
            text = tweet.full_text

        # Keep only the text of each hashtag entity
        hashtext = [hashtag['text'] for hashtag in tweet.entities['hashtags']]

        ith_tweet = [username, description, location, following,
                     followers, totaltweets, retweetcount, text, hashtext]
        rows.append(ith_tweet)

        # Print the tweet data on screen
        printtweetdata(i, ith_tweet)

    # Save the collected data as a CSV file
    db = pd.DataFrame(rows, columns=columns)
    db.to_csv('scraped_tweets.csv', index=False)

if __name__ == '__main__':

    # Enter your own credentials obtained
    # from your developer account
    consumer_key = "XXXXXXXXXXXXXXXXXXXXX"
    consumer_secret = "XXXXXXXXXXXXXXXXXXXXX"
    access_key = "XXXXXXXXXXXXXXXXXXXXX"
    access_secret = "XXXXXXXXXXXXXXXXXXXXX"
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)

    # `api` is defined at module level, so scrape() can use it directly
    api = tweepy.API(auth)

    # Enter the hashtag and the start date
    print("Enter the Twitter hashtag to search for")
    words = input().strip()
    print("Enter the date since which tweets are required, in YYYY-MM-DD format")
    date_since = input().strip()

    # number of tweets you want to extract in one run
    numtweet = 100
    scrape(words, date_since, numtweet)
    print('Scraping has completed!')
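The per-tweet extraction inside `scrape()` can also be expressed on the raw v1.1 JSON dictionary of a tweet (Tweepy's `Status` objects expose it as `._json`), which makes the retweet and hashtag logic easy to test without network access. The sketch below is a hypothetical helper, `tweet_to_row`, producing the same nine fields in the same order; the sample retweet is made up.

```python
def tweet_to_row(status: dict) -> list:
    """Flatten one v1.1 tweet dict into the nine fields used by scrape()."""
    user = status["user"]
    # For retweets, the untruncated text lives on the embedded original tweet.
    source = status.get("retweeted_status", status)
    text = source.get("full_text", source.get("text", ""))
    hashtags = [tag["text"] for tag in status.get("entities", {}).get("hashtags", [])]
    return [user["screen_name"], user.get("description"), user.get("location"),
            user["friends_count"], user["followers_count"], user["statuses_count"],
            status["retweet_count"], text, hashtags]


# Hypothetical retweet in the v1.1 JSON shape, reduced to the fields used above.
retweet = {
    "full_text": "RT @alice: Learning #Python today",
    "retweet_count": 3,
    "entities": {"hashtags": [{"text": "Python", "indices": [20, 27]}]},
    "user": {"screen_name": "bob", "description": "", "location": "",
             "friends_count": 10, "followers_count": 20, "statuses_count": 30},
    "retweeted_status": {"full_text": "Learning #Python today with tweepy"},
}

print(tweet_to_row(retweet))
```

For the retweet, the text comes from `retweeted_status`; if that key is absent, the tweet's own `full_text` is used, mirroring the `try`/`except AttributeError` in the program.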
