Extracting feed details from RSS in Python

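In this article, we will see how to extract feed and post details for a Hashnode blog from its RSS feed. Although we use it here for a blog on Hashnode, the same approach works for other feeds.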

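RSS stands for Rich Site Summary (the name is also commonly expanded as Really Simple Syndication). It uses a standard web format to publish frequently changing information such as blog posts, news, audio, and video. An RSS document is usually called a feed, and it consists of text and metadata such as the publication time and the author's name.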
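To make the "text plus metadata" structure concrete, here is a minimal sketch of what an RSS 2.0 document can look like and how its fields can be read. The sample feed, its example.com URLs, and the helper name summarize_rss are invented for illustration. The sketch uses Python's standard xml.etree.ElementTree module rather than the feedparser library used in the rest of this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, written by hand for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post</link>
      <pubDate>Mon, 05 Oct 2020 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second-post</link>
      <pubDate>Tue, 06 Oct 2020 12:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def summarize_rss(xml_text):
    """Return the channel title and a list of (title, link, pubDate) tuples."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = [
        (item.findtext("title"), item.findtext("link"), item.findtext("pubDate"))
        for item in channel.findall("item")
    ]
    return channel.findtext("title"), items


if __name__ == "__main__":
    blog_title, items = summarize_rss(SAMPLE_RSS)
    print(blog_title)
    for title, link, published in items:
        print(title, link, published)
```

Real feeds vary more than this sample (namespaces, Atom instead of RSS, missing fields), which is one reason to use a dedicated parser for them.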

Installing feedparser:

We will use the feedparser Python library to parse the blog's RSS feed. It is a widely used library for parsing feeds.

pip install feedparser

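Let's go through this step by step:

Step 1: Get the RSS feed

Use the feedparser.parse() function to create a feed object containing the parsed blog. It takes the URL of the blog feed.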

Python3

import feedparser

# URL of the blog feed
feed_url = "https://vaibhavkumar.hashnode.dev/rss.xml"

blog_feed = feedparser.parse(feed_url)
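feedparser.parse() generally does not raise when a feed is unreachable or malformed. Instead it sets a bozo flag (and usually a bozo_exception) on the result, so it can help to check the result before using it. The sketch below shows one way to do that. The helper name check_feed is hypothetical, and it assumes only that the result behaves like a dictionary with bozo, bozo_exception, and entries keys. The demo uses plain dictionaries as stand-ins so that it runs without network access.

```python
def check_feed(parsed):
    """Return (ok, message) for a feedparser-style result."""
    entries = parsed.get("entries", [])
    if parsed.get("bozo"):
        reason = parsed.get("bozo_exception", "unknown problem")
        if not entries:
            return False, f"feed could not be parsed: {reason}"
        # Some feeds trigger the flag but still yield usable entries.
        return True, f"feed parsed with a warning: {reason}"
    if not entries:
        return False, "feed parsed but contains no entries"
    return True, f"feed parsed with {len(entries)} entries"


if __name__ == "__main__":
    # Stand-in dictionaries that mimic the shape of a parsed feed.
    print(check_feed({"bozo": 0, "entries": [{"title": "First post"}]}))
    print(check_feed({"bozo": 1, "bozo_exception": "not well-formed", "entries": []}))
```

With the feed object created above, the call would be check_feed(blog_feed).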

Step 2: Get details from the blog.

Python3

# title of the blog site
print(blog_feed.feed.title)

# link of the blog
print(blog_feed.feed.link)

# number of entries (posts) in the feed
print(len(blog_feed.entries))

# details of an individual post can be
# accessed by attribute name
print(blog_feed.entries[0].title)
print(blog_feed.entries[0].link)
print(blog_feed.entries[0].author)
print(blog_feed.entries[0].published)

# lists of tags and authors of the first post
tags = [tag.term for tag in blog_feed.entries[0].tags]
authors = [author.name for author in blog_feed.entries[0].authors]
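The published value is a plain string. In RSS 2.0 feeds it normally follows the RFC 822 date format, so it can be converted to a datetime with the standard library. The sketch below works under that assumption. The helper name parse_published and the sample date are illustrative, and feeds that use another date format (Atom feeds, for example) would need different handling.

```python
from email.utils import parsedate_to_datetime


def parse_published(published):
    """Convert an RFC 822 style date string to a datetime, or None on failure."""
    try:
        return parsedate_to_datetime(published)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


if __name__ == "__main__":
    # Hypothetical value in the format RSS feeds typically use.
    parsed = parse_published("Mon, 05 Oct 2020 10:00:00 GMT")
    print(parsed.isoformat())
```

feedparser also exposes a published_parsed attribute on entries (a time.struct_time), which may be enough if a full datetime object is not needed.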

Below is the complete implementation. Using the code above, we write a function that takes the link of an RSS feed and returns the details.

Python3

import json

import feedparser


def get_posts_details(rss=None):
    """
    Take the link of an RSS feed as argument and
    return the blog and post details as a dictionary.
    """
    if rss is None:
        return None

    # parse the blog feed
    blog_feed = feedparser.parse(rss)

    # list of blog entries via .entries
    posts = blog_feed.entries

    # dictionary for holding the blog and post details
    posts_details = {"Blog title": blog_feed.feed.title,
                     "Blog link": blog_feed.feed.link}

    post_list = []

    # iterate over individual posts
    for post in posts:
        temp = dict()

        # if a post lacks a field, keep the fields collected so far
        # and skip the remaining ones instead of raising an error
        try:
            temp["title"] = post.title
            temp["link"] = post.link
            temp["author"] = post.author
            temp["time_published"] = post.published
            temp["tags"] = [tag.term for tag in post.tags]
            temp["authors"] = [author.name for author in post.authors]
            temp["summary"] = post.summary
        except AttributeError:
            pass

        post_list.append(temp)

    # store the list of posts in the dictionary
    posts_details["posts"] = post_list

    return posts_details


if __name__ == "__main__":
    feed_url = "https://vaibhavkumar.hashnode.dev/rss.xml"

    # returns the blog data as a dictionary
    data = get_posts_details(rss=feed_url)

    if data:
        # print as a JSON string with indentation level 2
        print(json.dumps(data, indent=2))
    else:
        print("None")

Output:
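As a possible follow-up to get_posts_details(), and not part of the program output, the returned dictionary can be processed like any other Python data. The sketch below filters posts by tag. The helper name posts_with_tag and the sample data are hypothetical, and the dictionary layout is assumed to match what get_posts_details() returns.

```python
def posts_with_tag(details, tag):
    """Return the posts whose "tags" list contains tag (case-insensitive)."""
    wanted = tag.lower()
    return [
        post
        for post in details.get("posts", [])
        # Posts collected with missing fields may have no "tags" key.
        if wanted in (t.lower() for t in post.get("tags", []))
    ]


if __name__ == "__main__":
    # Made-up data in the same shape as the dictionary built above.
    sample = {
        "Blog title": "Example Blog",
        "Blog link": "https://example.com/",
        "posts": [
            {"title": "Parsing RSS", "tags": ["Python", "RSS"]},
            {"title": "Styling a page", "tags": ["CSS"]},
            {"title": "Untagged note"},
        ],
    }
    for post in posts_with_tag(sample, "python"):
        print(post["title"])
```

Matching is done on lowercased tag names so that "Python" and "python" are treated as the same tag; drop the lower() calls if exact matching is preferred.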