Get a list of all the heading tags using BeautifulSoup

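To print all the heading tags with BeautifulSoup, we use the find_all() method, one of the most commonly used methods in BeautifulSoup. Given a tag name, it searches the document and returns every occurrence of that tag.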

Syntax: find_all(name, attrs, recursive, string, limit, **kwargs)
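
As a rough illustration of what these parameters do, the sketch below runs find_all() against a small inline HTML string. The markup and variable names are made up for this example, and it uses Python's built-in 'html.parser' so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration
html = """
<div id="main">
  <h2 class="title">First</h2>
  <p>Some text</p>
  <h2>Second</h2>
  <h3 class="title">Third</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# name: match every <h2> tag
all_h2 = soup.find_all("h2")

# attrs: match tags whose class attribute includes "title"
titled = soup.find_all(attrs={"class": "title"})

# limit: stop searching after the first match
first_h2 = soup.find_all("h2", limit=1)

# string: match <h2> tags whose text is exactly "Second"
second = soup.find_all("h2", string="Second")

# recursive=False: only consider direct children of the object being searched
top_level = soup.find_all("h2", recursive=False)

print([tag.get_text() for tag in all_h2])
print([tag.name for tag in titled])
print(len(top_level))
```

Here the headings sit inside a div, so the recursive=False search on the soup object finds nothing, while the default recursive search finds both h2 tags.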

HTML defines six heading tags: h1, h2, h3, h4, h5 and h6. The ones most commonly used on web pages are h1, h2 and h3. To find them, we pass a list of tag names as the argument to the find_all() method.

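Steps:

Import the requests and BeautifulSoup libraries
Assign the URL to a variable
Fetch the URL using the requests library
Create a BeautifulSoup object
Create a list of heading tags
Iterate over all the heading tags using the find_all() method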

Example:

Python3

# Python program to print all heading tags
import requests
from bs4 import BeautifulSoup
 
# scraping a GeeksforGeeks article
url_link = 'https://www.geeksforgeeks.org/how-to-scrape-all-pdf-files-in-a-website/'
response = requests.get(url_link)

# parse the page; the 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(response.text, 'lxml')

# creating a list of all common heading tags
heading_tags = ["h1", "h2", "h3"]
for tag in soup.find_all(heading_tags):
    print(tag.name + ' -> ' + tag.text.strip())

Output:

h2 -> Related Articles
h2 -> Python3
h2 -> Python3
h2 -> Python3
h2 -> Python3
h2 -> Python3
h2 -> Python3
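
The example above prints only h1, h2 and h3. If all six heading levels are wanted as an actual Python list rather than printed output, one possible approach is sketched below. The function name list_headings, the constant ALL_HEADINGS and the sample markup are hypothetical, introduced only for this illustration, and the built-in 'html.parser' is used so the sketch runs without a network request or the lxml package.

```python
from bs4 import BeautifulSoup

# All six heading levels defined by HTML: h1 through h6
ALL_HEADINGS = [f"h{level}" for level in range(1, 7)]


def list_headings(html):
    """Return (tag name, text) pairs for every heading, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (tag.name, tag.get_text(strip=True))
        for tag in soup.find_all(ALL_HEADINGS)
    ]


# Hypothetical sample markup, invented for this illustration
sample = "<h1>Title</h1><p>Intro</p><h4> Details </h4><h2>Section</h2>"
headings = list_headings(sample)
print(headings)
```

Because find_all() returns matches in the order they appear in the document, the resulting list follows the page layout rather than the heading level. To apply this to a live page, the text of a requests response could be passed in place of the sample string.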