How to scrape nested tags using BeautifulSoup?

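We can scrape nested tags with the help of the . (dot) operator. After creating a soup of the page, we can navigate through nested tags by chaining tag names with the . operator; each step returns the first tag with that name found inside the current tag. To scrape nested tags using BeautifulSoup, follow the steps below.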
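As a minimal sketch of the dot operator, the snippet below parses a small hand-written HTML string and walks from body to ul to li to a. The HTML and the variable names are assumptions made for illustration; they are not taken from the page used later in this article.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
html = """
<html><body>
  <ul>
    <li><a href="/first">First link</a></li>
    <li><a href="/second">Second link</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Each attribute access returns the first matching tag below the current one
first_li = soup.body.ul.li
first_link = soup.body.ul.li.a

print(first_li)
print(first_link["href"])
```

Note that only the first li and the first a are reached; the second list item is never visited by this kind of navigation.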

Step-by-step approach

Step 1: Import the BeautifulSoup module to parse the page, and the requests module to send the request to the website we want to scrape.

from bs4 import BeautifulSoup
import requests

Step 2: Request the URL by calling the get method.

page = requests.get(sample_website)
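A slightly more defensive version of the request step might look like the sketch below. The helper name fetch_html and the 10-second default timeout are assumptions for illustration; the timeout parameter and raise_for_status() are part of the requests API.

```python
import requests


def fetch_html(url, timeout=10):
    """Fetch a page and return its raw bytes (hypothetical helper)."""
    # Without a timeout, requests may wait indefinitely for a slow server
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently handing an error page to the parser
    response.raise_for_status()
    return response.content
```

Calling fetch_html(sample_website) would then return the same bytes as page.content in the step above, or raise an exception if the request fails.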

Step 3: Create the soup with the BeautifulSoup constructor, using the HTML parser to build the HTML parse tree.

soup = BeautifulSoup(page.content, 'html.parser')
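To see what the parse tree gives us without making a network request, here is a small sketch that feeds a hand-written byte string to the same constructor. The byte string is a hypothetical stand-in for page.content.

```python
from bs4 import BeautifulSoup

# Hypothetical bytes standing in for page.content
content = b"<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"

soup = BeautifulSoup(content, "html.parser")

# The soup object is the root of the parse tree
print(soup.title)         # the <title> tag itself
print(soup.title.string)  # the text inside the <title> tag
print(soup.body.p.name)   # the name of the first <p> tag under <body>
```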

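Step 4: Chain the . operator until we reach the nested tag we want to scrape. For example, to scrape a tag that sits inside table, which in turn sits inside body, use the statement below, where tag stands for the name of the tag we want.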

soup.body.table.tag
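One thing to watch for with chained navigation: if a tag name is not found, the lookup returns None, and the next step in the chain then raises AttributeError. The sketch below illustrates this on a hand-written table; the HTML and the descend helper are hypothetical and exist only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
html = """
<html><body>
  <table>
    <tr><td>Cell 1</td><td>Cell 2</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# body -> table -> tr -> td: the first <td> in the first <tr>
cell = soup.body.table.tr.td
print(cell)

# A tag name that does not occur yields None rather than raising
missing = soup.body.table.th
print(missing)


def descend(tag, *names):
    """Follow tag names one at a time; return None if any step is missing."""
    for name in names:
        if tag is None:
            return None
        # tag.find(name) is what the . operator does under the hood
        tag = tag.find(name)
    return tag


# There is no <ul> in this document, so the whole chain gives None
print(descend(soup, "body", "ul", "li"))
```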

Implementation

Below are various examples showing how to scrape different nested tags from a particular URL.

Example 1:

Python3

from bs4 import BeautifulSoup
import requests

# Sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'

# Call the get method to request the page
page = requests.get(sample_website)

# Create the soup with the BeautifulSoup constructor
# and Python's built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')

# Use the . operator to scrape the tag at body -> ul -> i:
# from the body tag go to the first ul tag, and from
# that ul tag go to the first i tag nested inside it
print(soup.body.ul.i)

Output:

Example 2:

Python3

from bs4 import BeautifulSoup
import requests

# Sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'

# Call the get method to request the page
page = requests.get(sample_website)

# Create the soup with the BeautifulSoup constructor
# and Python's built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')

# Use the . operator to scrape the tag at body -> a:
# the first a tag found inside the body tag
print(soup.body.a)

Output:

Skip to content
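Printing a Tag object writes its full markup, so to obtain only the text, get_text() can be used. The sketch below shows the difference on a hand-written link; the markup is hypothetical and is not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on a "skip" link
html = '<body><a class="skip-link" href="#main">Skip to content</a></body>'

soup = BeautifulSoup(html, "html.parser")
link = soup.body.a

print(link)             # the whole tag, including its attributes
print(link.get_text())  # only the text inside the tag
print(link["href"])     # the value of the href attribute
```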

Example 3:

Python3

from bs4 import BeautifulSoup
import requests

# Sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'

# Call the get method to request the page
page = requests.get(sample_website)

# Create the soup with the BeautifulSoup constructor
# and Python's built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')

# Use the . operator to scrape the tag at body -> a:
# the first a tag found inside the body tag
print(soup.body.a)

# Use the . operator to scrape the tag at body -> ul -> li -> a:
# from the body tag go to the first ul tag, then to the first
# li tag inside it, then to the first a tag inside that li tag
print(soup.body.ul.li.a)

Output:

Asymptotic Analysis
