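How to Scrape Nested Tags using BeautifulSoup?
We can do this with the . (dot) operator. Once the soup for a page has been created, nested tags can be reached by chaining tag names with the dot operator. To scrape nested tags with BeautifulSoup, follow the steps below.
Step-by-step approach
Step 1: Import the beautifulsoup module (bs4) to scrape the page, and the requests module to request the website.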
from bs4 import BeautifulSoup
import requests
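Step 2: Request the URL by calling the get method.
page = requests.get(sample_website)
Step 3: Create the soup with the BeautifulSoup constructor, using the HTML parser to build the HTML parse tree.
soup = BeautifulSoup(page.content, 'html.parser')
Step 4: Chain the . operator down to the tag you want to scrape. For example, to scrape a tag nested inside table, which is itself inside body, use the statement below.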
soup.body.table.tag
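The steps above can be tried without a network request. The following is a minimal sketch that assumes a small, made-up HTML string in place of a downloaded page; the markup and the variable names (html, first_cell, first_link) are illustrative and not part of the original steps.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration; it does not come from a real page.
html = """
<html><body>
  <table><tr><td>first cell</td><td>second cell</td></tr></table>
  <ul><li><a href="/one">One</a></li><li><a href="/two">Two</a></li></ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Each attribute access returns the first matching tag below the current one.
first_cell = soup.body.table.td
first_link = soup.body.ul.li.a

print(first_cell)
print(first_link["href"])
```

Each step in the chain returns a Tag object, so the result can be inspected further, for example with get_text() or by reading an attribute such as href.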
Implementation
The examples below show how to scrape different nested tags from a specific URL.
Example 1:
Python3
from bs4 import BeautifulSoup
import requests
# sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'
# call get method to request the page
page = requests.get(sample_website)
# create the soup with BeautifulSoup and the
# built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')
# use the . operator to scrape a tag under
# body -> ul -> i: go to the body tag, then to the
# first ul tag inside it, then to the first i tag
# inside that ul
print(soup.body.ul.i)
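Output: the script prints the first i tag inside the first ul tag under body; the exact markup depends on the page's current HTML.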
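A chain such as soup.body.ul.i only works when every tag along the way exists. The sketch below, which assumes a made-up page without any ul tag, illustrates what happens otherwise and one possible way to guard against it; the names ul, chain_failed, and italic are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical page with no <ul>, used only to illustrate the failure mode.
html = "<html><body><p>No lists here.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# A missing tag comes back as None rather than raising an error.
ul = soup.body.ul

# Continuing the chain on None is what raises AttributeError.
try:
    soup.body.ul.i
    chain_failed = False
except AttributeError:
    chain_failed = True

# Guarded version: stop at the first missing link in the chain.
italic = ul.i if ul is not None else None

print(ul, chain_failed, italic)
```

Checking for None at each uncertain step is one option; wrapping the whole chain in try/except AttributeError is another.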
Example 2:
Python3
from bs4 import BeautifulSoup
import requests
# sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'
# call get method to request the page
page = requests.get(sample_website)
# create the soup with BeautifulSoup and the
# built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')
# use the . operator to scrape a tag under
# body -> a: go to the body tag, then to the
# first a tag inside it
print(soup.body.a)
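Output: the script prints the first a tag under body; the exact markup depends on the page's current HTML.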
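Note that soup.body.a is not limited to direct children of body. The dot shortcut behaves like a call to find(), which searches all descendants in document order. The sketch below assumes a made-up HTML string to show the difference; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link nested deep in the body, one directly under it.
html = (
    "<html><body>"
    "<div><p><a href='/deep'>Deep link</a></p></div>"
    "<a href='/top'>Top link</a>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# The dot shortcut returns the first <a> anywhere beneath <body>.
first_anchor = soup.body.a

# It is shorthand for find("a") on the same tag.
same_anchor = soup.body.find("a")

# recursive=False restricts the search to direct children of <body>.
direct_child = soup.body.find("a", recursive=False)

print(first_anchor["href"], direct_child["href"])
```

If only direct children should match, find() with recursive=False is the more precise choice.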
Example 3:
Python3
from bs4 import BeautifulSoup
import requests
# sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'
# call get method to request the page
page = requests.get(sample_website)
# create the soup with BeautifulSoup and the
# built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')
# use the . operator to scrape a tag under
# body -> a: go to the body tag, then to the
# first a tag inside it
print(soup.body.a)
# use the . operator to scrape a tag under
# body -> ul -> li -> a: go to the body tag, then
# to the first ul tag inside it, then to the first
# li tag inside that ul, and finally to the first
# a tag inside that li
print(soup.body.ul.li.a)
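Output: the script prints the first a tag under body, followed by the first a tag found through body, ul, and li; the exact markup depends on the page's current HTML.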
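The dot operator returns only the first match at each step. When every matching nested tag is needed, one option is to navigate to the parent with the dot operator and then call find_all() on it. This is a minimal sketch on a made-up HTML string; the markup and names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical list of links, used only for illustration.
html = """
<html><body>
  <ul>
    <li><a href="/one">One</a></li>
    <li><a href="/two">Two</a></li>
  </ul>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Dot navigation stops at the first <li> and its first <a>.
first_only = soup.body.ul.li.a

# find_all() collects every <a> nested anywhere inside the first <ul>.
all_links = soup.body.ul.find_all("a")
hrefs = [a["href"] for a in all_links]

print(first_only["href"])
print(hrefs)
```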