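How to remove tags using BeautifulSoup in Python?
Prerequisite: BeautifulSoup module
In this article, we will write a Python script that removes a tag from the parse tree and then completely destroys the tag and its contents. This is done with the decompose() method built into the module.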
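Syntax:
Tag.decompose()
Tag.decompose() removes a tag from the tree of the given HTML document, then completely destroys it and its contents. The method modifies the tree in place and returns None.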
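As a minimal sketch of this behaviour, the snippet below parses a small hand-written HTML string, removes its span tag with decompose(), and checks what is left. The markup and the variable names (html, returned, remaining) are illustrative assumptions, not part of the original article.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption made for this sketch)
html = '<div><p>Keep me</p><span>Remove me</span></div>'
soup = BeautifulSoup(html, 'html.parser')

# decompose() changes the tree in place and returns None
returned = soup.span.decompose()

# Serialize what is left of the tree
remaining = str(soup)

print(returned)   # None
print(remaining)  # <div><p>Keep me</p></div>
```

Because the return value is always None, the result to inspect after the call is the soup object itself, not the value returned by decompose().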
Implementation:
Example 1:
Python3
# import module
from bs4 import BeautifulSoup
# sample markup to parse (it must contain an <a> tag for soup.a to exist)
markup = '<a>Welcome to geeksforgeeks.com</a>'
# parse the markup
soup = BeautifulSoup(markup, 'html.parser')
# display the tag before decomposing
print("Before Decompose")
print(soup.a)
# decompose() removes the <a> tag in place and returns None
new_tag = soup.a.decompose()
print("After decomposing:")
print(new_tag)
Output:
Example 2: Fetch an HTML document from a given URL and decompose it.
Python3
# import modules
from bs4 import BeautifulSoup
import requests
# fetch the HTML document from the URL
url = 'https://www.geeksforgeeks.org/'
reqs = requests.get(url)
# parse the response body
soup = BeautifulSoup(reqs.text, 'html.parser')
# display the document before decomposing
print("Before Decomposing")
print(soup)
# decompose the whole soup; decompose() returns None
result = soup.decompose()
print("After decomposing:")
print(result)
Output:
Before Decomposing
..
……
After decomposing:
None
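In practice, decompose() is more often applied to specific tags than to the whole soup object. The following is a hedged sketch of that use: it removes every script and style tag from a document and keeps the rest. It uses a hypothetical inline HTML string in place of a live URL so that it runs offline; the markup and the variable name cleaned are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page content (an assumption; stands in for reqs.text)
html = (
    '<html><head><style>p {color: red;}</style></head>'
    '<body><p>Hello</p><script>alert("hi")</script></body></html>'
)
soup = BeautifulSoup(html, 'html.parser')

# find_all() returns a list of matches, so each tag can be
# decomposed safely while looping over the results
for tag in soup.find_all(['script', 'style']):
    tag.decompose()

cleaned = str(soup)
print(cleaned)  # <html><head></head><body><p>Hello</p></body></html>
```

The same loop works on a soup built from reqs.text; only the list of tag names passed to find_all() needs to change.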