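BeautifulSoup - Remove the contents of a tag
In this article, we will see how to remove the contents of a tag from HTML using BeautifulSoup. BeautifulSoup is a Python library for pulling data out of HTML and XML files.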
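Modules needed:
bs4 (Beautiful Soup): our main module. It parses HTML and XML documents so that data can be extracted from them.
To install it, run this command in the terminal: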
pip install bs4
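Approach:
- First, we import the required library.
- We read the HTML file or text.
- We pass the HTML text to a soup object.
- We then find the required tag and clear its contents.
Step-by-step implementation:
Step 1: Initialize the program by importing the library and reading or creating the HTML document we want to work with.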
Python3
# Importing libraries
from bs4 import BeautifulSoup
# Sample HTML text we want to parse; it contains an <h1> tag
text = "<title>Welcome</title><h1>This is a test page</h1>"
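Step 2: Pass the HTML text to the soup object and choose a parser; here we use Python's built-in html.parser. Other parsers that can be used include lxml and html5lib, which must be installed separately. Then we refer to the tag whose contents we want to remove.
Python3
# creating a soup
soup = BeautifulSoup(text, "html.parser")
# printing the content in h1 tag
print(f"Content of h1 tag is: {soup.h1}")
Output: the h1 tag is printed together with its contents.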
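Step 2 assumes the document actually contains the tag. As a minimal sketch (the sample markup and the variable names `html`, `heading`, and `status` are illustrative assumptions, not part of the original example), the following shows that looking up a missing tag gives `None`, so it may be worth checking before calling `.clear()`:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that deliberately has no <h1> tag
html = "<p>No heading here</p>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, just like soup.h1 does
heading = soup.find("h1")

if heading is None:
    # Calling heading.clear() here would raise AttributeError
    status = "no h1 tag found"
else:
    heading.clear()
    status = "h1 tag cleared"

print(status)
```

Guarding against `None` this way avoids an `AttributeError` when the page layout differs from what the script expects.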
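Step 3: Use the .clear() method. It removes the contents of the given tag while keeping the tag itself, so the h1 tag is printed as an empty element afterwards.
Python3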
# clearing the content of the tag
soup.h1.clear()
# printing the content in h1 tag after clearing
print(f"Content of h1 tag after clearing: {soup.h1}")
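To make the behaviour of `.clear()` more concrete, the sketch below contrasts it with `.decompose()`, another Beautiful Soup method that removes the tag itself rather than only its contents. The sample markup and variable names are assumptions chosen for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup used for both variants
html = "<div><h1>Title</h1><p>Body text</p></div>"

# clear(): the <h1> tag stays in the tree, but its contents are removed
soup_clear = BeautifulSoup(html, "html.parser")
soup_clear.h1.clear()
cleared = str(soup_clear)

# decompose(): the <h1> tag is removed from the tree entirely
soup_decompose = BeautifulSoup(html, "html.parser")
soup_decompose.h1.decompose()
decomposed = str(soup_decompose)

print(cleared)     # <div><h1></h1><p>Body text</p></div>
print(decomposed)  # <div><p>Body text</p></div>
```

Which one fits depends on whether the empty tag should remain in the document, for example as a placeholder to fill later.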
Below is the complete implementation:
Python3
# Importing libraries
from bs4 import BeautifulSoup
# Reading the html text we want to parse
text = "<title>Welcome</title><h1>This is a test page</h1>"
# creating a soup
soup = BeautifulSoup(text,"html.parser")
# printing the content in h1 tag
print(f"Content of h1 tag is: {soup.h1}")
# clearing the content of the tag
soup.h1.clear()
# printing the content in h1 tag after clearing
print(f"Content of h1 tag after clearing: {soup.h1}")
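The implementation above clears a single tag. If several tags of the same kind need to be emptied, one possible approach is to loop over the result of `find_all()`. This is only a sketch; the list markup and the names `html`, `item`, and `result` are assumptions for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with several <li> tags, one with a nested tag
html = "<ul><li>one</li><li><b>two</b></li><li>three</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# find_all() returns every matching tag; clear each one in turn
for item in soup.find_all("li"):
    item.clear()

result = str(soup)
print(result)  # <ul><li></li><li></li><li></li></ul>
```

Nested tags inside each item are removed as well, since `.clear()` drops all children of the tag it is called on.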