How to remove empty tags using BeautifulSoup in Python?

Prerequisites: requests, BeautifulSoup, strip

The task is to write a program that removes empty tags from HTML code. Beautiful Soup has no built-in method for removing tags that have no content.
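What BeautifulSoup does provide is `Tag.get_text()`, and its `strip=True` argument is enough to decide whether a tag is empty: the call strips whitespace from each piece of text inside the tag and joins the pieces, so a tag with no text, or with whitespace only, yields an empty string. A minimal sketch, using illustrative markup and Python's built-in `"html.parser"` (an assumption; the examples below use `"lxml"`):

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption): one tag with text, one holding only
# whitespace, and one with no content at all
html = "<div><p>hello</p><p>   \n </p><span></span></div>"
soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) strips each text fragment and joins the results,
# so a tag counts as "empty" when the call returns an empty string
empty_flags = [(tag.name, not tag.get_text(strip=True)) for tag in soup.find_all(["p", "span"])]
print(empty_flags)  # [('p', False), ('p', True), ('span', True)]
```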

Required modules:

bs4: Beautiful Soup (bs4) is a Python library for extracting data from HTML and XML files. It is not part of Python's standard library. To install it, type the following command in the terminal:

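pip install bs4

The examples also pass "lxml" as the parser to BeautifulSoup, which needs the separate lxml package:

pip install lxml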

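requests: Requests lets you send HTTP/1.1 requests very easily. It is not part of Python's standard library either. To install it, type the following command in the terminal: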

pip install requests

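Approach:

Get the HTML code.
Iterate through each tag:
- Fetch the tag's text and remove surrounding whitespace with strip.
- If the stripped text has length zero, remove the tag from the HTML code.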
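The steps above can be sketched as a small reusable function. This is a minimal sketch, not the article's own code: the name `remove_empty_tags` is hypothetical, and it assumes the built-in `"html.parser"` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup


def remove_empty_tags(html: str, parser: str = "html.parser") -> BeautifulSoup:
    """Parse `html` and drop every tag whose stripped text is empty (hypothetical helper)."""
    soup = BeautifulSoup(html, parser)
    # find_all() returns a complete list before the loop starts, so removing
    # tags while iterating does not cause any tag to be skipped
    for tag in soup.find_all():
        if len(tag.get_text(strip=True)) == 0:
            tag.extract()  # detach the tag, together with its children, from the tree
    return soup


cleaned = remove_empty_tags("<div><p>keep me</p><div><p> </p></div><b></b></div>")
print(cleaned)  # <div><p>keep me</p></div>
```

Because `get_text()` looks at all descendants, a wrapper that contains only empty tags (the inner `<div>` here) is removed in the same pass as the tags inside it.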

Example 1: Remove empty tags.

Python3

# Import Module
from bs4 import BeautifulSoup
  
# HTML Object
html_object = """
  


some
text
here
  
"""
  
# Parse the HTML
soup = BeautifulSoup(html_object, "lxml")

# Iterate over every tag in the document
for x in soup.find_all():

    # Fetch the tag's text with surrounding whitespace stripped
    if len(x.get_text(strip=True)) == 0:

        # Remove the empty tag from the tree
        x.extract()

# Print the HTML with the empty tags removed
print(soup)

Output:

sometexthere
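Note that the length check treats every tag without text as empty, including void elements such as `<br>`, `<img>` and `<hr>`, which can never contain text, so they are removed as well. If those should survive, one option is to skip them explicitly. A sketch under stated assumptions: the tuple `KEEP_EVEN_IF_EMPTY` and the function name are hypothetical, and the list of elements to keep is illustrative rather than complete.

```python
from bs4 import BeautifulSoup

# Assumption: elements that are empty by design and should be kept.
# Illustrative only; adjust it for the pages you process.
KEEP_EVEN_IF_EMPTY = ("br", "hr", "img", "input", "meta", "link")


def remove_empty_tags_keep_void(html: str) -> BeautifulSoup:
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all():
        # Never remove the elements we chose to keep
        if tag.name in KEEP_EVEN_IF_EMPTY:
            continue
        # Keep containers that hold a kept element, e.g. <p><img></p>
        if tag.find(list(KEEP_EVEN_IF_EMPTY)) is not None:
            continue
        # Otherwise apply the same test as the article: no stripped text means empty
        if not tag.get_text(strip=True):
            tag.extract()
    return soup


sample = "<div><p>a<br>b</p><p></p><p><img src='x.png'></p><span> </span></div>"
print(remove_empty_tags_keep_void(sample))
```

The extra `tag.find(...)` check matters because a container such as `<p><img></p>` has no text of its own but holds an element you chose to keep.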

Example 2: Remove empty tags from the page at a given URL.

Python3

# Import Module
from bs4 import BeautifulSoup
import requests
  
# Page URL
URL = "https://www.geeksforgeeks.org/"
  
# Download the page content from the URL
page = requests.get(URL)

# Parse the page's HTML
soup = BeautifulSoup(page.content, "lxml")

# Iterate over every tag in the document
for x in soup.find_all():

    # Fetch the tag's text with surrounding whitespace stripped
    if len(x.get_text(strip=True)) == 0:

        # Remove the empty tag from the tree
        x.extract()

# Print the HTML with the empty tags removed
print(soup)
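When cleaning a full page, two details are worth considering. `requests.get()` does not time out unless a `timeout` is passed, and the loop above also scans `<head>`, where elements such as `<meta>` and `<link>` never contain text and are therefore removed. The sketch below is one possible variation, not the article's method: the function names, the 10-second timeout, and the choice to scan only `<body>` are all assumptions.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Download a page (hypothetical helper); raise on HTTP errors instead of parsing an error page."""
    response = requests.get(url, timeout=timeout)  # timeout in seconds
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.content


def remove_empty_body_tags(soup: BeautifulSoup) -> BeautifulSoup:
    """Remove tags with no stripped text, looking only inside <body> when it exists."""
    # Assumption: <head> is left alone, so <meta>, <link>, etc. survive
    root = soup.body if soup.body is not None else soup
    for tag in root.find_all():
        if not tag.get_text(strip=True):
            tag.extract()
    return soup


# Usage (performs a network request):
# soup = remove_empty_body_tags(BeautifulSoup(fetch_html("https://www.geeksforgeeks.org/"), "lxml"))
# print(soup)
```

Scanning only `<body>` is a design choice; pass the whole soup instead if empty tags in `<head>` should be removed too.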

Output: