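How to Handle Duplicate Attributes in BeautifulSoup?
When scraping a page, several tags of the same type can share the same attribute value (for example, multiple paragraphs with the same id), which makes it awkward to pick out a single item. This article shows how to collect every matching tag, strip its attributes, and print only the items you need.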
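After creating a list to store the items, use find_all() to collect every tag that matches the tag name and attribute.
Syntax:
items=soup.find_all("#Widget Name", {"id":"#Id name of widget in which you want to edit"})
Then remove the attributes from each result and print the specific items you want from the list.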
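As a rough illustration of the find_all() syntax, the sketch below runs it against a small inline snippet. The markup, the `fruit` id, and the variable names are hypothetical and chosen only for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three <li> tags share one id, a fourth uses another.
markup = """
<ul>
  <li id="fruit">Apple</li>
  <li id="fruit">Banana</li>
  <li id="veg">Carrot</li>
  <li id="fruit">Cherry</li>
</ul>
"""

soup = BeautifulSoup(markup, "html.parser")

# find_all(name, attrs) returns every matching tag, in document order.
matches = soup.find_all("li", {"id": "fruit"})

print(len(matches))  # 3
print(matches[1])    # <li id="fruit">Banana</li>
```

At this point each result still carries its id attribute, which is why the attributes are removed before printing.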
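Approach:
- Import the modules (BeautifulSoup from bs4, and os).
- Get the directory that contains your Python file by stripping the last segment from the file's absolute path. A relative file name is resolved against the current working directory.
Syntax:
base=os.path.dirname(os.path.abspath('#Name of Python file in which you are currently working'))
- Then, open the HTML file you want to read values from.
Syntax:
html=open(os.path.join(base, '#Name of HTML file from which you wish to read value'))
- Parse the HTML file with BeautifulSoup.
- Create a list to store all items that share the same tag and attribute.
- Next, find all items with that tag and attribute.
Syntax:
items=soup.find_all("#Widget Name", {"id":"#Id name of widget in which you want to edit"})
- Then, remove all attributes from each tag.
- Finally, print the specific items you want from the list.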
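The steps above can be sketched as one small helper. This is a minimal sketch, not the article's program: the function name `collect_without_attrs`, the file name `sample.html`, and its contents are assumptions made for illustration, and a temporary directory is used so the example runs anywhere.

```python
import os
import tempfile

from bs4 import BeautifulSoup


def collect_without_attrs(html_path, tag_name, attrs):
    """Return all tags matching tag_name/attrs, with their attributes cleared."""
    with open(html_path, encoding="utf-8") as fh:
        soup = BeautifulSoup(fh, "html.parser")
    items = list(soup.find_all(tag_name, attrs))
    for tag in items:
        tag.attrs = {}
    return items


# Demo on a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, "sample.html")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write('<div><p id="x">one</p><p id="x">two</p></div>')
    result = collect_without_attrs(path, "p", {"id": "x"})

print(result[0])  # <p>one</p>
print(result[1])  # <p>two</p>
```

Opening the file in a `with` block closes it as soon as parsing is done; the parsed tags remain usable afterwards.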
Web page in use:
HTML
Geeks For Geeks
King
Prince
Queen
Princess
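Only the visible text of the page is shown here. One page that would be consistent with this text, and with the `p` tags and `vinayak` id used by the program in this article, might look like the sketch below. The exact markup is an assumption: in particular, placing "Geeks For Geeks" in the title and wrapping the paragraphs in a single div are guesses.

```python
from bs4 import BeautifulSoup

# ASSUMED markup: reconstructed from the visible text, not the original file.
page = """
<html>
  <head><title>Geeks For Geeks</title></head>
  <body>
    <div>
      <p id="vinayak">King</p>
      <p id="vinayak">Prince</p>
      <p id="vinayak">Queen</p>
      <p id="vinayak">Princess</p>
    </div>
  </body>
</html>
"""

soup = BeautifulSoup(page, "html.parser")

# All four paragraphs share the same id, so one lookup returns all of them.
paragraphs = soup.div.find_all("p", {"id": "vinayak"})
texts = [p.get_text() for p in paragraphs]
print(texts)  # ['King', 'Prince', 'Queen', 'Princess']
```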
Program:
Python
# Import BeautifulSoup and os
from bs4 import BeautifulSoup as bs
import os

# Directory containing the script: strip the last path segment.
# Replace gfg4.py with the name of your Python file; a relative
# name is resolved against the current working directory.
base = os.path.dirname(os.path.abspath("gfg4.py"))

# Open the HTML file to read from and parse it with Beautiful Soup
with open(os.path.join(base, "gfg.html")) as html:
    soup = bs(html, "html.parser")

# Create a list to store the items
items = []

# Find all paragraphs with id "vinayak" inside the first div
items.extend(soup.div.find_all("p", {"id": "vinayak"}))

# Remove the attributes from each matched tag
for tag in items:
    tag.attrs = {}

# Print the second item (Prince)
print(items[1])

# Print the third item (Queen)
print(items[2])
Output:
<p>Prince</p>
<p>Queen</p>
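Printing a tag shows its markup, while get_text() returns only the text inside it. The sketch below contrasts the two on hypothetical inline markup (the `class` attribute and variable names are made up for the example) and also shows that assigning `tag.attrs = {}` changes the parsed tree itself.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for the HTML file.
markup = '<div><p id="vinayak" class="a">Prince</p><p id="vinayak">Queen</p></div>'
soup = BeautifulSoup(markup, "html.parser")

# Clear every attribute on the matched tags.
for tag in soup.div.find_all("p", {"id": "vinayak"}):
    tag.attrs = {}

cleaned = [str(tag) for tag in soup.div.find_all("p")]
values = [tag.get_text(strip=True) for tag in soup.div.find_all("p")]

print(cleaned)    # ['<p>Prince</p>', '<p>Queen</p>']
print(values)     # ['Prince', 'Queen']
print(str(soup))  # <div><p>Prince</p><p>Queen</p></div>
```

Because the tree is modified in place, a later search by the removed id finds nothing; parse the file again if the original attributes are still needed.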