BeautifulSoup - Find all children of an element
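Many websites are long and complex, which makes it hard to locate a specific piece of content by hand. Python has a number of libraries that help with searching, modifying, and iterating over web content, such as Requests, the standard-library xml package, Beautiful Soup, Selenium, and Scrapy. Of these, Beautiful Soup is a common choice for parsing HTML once a page has been fetched. A frequent task is finding all the children of a given element, and this article walks through how to do that with Beautiful Soup.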
Syntax:
unordered_list = soup.find("<tag name>", {"id": "<id of the element whose children you want>"})
children = unordered_list.findChildren()
The following HTML file is used in the examples:
HTML
My First Heading
Vinayak Rai
Fruits
- Apple
- Banana
- Mango
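The listing above shows only the page's text, so the exact tags are not recoverable from it. As a minimal sketch, a comparable gfg.html could be recreated as follows. The only part implied by this article's lookup code is a ul element with id "list"; the h1 and p tags, the constant name ASSUMED_HTML, and the helper write_sample are assumptions made for illustration.

```python
from pathlib import Path

# Assumed markup: only <ul id="list"> is implied by the article's code;
# the <h1> and <p> tag choices are guesses.
ASSUMED_HTML = """<html>
<body>
<h1>My First Heading</h1>
<p>Vinayak Rai</p>
<ul id="list">
Fruits
<li>Apple</li>
<li>Banana</li>
<li>Mango</li>
</ul>
</body>
</html>
"""


def write_sample(path="gfg.html"):
    """Write the assumed markup to disk and return the path written."""
    target = Path(path)
    target.write_text(ASSUMED_HTML, encoding="utf-8")
    return target
```

Calling write_sample() with no arguments creates gfg.html in the current working directory.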
Stepwise implementation:
Step 1: First, import the Beautiful Soup and os libraries.
Python3
from bs4 import BeautifulSoup as bs
import os
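Step 2: Next, get the directory that contains your Python file by taking the file's absolute path and removing its last segment.
Python3
# __file__ is the path of the running script; a literal name such as
# 'run.py' also works when the script is run from its own folder
base = os.path.dirname(os.path.abspath(__file__))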
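To see what this step computes, here is a small sketch. The helper name script_dir and the folder name "project" are hypothetical. It illustrates that os.path.abspath resolves a relative name against the current working directory, and that os.path.dirname then drops the final path segment.

```python
import os


def script_dir(script_path):
    """Return the absolute directory that contains script_path."""
    return os.path.dirname(os.path.abspath(script_path))


# A bare file name is resolved against the current working directory.
from_bare_name = script_dir("run.py")

# A path with folders keeps them; only the final segment is dropped.
from_nested = script_dir(os.path.join("project", "run.py"))
```

Inside a script, script_dir(__file__) gives the script's own folder regardless of where it is launched from, which is why __file__ is usually preferred over a literal file name.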
Step 3: Then, open the HTML file you want to parse.
Python3
# Replace 'gfg.html' with the name of your HTML file
html = open(os.path.join(base, 'gfg.html'))
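A bare open() call leaves the file handle open until the program exits. One common alternative, sketched here under the assumption that the file is UTF-8 encoded, is to read the file inside a with block so it is closed automatically. The helper name read_html is hypothetical.

```python
import os


def read_html(base, filename, encoding="utf-8"):
    """Return the text of base/filename, closing the file afterwards."""
    path = os.path.join(base, filename)
    with open(path, encoding=encoding) as handle:
        return handle.read()
```

The returned string can be passed to Beautiful Soup in place of the open file object.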
Step 4: Parse the HTML with Beautiful Soup.
Python3
soup = bs(html, 'html.parser')
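The BeautifulSoup constructor accepts a markup string as well as an open file, which is convenient for quick experiments. A minimal sketch with a hypothetical two-item list:

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration.
markup = '<ul id="list"><li>Apple</li><li>Banana</li></ul>'

# 'html.parser' is Python's built-in parser, so no extra install is needed.
soup = BeautifulSoup(markup, "html.parser")

# After parsing, tags can be reached by name and attributes by key.
first_item = soup.find("li")
list_id = soup.ul["id"]
```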
Step 5: Next, locate the element whose children you want to find.
Python3
# Replace "ul" and "list" with the tag name and id of your own element
unordered_list = soup.find("ul", {"id": "list"})
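Note that find() returns None when nothing matches, in which case the next step would fail with an AttributeError. A small guard, sketched here with a hypothetical helper named find_by_id, makes that failure easier to diagnose.

```python
from bs4 import BeautifulSoup


def find_by_id(soup, tag_name, element_id):
    """Return the first tag_name element with the given id, or raise."""
    element = soup.find(tag_name, {"id": element_id})
    if element is None:
        raise LookupError(f"no <{tag_name}> with id={element_id!r}")
    return element


# Hypothetical markup used only for this illustration.
soup = BeautifulSoup('<ul id="list"><li>Apple</li></ul>', "html.parser")
unordered_list = find_by_id(soup, "ul", "list")
```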
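Step 6: Now, find all the children of that element. findChildren() is an older name for find_all(), so it returns every tag nested inside the element, not only its direct children; pass recursive=False to get direct children only.
Python3
children = unordered_list.findChildren()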
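The difference between all nested tags and direct children only shows up when the markup is nested. The sketch below uses hypothetical markup with a b tag inside the first list item, and calls find_all(), which behaves the same way as findChildren().

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one nested tag to show the difference.
markup = '<ul id="list"><li>Apple <b>red</b></li><li>Banana</li></ul>'
soup = BeautifulSoup(markup, "html.parser")
unordered_list = soup.find("ul", {"id": "list"})

# Default search: every tag nested anywhere inside the element.
all_descendants = unordered_list.find_all()

# recursive=False: only the tags directly under the element.
direct_children = unordered_list.find_all(recursive=False)

descendant_names = [tag.name for tag in all_descendants]  # ['li', 'b', 'li']
direct_names = [tag.name for tag in direct_children]      # ['li', 'li']
```

For the flat fruit list in this article both calls return the same three li tags, so the distinction only matters for nested markup.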
Step 7: Finally, print all the children found in the previous step.
Python3
for child in children:
    print(child)
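Printing a child prints its full markup. If only the text is wanted, get_text() can be called on each child instead. A short sketch, using hypothetical markup that mirrors the fruit list:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the fruit list in this article.
markup = '<ul id="list"><li>Apple</li><li>Banana</li><li>Mango</li></ul>'
soup = BeautifulSoup(markup, "html.parser")
children = soup.find("ul", {"id": "list"}).find_all(recursive=False)

# strip=True trims surrounding whitespace from each item's text.
item_texts = [child.get_text(strip=True) for child in children]
for text in item_texts:
    print(text)
```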
Below is the full implementation:
Python
# Python program to find all the children
# of an element using Beautiful Soup
# Import the libraries BeautifulSoup and os
from bs4 import BeautifulSoup as bs
import os
# Get the directory that contains this script
# (abspath gives the full path; dirname removes the last segment)
base = os.path.dirname(os.path.abspath(__file__))
# Open the HTML file to parse
html = open(os.path.join(base, 'gfg.html'))
# Parse the HTML file with Beautiful Soup
soup = bs(html, 'html.parser')
# Locate the element whose children you want
unordered_list = soup.find("ul", {"id": "list"})
# Find the children of that element
children = unordered_list.findChildren()
# Print all the children of the element
for child in children:
    print(child)
Output:
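The original output listing is not included here. As a rough guide, the sketch below runs the same lookup on an inline string instead of a file. The markup is an assumption reconstructed from the list text shown earlier (a ul with id "list" containing three li items), so the real file may differ.

```python
from bs4 import BeautifulSoup

# Assumed markup: reconstructed from the rendered list, not the original file.
ASSUMED_HTML = """<ul id="list">
Fruits
<li>Apple</li>
<li>Banana</li>
<li>Mango</li>
</ul>"""

soup = BeautifulSoup(ASSUMED_HTML, "html.parser")
unordered_list = soup.find("ul", {"id": "list"})

# findChildren() returns tags only, so the bare text "Fruits" is skipped.
printed = [str(child) for child in unordered_list.findChildren()]
print("\n".join(printed))
# <li>Apple</li>
# <li>Banana</li>
# <li>Mango</li>
```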