BeautifulSoup - Find all children of an element

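You have probably come across large, complicated websites where finding a specific piece of content is difficult. To make searching, modifying, and iterating over such pages easier, Python offers several libraries, such as Requests, Xml, Beautiful Soup, Selenium, and Scrapy. Among these, Beautiful Soup is a widely used choice for parsing HTML during web scraping. Sometimes you need to find all the children of an element with Beautiful Soup. If you are not sure how to do that, don't worry: this article walks through the process of finding the children of an element.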

Syntax:

unordered_list = soup.find("#Widget Name", {"id": "#Id name of element of which you want to find children"})

children = unordered_list.findChildren()
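One detail worth knowing about this syntax: in Beautiful Soup 4, `findChildren()` is an older alias of `find_all()`, so with no arguments it returns every tag nested under the element, not only the direct children. The sketch below illustrates the difference on a small inline string; the markup is hypothetical and chosen only for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration.
html = '<ul id="list"><li>Apple <b>red</b></li><li>Banana</li></ul>'
soup = BeautifulSoup(html, "html.parser")
unordered_list = soup.find("ul", {"id": "list"})

# findChildren() walks the whole subtree, like find_all().
all_descendants = unordered_list.findChildren()

# recursive=False restricts the search to direct children only.
direct_children = unordered_list.find_all(recursive=False)

print([tag.name for tag in all_descendants])  # ['li', 'b', 'li']
print([tag.name for tag in direct_children])  # ['li', 'li']
```

If only the immediate children matter, `find_all(recursive=False)` is probably the safer call.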


Below is the HTML file used in this example:

HTML



 
  My First Heading
 
 
  
   Vinayak Rai
  
  
 Fruits
  Apple
  Banana
  Mango

Step-by-step implementation:

Step 1: First, import the libraries Beautiful Soup and os.

Python3

from bs4 import BeautifulSoup as bs
import os

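Step 2: Now, get the folder that contains your script: pass the name of your Python file to abspath, then remove the last segment of the path with dirname. A relative name is resolved against the current working directory, so run the script from its own folder.

Python3

base = os.path.dirname(os.path.abspath('#Name of your Python file'))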
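As a rough sketch of this step, the path handling can be wrapped in a small helper. The name `sibling_path` is hypothetical and not part of the original article. In a real script, passing `__file__` instead of a literal file name avoids depending on the current working directory.

```python
import os

def sibling_path(script_path, file_name):
    """Return the path of file_name located in the same folder as script_path."""
    # abspath makes the path absolute; dirname drops its last segment.
    base = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(base, file_name)

# In a real script, __file__ would normally be passed as script_path.
print(sibling_path("run.py", "gfg.html"))
```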

Step 3: Then, open the HTML file you want to parse.

Python3

html = open(os.path.join(base, '#Name of HTML file'))

Step 4: Parse the HTML in Beautiful Soup.

Python3

soup = bs(html, 'html.parser')

Step 5: Next, locate the element whose children you want to find.

Python3

unordered_list = soup.find("#Widget Name",
      {"id": "#Id name of element of which you want to find children"})
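Note that `find()` returns `None` when nothing matches, and calling `findChildren()` on `None` raises an `AttributeError`. A minimal sketch of a guard follows; the helper name `find_by_id` and the inline markup are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

def find_by_id(soup, tag_name, element_id):
    """Return the first matching tag, or raise a clear error if none exists."""
    element = soup.find(tag_name, {"id": element_id})
    if element is None:
        raise LookupError(f"no <{tag_name}> with id={element_id!r}")
    return element

# Hypothetical markup used only for this illustration.
soup = BeautifulSoup('<ul id="list"><li>Apple</li></ul>', "html.parser")
unordered_list = find_by_id(soup, "ul", "list")
print(unordered_list.name)  # ul
```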

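Step 6: Next, find all the children of the element. Called without arguments, findChildren() returns every tag nested under the element.

Python3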

children = unordered_list.findChildren()

Step 7: Finally, print all the children of the element that you found in the previous step.

Python3

for child in children:
    print(child)
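Printing a child emits its full markup. When only the tag name and text are of interest, something like the following sketch may be more convenient; the inline markup is hypothetical and stands in for the real file.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration.
html = '<ul id="list"><li>Apple</li><li>Banana</li><li>Mango</li></ul>'
soup = BeautifulSoup(html, "html.parser")
children = soup.find("ul", {"id": "list"}).findChildren()

# Collect (tag name, stripped text) pairs instead of raw markup.
summary = [(child.name, child.get_text(strip=True)) for child in children]
for name, text in summary:
    print(f"{name}: {text}")
```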

Below is the complete implementation:

Python

# Python program to find all the children
# of an element using Beautiful Soup

# Import the libraries BeautifulSoup and os
from bs4 import BeautifulSoup as bs
import os

# Get the folder that contains the script
# (use the same name in abspath as your Python file)
base = os.path.dirname(os.path.abspath('run.py'))

# Open the HTML file and parse it with Beautiful Soup;
# the with block closes the file afterwards
with open(os.path.join(base, 'gfg.html')) as html:
    soup = bs(html, 'html.parser')

# Locate the element whose children you want to find
unordered_list = soup.find("ul", {"id": "list"})

# Find children of the element
children = unordered_list.findChildren()

# Print all children of the element
for child in children:
    print(child)

Output:
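The exact output depends on the contents of `gfg.html`. As a self-contained illustration, the sketch below writes a stand-in file to a temporary folder and runs the same steps on it. The sample markup and the helper name `children_of_list` are assumptions; the real file may differ.

```python
import os
import tempfile
from bs4 import BeautifulSoup

# Hypothetical stand-in for gfg.html; the real file's markup may differ.
SAMPLE_HTML = """<ul id="list">
<li>Apple</li>
<li>Banana</li>
<li>Mango</li>
</ul>"""

def children_of_list(html_path):
    """Parse the file and return all tags found under <ul id="list">."""
    with open(html_path, encoding="utf-8") as html:
        soup = BeautifulSoup(html, "html.parser")
    return soup.find("ul", {"id": "list"}).findChildren()

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, "gfg.html")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_HTML)
    printed = [str(child) for child in children_of_list(path)]

for line in printed:
    print(line)
```

With this sample markup, the script prints one `<li>` element per line: `<li>Apple</li>`, `<li>Banana</li>`, and `<li>Mango</li>`.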