BeautifulSoup - Find all children of an element

Last modified: 2022-05-13 01:55:26.813000    Author: Mango


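You have probably come across complex, lengthy web pages in which finding anything is difficult. To make searching, modifying, and iterating over such pages easier, Python offers a number of libraries, most of them third-party packages, such as Requests, Xml, Beautiful Soup, Selenium, and Scrapy. Among these, Beautiful Soup is one of the most convenient for parsing and navigating HTML when scraping the web. Sometimes you need Beautiful Soup to find all the children of an element. If you are not sure how to do that, don't worry: in this article, we will walk through the process of finding an element's children.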

Consider the following HTML file, saved as gfg.html:



HTML


 
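<!DOCTYPE html>
<html>
  <head>
    <title>My First Heading</title>
  </head>
  <body>
    <p>Vinayak Rai</p>
    <ul id="list">
      Fruits
      <li>Apple</li>
      <li>Banana</li>
      <li>Mango</li>
    </ul>
  </body>
</html>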



Step-by-step implementation:

Step 1: First, import the Beautiful Soup and os libraries.

Python3

from bs4 import BeautifulSoup as bs
import os

Step 2: Next, get the directory that contains your Python file by making the script's path absolute and removing its last segment (the file name).

Python3

base = os.path.dirname(os.path.abspath(__file__))
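As a sketch of how these path functions fit together, the hypothetical helper `html_path_next_to()` below builds the path of an HTML file that sits next to a script. Keep in mind that `os.path.abspath()` resolves a relative name such as `'run.py'` against the current working directory, not against the script's own location, so passing `__file__` is usually the more reliable choice. The paths in the example are made up for illustration and assume a POSIX system.

```python
import os


def html_path_next_to(script_path: str, html_name: str) -> str:
    """Return the path of html_name in the directory that holds script_path (hypothetical helper)."""
    # abspath() makes the path absolute; a relative path is resolved
    # against the current working directory.
    absolute = os.path.abspath(script_path)
    # dirname() drops the last segment, i.e. the script's file name.
    base = os.path.dirname(absolute)
    return os.path.join(base, html_name)


# In a real script you would call html_path_next_to(__file__, "gfg.html").
print(html_path_next_to("/home/user/project/run.py", "gfg.html"))
# /home/user/project/gfg.html  (on POSIX systems)
```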

Step 3: Then, open the HTML file you want to parse.

Python3

html = open(os.path.join(base, "gfg.html"), encoding="utf-8")  # replace "gfg.html" with your HTML file's name
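Calling `open()` on its own leaves the file handle open until it is garbage-collected. A minimal alternative, assuming UTF-8 encoded HTML, is to open the file in a `with` block so it is closed right after Beautiful Soup has read it. The `load_soup()` helper and the `demo.html` file are hypothetical and only used for this demonstration.

```python
import os
import tempfile

from bs4 import BeautifulSoup


def load_soup(path: str) -> BeautifulSoup:
    """Parse an HTML file and close the file handle afterwards (hypothetical helper)."""
    # BeautifulSoup reads the whole file while constructing the tree,
    # so the file can be closed as soon as the with-block ends.
    with open(path, encoding="utf-8") as html:
        return BeautifulSoup(html, "html.parser")


# Demo: write a tiny HTML file to a temporary directory, then parse it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write('<ul id="list"><li>Apple</li></ul>')
    soup = load_soup(path)

print(soup.li.text)  # Apple
```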

Step 4: Parse the HTML with Beautiful Soup.

Python3

soup = bs(html, "html.parser")
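`BeautifulSoup` also accepts a string, which is handy for experimenting without creating a file. The sketch below parses an inline document that assumes the same `ul` list as the HTML above; the markup is a simplified stand-in, not the article's exact file.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical stand-in for the list in the HTML file above.
html = """
<ul id="list">
  Fruits
  <li>Apple</li>
  <li>Banana</li>
  <li>Mango</li>
</ul>
"""

# "html.parser" is the parser bundled with Python's standard library.
soup = BeautifulSoup(html, "html.parser")
print(soup.ul["id"])  # list
print([li.text for li in soup.find_all("li")])  # ['Apple', 'Banana', 'Mango']
```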

Step 5: Then, locate the element whose children you want to find, using its tag name and id.

Python3

unordered_list = soup.find("ul", {"id": "list"})  # tag name and id of the parent element

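Step 6: Next, find all the children of the element.

Python3

# findChildren() is an alias of find_all(); with no arguments it returns
# every tag nested inside the element (it searches recursively)
children = unordered_list.findChildren()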
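Because `find_all()` (and its older camelCase alias `findChildren()`) searches recursively by default, it returns every tag nested inside the element, not only its direct children. The sketch below, using a hypothetical nested list, compares this with `recursive=False` and with the `.children` attribute, which yields direct children including text nodes.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical HTML with one level of nesting to show the difference.
html = '<ul id="list"><li>Apple <b>red</b></li><li>Banana</li></ul>'
ul = BeautifulSoup(html, "html.parser").find("ul", id="list")

# find_all() / findChildren() with no arguments: every descendant tag.
descendants = [t.name for t in ul.find_all()]
# recursive=False: only the tags directly inside <ul>.
direct = [t.name for t in ul.find_all(recursive=False)]
# .children: direct children, including text nodes, so keep only Tags.
direct_via_children = [c.name for c in ul.children if isinstance(c, Tag)]

print(descendants)          # ['li', 'b', 'li']
print(direct)               # ['li', 'li']
print(direct_via_children)  # ['li', 'li']
```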

Step 7: Finally, print all the children found in the previous step.

Python3

for child in children:
    print(child)
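`print(child)` prints each child's full markup. If you only need the tag name or its text, the `.name` attribute and the `get_text()` method are enough; a minimal sketch on a hypothetical list follows. Note that the bare text `Fruits` is a text node rather than a tag, so `find_all()` does not return it.

```python
from bs4 import BeautifulSoup

# Hypothetical list markup mirroring the structure used in this article.
html = '<ul id="list">Fruits<li>Apple</li><li>Banana</li><li>Mango</li></ul>'
children = BeautifulSoup(html, "html.parser").find("ul", id="list").find_all()

# Collect each child's tag name and its stripped text content.
rows = [(child.name, child.get_text(strip=True)) for child in children]
for name, text in rows:
    print(f"{name}: {text}")
# li: Apple
# li: Banana
# li: Mango
```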

Below is the complete implementation:

Python

# Python program to find all the children
# of an element using Beautiful Soup

# Import the libraries BeautifulSoup and os
from bs4 import BeautifulSoup as bs
import os

# Directory that contains this script: make the script's path
# absolute, then remove its last segment (the file name)
base = os.path.dirname(os.path.abspath(__file__))

# Open the HTML file and parse it with Beautiful Soup;
# the with-block closes the file once parsing is done
with open(os.path.join(base, "gfg.html"), encoding="utf-8") as html:
    soup = bs(html, "html.parser")

# Locate the element whose children you want to find
unordered_list = soup.find("ul", {"id": "list"})

# Find all tags nested inside the element
# (findChildren() is an alias of find_all() and searches recursively)
children = unordered_list.findChildren()

# Print all children of the element
for child in children:
    print(child)

Output: