Retrieving the children of an HTML tag using BeautifulSoup

Last modified: 2022-05-13 01:55:35.506000             Author: Mango


Prerequisite: BeautifulSoup

BeautifulSoup is a Python module for web scraping. This article shows how to retrieve and display the children of a given HTML tag.

Sample website: https://www.geeksforgeeks.org/caching-page-tables/

For the first child:

Approach

  • Import the modules
  • Pass the URL
  • Request the page
  • Use the findChild() function to display the first child

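Syntax: findChild()

findChild() is an older alias of find(). Called with no arguments, it returns the first child that is a tag, or None if the tag contains no tags.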



Example:

Python3
from bs4 import BeautifulSoup
import requests
  
# sample web page
sample_web_page = 'https://www.geeksforgeeks.org/caching-page-tables/'
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
  
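# first <p> tag on the page
child_soup = soup.find('p')

# findChild() returns the first child of that <p> that is a tag, or None
print("child :  ", child_soup.findChild())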


Output:
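
The result depends on the page's current content, so it can differ from run to run. As a reproducible illustration, here is a minimal sketch that applies the same call to a small inline snippet. The HTML string and the variable names are made up for this example and are not taken from the sample page. It also shows two details worth knowing: findChild() skips plain text, and it returns None when the tag contains no tags.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration (not from the sample page)
html = "<p>Intro text <b>bold</b> and <i>italic</i></p><p>text only</p>"
soup = BeautifulSoup(html, "html.parser")

first_p, second_p = soup.find_all("p")

# With no arguments, findChild() returns the first child that is a tag,
# skipping the leading text node "Intro text "
first_child = first_p.findChild()
print("child :  ", first_child)

# A tag that contains only text has no child tags, so the result is None
no_child = second_p.findChild()
print("child :  ", no_child)
```

The first call prints the <b> tag and the second prints None.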

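For all children:

To retrieve all the children of an HTML tag, we can use either .children or .contents. Both expose the same direct children, including text nodes. The difference is in how they are returned: .children is an iterator that yields the children one at a time and can be consumed only once, while .contents is a list that supports indexing, len() and repeated use. The parsed tree already keeps each tag's children in memory, so the choice is mostly about access style rather than memory savings. Use .children when a single pass in a loop is enough, and .contents when the children need to be stored or accessed by position.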
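
A short sketch can make the difference concrete. It assumes a made-up HTML list (not from the sample page) and hypothetical variable names, and is only meant to illustrate the list-versus-iterator behaviour.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet used only for illustration
html = "<ul><li>one</li><li>two</li><li>three</li></ul>"
ul = BeautifulSoup(html, "html.parser").find("ul")

# .contents is a plain list: it supports len(), indexing and repeated use
as_list = ul.contents
print(type(as_list).__name__, len(as_list))

# .children is an iterator: it yields each child once, then is exhausted
as_iter = ul.children
first_pass = [child.get_text() for child in as_iter]
second_pass = [child.get_text() for child in as_iter]
print(first_pass, second_pass)
```

The second pass over the same iterator is empty, which is why .contents is the safer choice when the children are needed more than once.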

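Approach

  • Import the modules
  • Pass the website URL
  • Request the page
  • Use either attribute to display the children

Using .children:

To retrieve all the children one at a time, iterate over .children.

Example:

Python3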

from bs4 import BeautifulSoup
import requests
  
# sample web page
sample_web_page = 'https://www.geeksforgeeks.org/caching-page-tables/'
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
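# first <p> tag on the page
child_soup = soup.find('p')

# .children yields each direct child (tags and text) one at a time
for child in child_soup.children:
    print("child :  ", child)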

Output:
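
What gets printed depends on the live page. One thing that often surprises in practice is that .children also yields text nodes, including the whitespace between tags, and only goes one level deep. The sketch below, built on a made-up snippet with hypothetical names, shows one way to keep only the tag children.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical snippet used only for illustration
html = """<div>
  <h2>Title</h2>
  <p>Body with <a href="#">a link</a></p>
</div>"""
div = BeautifulSoup(html, "html.parser").find("div")

# Every direct child, including whitespace-only text nodes between tags
all_children = list(div.children)

# Keep only the children that are tags
tag_children = [child for child in div.children if isinstance(child, Tag)]

for child in tag_children:
    print("child :  ", child.name)
```

The <a> tag does not appear because it is a grandchild of the <div>, nested inside the <p>.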

Using .contents:

It also returns all the children, this time as a list that can be indexed and reused.

Example:

Python3

from bs4 import BeautifulSoup
import requests
  
# sample web page
sample_web_page = 'https://www.geeksforgeeks.org/caching-page-tables/'
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
  
# first <p> tag on the page
child_soup = soup.find('p')

# .contents holds the direct children (tags and text) as a list
print("child :  ", child_soup.contents)

Output:
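
The list printed for the sample page varies with its content. To show what the list form allows, here is a small sketch on a made-up snippet (the HTML and variable names are assumptions for illustration): counting the children and picking them by position.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet used only for illustration
html = "<p>Hello <b>big</b><i>world</i></p>"
p = BeautifulSoup(html, "html.parser").find("p")

children = p.contents      # a list of text nodes and tags

count = len(children)      # number of direct children
first = children[0]        # the leading text node
last = children[-1]        # the <i> tag

print("count :  ", count)
print("first :  ", repr(str(first)))
print("last  :  ", last)
```

Here the list has three entries: the text "Hello ", the <b> tag and the <i> tag.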