📅  Last modified: 2023-12-03 15:09:14.646000             🧑  Author: Mango
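When scraping a web page, you often need to get the child elements under a particular element in order to extract data. Python's BeautifulSoup library makes this straightforward.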
pip install beautifulsoup4 requests
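from bs4 import BeautifulSoup
import requests

# Fetch the page and parse it with Python's built-in HTML parser
url = 'https://www.example.com/page'
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, 'html.parser')

# Locate the parent <ul id="parent">; find() returns None if there is no match
parent = soup.find('ul', {'id': 'parent'})
if parent is not None:
    # find_all() matches every <li> descendant, not only direct children
    children = parent.find_all('li')
    for child in children:
        print(child.text)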
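Note that `find_all('li')` returns every matching descendant, so items in a nested list are included as well. If only the direct children of the parent are wanted, one option is to pass `recursive=False`, or to use a CSS child combinator with `select()`. The following is a minimal sketch of that difference. The inline HTML and the `first_text` helper are made up for illustration and are not part of the original example.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML so the sketch runs without a network request
html = """
<ul id="parent">
  <li>Apple</li>
  <li>Banana
    <ul>
      <li>Nested</li>
    </ul>
  </li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')
parent = soup.find('ul', {'id': 'parent'})

# Default behaviour: every <li> descendant, including the nested one
all_items = parent.find_all('li')

# recursive=False: only <li> tags that are direct children of the parent
direct_items = parent.find_all('li', recursive=False)

# CSS selector with the child combinator, selecting the same direct children
selected_items = soup.select('ul#parent > li')


def first_text(tag):
    """Illustrative helper: return the first non-empty text fragment in a tag."""
    return next(tag.stripped_strings)


print(len(all_items))                           # 3
print([first_text(li) for li in direct_items])  # ['Apple', 'Banana']
```

Which form to use depends on the page: `recursive=False` is convenient when you already hold the parent tag, while `select()` is handy when the whole path can be written as one selector.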
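With BeautifulSoup, getting the child elements under a parent element takes only a few lines, which is often all a scraper needs to extract data from a page. The library also provides many other methods for navigating and searching the parsed tree, which cover a wide range of scraping tasks.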