📅 Last modified: 2023-12-03 15:34:42.760000 🧑 Author: Mango
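Requests-HTML is a Python package built on Requests (for HTTP) and Pyppeteer (for headless-browser rendering). It simplifies fetching and extracting data from websites.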
Requests-HTML can be installed with pip:

```shell
pip install requests-html
```
Sending a request
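```python
from requests_html import HTMLSession

# HTMLSession works like requests.Session and attaches parsed HTML to each response.
session = HTMLSession()
response = session.get('https://www.python.org')
print(response.html)  # the page's HTML object, e.g. <HTML url='https://www.python.org/'>
```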
Parsing an HTML page
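```python
from requests_html import HTMLSession

session = HTMLSession()
response = session.get('https://www.python.org')
# find() takes a CSS selector; first=True returns a single Element (or None) instead of a list.
title = response.html.find('title', first=True).text
print(title)
```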
Executing JavaScript with a headless browser
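```python
from requests_html import HTMLSession

session = HTMLSession()
response = session.get('https://dynamicwebscraper.com/blog/')

# Before rendering: only links present in the static HTML are visible.
for link in response.html.links:
    if '/blog/' in link and '/page/' not in link:
        print(link)

# Run the page's JavaScript in headless Chromium (downloaded on the first render() call).
response.html.render()

# After rendering: search the updated DOM, which includes JavaScript-generated links.
for link in response.html.find('a'):
    href = link.attrs.get('href', '')  # some <a> tags have no href attribute
    if '/blog/' in href and '/page/' not in href:
        print(href)
```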
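The example above applies the same filter twice (keep hrefs containing `/blog/` but not `/page/`). A small helper makes that rule reusable and also removes duplicates, since a rendered page often links to the same post more than once. The name `select_blog_links` is hypothetical, and the sample hrefs are made up.

```python
from typing import Iterable, List


def select_blog_links(hrefs: Iterable[str]) -> List[str]:
    """Return hrefs that contain '/blog/' but not '/page/', without duplicates.

    Hrefs are kept in the order they arrive, so a list built from
    response.html.find('a') keeps document order.
    """
    seen = set()
    selected = []
    for href in hrefs:
        if "/blog/" in href and "/page/" not in href and href not in seen:
            seen.add(href)
            selected.append(href)
    return selected


# Hypothetical hrefs, as they might be collected from a blog index page.
sample = ["/blog/first-post/", "/blog/page/2/", "/about/", "/blog/first-post/"]
print(select_blog_links(sample))
```

With a live response, it could be called as `select_blog_links(response.html.links)` before `render()` and as `select_blog_links(a.attrs.get('href', '') for a in response.html.find('a'))` after it.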
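Requests-HTML is a capable Python package for web scraping. It sends HTTP requests, parses HTML pages, wraps Pyppeteer to render JavaScript in a headless browser, and keeps the Requests API (HTMLSession is a subclass of requests.Session). If you are looking for an easy-to-use tool for pulling data from web pages, Requests-HTML is a good choice.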