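BeautifulSoup – Scraping a List from HTML
Prerequisites:
- Requests
- Beautiful Soup
Python can be used to scrape information from web pages, including the data held inside specific tags. This article shows how to scrape list elements from HTML.
Modules needed:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the following command in the terminal.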
pip install bs4
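- requests: Requests lets you send HTTP/1.1 requests very easily. This module is also not part of the Python standard library. To install it, run the following command in the terminal.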
pip install requests
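Approach:
- Import the modules
- Get the HTML code using the requests module
- Find all list item tags using the find_all() method
- Iterate through all list item tags and get their text using the text property
Example 1: Scraping a list from HTML code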
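The steps above can be sketched as one small reusable function. This is only a minimal sketch: the function name extract_list_items and the sample HTML are hypothetical, and it assumes Python's built-in "html.parser" so that it runs without the lxml package.

```python
from bs4 import BeautifulSoup


def extract_list_items(html):
    """Return the stripped text of every <li> tag in an HTML string."""
    # "html.parser" ships with Python, so no extra parser package is assumed
    soup = BeautifulSoup(html, "html.parser")
    return [li.get_text(strip=True) for li in soup.find_all("li")]


# Hypothetical sample input used only for illustration
sample_html = "<ul><li>Coffee</li><li>Tea</li><li>Milk</li></ul>"
items = extract_list_items(sample_html)
print(items)  # ['Coffee', 'Tea', 'Milk']
```

Returning a list instead of printing inside the loop makes it easier to reuse the result, for example to count or filter the items.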
Python3
# Import required modules
from bs4 import BeautifulSoup
import requests
# HTML code
html_content = """
<ul>
    <li>Coffee</li>
    <li>Tea</li>
    <li>Milk</li>
</ul>
"""
# Parse the HTML content (the "lxml" parser requires the lxml package)
soup = BeautifulSoup(html_content, "lxml")
# Find all li tags
datas = soup.find_all("li")
# Iterate through all li tags
for data in datas:
    # Get the text from each tag
    print(data.text)
# Print the total number of li tags found
print(f"Total {len(datas)} li tag found")
Output:
Coffee
Tea
Milk
Total 3 li tag found
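Note that find_all("li") collects items from every list in the document. When only one list is wanted, one option is to locate its parent tag first and search inside it. The sketch below assumes a hypothetical document with two lists whose id values ("drinks" and "steps") are invented for illustration, and it uses the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with two separate lists
html_content = """
<ul id="drinks">
  <li>Coffee</li>
  <li>Tea</li>
</ul>
<ol id="steps">
  <li>Boil water</li>
  <li>Pour</li>
</ol>
"""

soup = BeautifulSoup(html_content, "html.parser")

# Locate one specific list, then search only inside it
drinks = soup.find("ul", id="drinks")
drink_items = []
if drinks is not None:
    drink_items = [li.get_text(strip=True) for li in drinks.find_all("li")]

print(drink_items)               # ['Coffee', 'Tea']
print(len(soup.find_all("li")))  # 4 (items from both lists)
```

Searching from the parent tag keeps items from unrelated lists, such as navigation menus on a real page, out of the result.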
Example 2: Scraping a list from a web URL
Python3
# Import required modules
from bs4 import BeautifulSoup
import requests
# Web URL
url = "https://www.geeksforgeeks.org/python-list/"
# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text
# Parse the HTML content (the "lxml" parser requires the lxml package)
soup = BeautifulSoup(html_content, "lxml")
# Find all li tags
datas = soup.find_all("li")
# Iterate through all li tags
for data in datas:
    # Get the text from each tag
    print(data.text)
# Print the total number of li tags found
print(f"Total {len(datas)} li tag found")
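Output:
The script prints the text of every li tag found on the page, followed by the total count. The exact lines depend on the page's content at the time of the request.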
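A request to a live page can fail or time out, so a slightly more defensive variant of the script may be useful. The following is a sketch under stated assumptions rather than a definitive implementation: the function names parse_list_items and fetch_list_items, the 10-second timeout, and the choice to return an empty list on failure are all illustrative.

```python
import requests
from bs4 import BeautifulSoup


def parse_list_items(html):
    """Return the stripped, non-empty text of every <li> tag in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    texts = (li.get_text(strip=True) for li in soup.find_all("li"))
    return [text for text in texts if text]


def fetch_list_items(url, timeout=10):
    """Download a page and return its list items, or an empty list on failure."""
    try:
        response = requests.get(url, timeout=timeout)
        # Raise an exception for 4xx/5xx responses instead of parsing an error page
        response.raise_for_status()
    except requests.RequestException as error:
        print(f"Request failed: {error}")
        return []
    return parse_list_items(response.text)
```

For example, calling fetch_list_items("https://www.geeksforgeeks.org/python-list/") would return the text of the li tags on that page as a list of strings. Keeping the parsing step in its own function also lets it be checked against a fixed HTML string without any network access.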