BeautifulSoup – Scraping a List from HTML

Prerequisites:

Requests
Beautiful Soup

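Python can be used to scrape information from a web page. It can also retrieve the data contained in specific tags. This article shows how to scrape list elements from HTML.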

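Modules needed:

bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module is not built into Python. To install it, type the following command in the terminal.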

pip install bs4

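requests: Requests lets you send HTTP/1.1 requests very easily. This module is not built into Python either. To install it, type the following command in the terminal.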

pip install requests

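Approach:

Import the modules
Get the HTML code using the requests module
Find all list tags using the find_all() method
Iterate through all list tags and get their text using the text property

Example 1: Scraping a list from HTML code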

Python3

# Import the required module
from bs4 import BeautifulSoup

# HTML code containing an unordered list
html_content = """
<ul>
  <li>Coffee</li>
  <li>Tea</li>
  <li>Milk</li>
</ul>
"""

# Parse the HTML content
# ("lxml" needs a separate install: pip install lxml;
#  the built-in "html.parser" also works here)
soup = BeautifulSoup(html_content, "lxml")

# Find all li tags
datas = soup.find_all("li")

# Iterate through all li tags
for data in datas:
    # Get the text from each tag
    print(data.text)

print(f"Total {len(datas)} li tag found")

Output:

Coffee
Tea
Milk
Total 3 li tag found
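
Calling find_all("li") on the whole document returns the items of every list on the page. As a minimal sketch of one way to narrow this down, the snippet below first locates a single list and then searches only inside it. The HTML and the ids "drinks" and "snacks" are made up for illustration, and the built-in "html.parser" is used so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with two lists; the ids are invented for this sketch
html_content = """
<ul id="drinks">
  <li> Coffee </li>
  <li>Tea</li>
  <li>Milk</li>
</ul>
<ol id="snacks">
  <li>Cake</li>
</ol>
"""

soup = BeautifulSoup(html_content, "html.parser")

# Locate the one list of interest, then search only inside it
drinks_list = soup.find("ul", id="drinks")

# get_text(strip=True) removes the whitespace around each item's text
drinks = [li.get_text(strip=True) for li in drinks_list.find_all("li")]

print(drinks)
```

Here drinks holds only the three items of the first list, while soup.find_all("li") would still return all four li tags.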

Example 2: Scraping a list from a web URL

Python3

# Import the required modules
from bs4 import BeautifulSoup
import requests

# Web URL
url = "https://www.geeksforgeeks.org/python-list/"

# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text

# Parse the HTML content
# ("lxml" needs a separate install: pip install lxml)
soup = BeautifulSoup(html_content, "lxml")

# Find all li tags
datas = soup.find_all("li")

# Iterate through all li tags
for data in datas:
    # Get the text from each tag
    print(data.text)

print(f"Total {len(datas)} li tag found")

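Output:

The script prints the text of every li element on the page, followed by the total count. The exact items depend on the page's current content.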
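
A live request can fail or hang, and real pages often contain empty li elements. The following is a rough sketch, not the article's own method, of how the URL example might be made a little more defensive. The function names list_items_from_html and fetch_list_items, and the 10-second timeout, are assumptions chosen for illustration; timeout and raise_for_status() are standard parts of the requests library.

```python
from bs4 import BeautifulSoup
import requests


def list_items_from_html(html):
    """Return the stripped text of every non-empty li element in html."""
    soup = BeautifulSoup(html, "html.parser")
    texts = (li.get_text(strip=True) for li in soup.find_all("li"))
    # Drop items that contain no text at all
    return [text for text in texts if text]


def fetch_list_items(url, timeout=10):
    """Download url and return its li texts (hypothetical helper)."""
    # The timeout stops the call from waiting forever on a slow server
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return list_items_from_html(response.text)
```

A call such as fetch_list_items("https://www.geeksforgeeks.org/python-list/") would then return the list items as a Python list. Keeping the parsing in its own function means it can be tried on a plain HTML string without any network access.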