Scraping weather forecast data using Python and BS4
This article walks through scraping weather forecast data with Python and the bs4 library. Let's look at the components used in the script:
BeautifulSoup – a powerful Python library for pulling data out of HTML and XML files. It builds a parse tree from the parsed page, and that tree can then be searched to extract data.
Requests – a Python HTTP library that makes sending HTTP requests simple. We just pass the URL as an argument, and get() fetches everything the server returns for it.
Step 1 – Run the following code to fetch the content stored at the URL into a response object (file):
```python
import requests

# get data from the website; the Response object's .content holds the raw HTML bytes
file = requests.get("https://weather.com/en-IN/weather/tenday/l/INKA0344:1:IN")
```
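The one-line get() call waits indefinitely if the server stops responding, and it returns error pages (for example a 404) just like successful ones. A slightly more defensive sketch adds a timeout and raise_for_status(), both standard parts of Requests; the helper name fetch_page and the 10-second default are assumptions, not part of the original script:

```python
import requests


def fetch_page(url: str, timeout: float = 10.0) -> bytes:
    """Hypothetical helper: fetch a page and return its raw bytes (what file.content holds)."""
    # Without a timeout, requests.get() can block forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.content
```

Calling fetch_page with the weather.com URL above returns the same bytes as file.content, or raises requests.HTTPError or requests.Timeout so the failure is visible before any parsing starts.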
Step 2 – Parse the HTML content:
```python
# import BeautifulSoup for scraping the data
from bs4 import BeautifulSoup

# parse the raw HTML bytes using the standard-library "html.parser" backend
soup = BeautifulSoup(file.content, "html.parser")
```
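BeautifulSoup turns the downloaded bytes into a navigable tree of tags. The sketch below parses a small, invented snippet that reuses the class names the scraper looks for (the live page's markup is not reproduced here, and may differ). It shows that looking up a div by its full class string works, and that class_= matches a single class on a multi-class element:

```python
from bs4 import BeautifulSoup

# Invented stand-in for file.content (bytes, like the real response body)
html = b"""
<div class="locations-title ten-day-page-title">
  <h1>10 Day Weather</h1>
</div>
<table class="twc-table"><tr><td class="temp">30</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Matching the whole class attribute string, then searching inside the result
title = soup.find("div", {"class": "locations-title ten-day-page-title"}).find("h1").text
print(title)  # 10 Day Weather

# class_= matches any one class of a multi-class element
print(soup.find("div", class_="ten-day-page-title").name)  # div

# Moving around the tree from a tag
heading = soup.find("h1")
print(heading.parent["class"])  # ['locations-title', 'ten-day-page-title']
print(soup.find("td", class_="temp").text)  # 30
```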
Step 3 – Run the following code to scrape the forecast data from the weather site:
```python
# create an empty list to collect one dictionary per forecast row
# (note: the names list and dict used here shadow Python's built-in types)
list = []
# page heading (the location title); it is not used further below
title = soup.find("div", {"class": "locations-title ten-day-page-title"}).find("h1").text
# find all tables with class "twc-table"
content = soup.find_all("table", {"class": "twc-table"})
for items in content:
    # one pass per table row, minus one row (the table's header row)
    for i in range(len(items.find_all("tr")) - 1):
        # create an empty dictionary for this row
        dict = {}
        try:
            # take the i-th matching element of each class as this row's value
            dict["day"] = items.find_all("span", {"class": "date-time"})[i].text
            dict["date"] = items.find_all("span", {"class": "day-detail"})[i].text
            dict["desc"] = items.find_all("td", {"class": "description"})[i].text
            dict["temp"] = items.find_all("td", {"class": "temp"})[i].text
            dict["precip"] = items.find_all("td", {"class": "precip"})[i].text
            dict["wind"] = items.find_all("td", {"class": "wind"})[i].text
            dict["humidity"] = items.find_all("td", {"class": "humidity"})[i].text
        except IndexError:
            # some class has fewer matches than rows: fill the whole row with "None"
            dict["day"] = "None"
            dict["date"] = "None"
            dict["desc"] = "None"
            dict["temp"] = "None"
            dict["precip"] = "None"
            dict["wind"] = "None"
            dict["humidity"] = "None"
        # append this row's dictionary to the list
        list.append(dict)
```
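The index-based loop assumes that the i-th date-time span, the i-th description cell, and so on all belong to the same row. A single missing cell can shift later rows out of alignment or send a whole row to the except branch. A row-by-row variant avoids both problems by walking each tr and searching only inside it. This is a minimal sketch, not the original author's code: FIELDS, cell_text, scrape_rows and SAMPLE_HTML are hypothetical names, the sample markup is invented, and missing cells become None rather than the string "None".

```python
from bs4 import BeautifulSoup

# Hypothetical mapping from output key to (tag, CSS class), reusing the class
# names the scraper above searches for; the live page's markup may differ.
FIELDS = {
    "day": ("span", "date-time"),
    "date": ("span", "day-detail"),
    "desc": ("td", "description"),
    "temp": ("td", "temp"),
    "precip": ("td", "precip"),
    "wind": ("td", "wind"),
    "humidity": ("td", "humidity"),
}


def cell_text(row, tag, css_class):
    """Return the stripped text of the first matching element inside this row, or None."""
    element = row.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else None


def scrape_rows(html):
    """Return one dictionary per data row of every table with class "twc-table"."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("table", class_="twc-table"):
        for tr in table.find_all("tr"):
            # Header rows contain only <th> cells, so skip rows without any <td>
            if tr.find("td") is None:
                continue
            rows.append({key: cell_text(tr, tag, cls) for key, (tag, cls) in FIELDS.items()})
    return rows


# Made-up sample markup: the second row has no temperature cell
SAMPLE_HTML = """
<table class="twc-table">
  <tr><th>Day</th><th>Description</th><th>Temp</th></tr>
  <tr>
    <td><span class="date-time">Mon</span> <span class="day-detail">JUN 3</span></td>
    <td class="description">Cloudy</td>
    <td class="temp">30°</td>
  </tr>
  <tr>
    <td><span class="date-time">Tue</span> <span class="day-detail">JUN 4</span></td>
    <td class="description">Rain</td>
  </tr>
</table>
"""

for row in scrape_rows(SAMPLE_HTML):
    print(row["day"], row["desc"], row["temp"])
# Mon Cloudy 30°
# Tue Rain None
```

Here a missing cell affects only its own field, and each row's values always come from the same tr.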
find_all(): returns a list of every element with the given tag (and matching attributes) among the descendants of the element it is called on.
find(): returns only the first such element, or None if nothing matches.
list.append(dict): adds the current row's dictionary to the list, so the list ends up holding one dictionary per forecast row.
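To see the difference between find() and find_all() in isolation, here is a small sketch on a made-up HTML snippet (the markup and values are hypothetical):

```python
from bs4 import BeautifulSoup

html = """
<div class="forecast">
  <span class="temp">31°</span>
  <span class="temp">29°</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("span", {"class": "temp"})           # first match only (a Tag)
every = soup.find_all("span", {"class": "temp"})       # every match (a list-like ResultSet)
missing = soup.find("span", {"class": "wind"})         # no match -> None
none_found = soup.find_all("span", {"class": "wind"})  # no match -> empty result

print(first.text)               # 31°
print([s.text for s in every])  # ['31°', '29°']
print(missing, len(none_found))  # None 0
```

This is why code that calls .text directly on the result of find() fails with an AttributeError when nothing matches, while indexing into a short find_all() result raises an IndexError.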
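Step 4 – Convert the list into a CSV file to view the weather forecast data in an organized form.
Use the following code to convert the list into a DataFrame and store it in the output.csv file: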
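```python
import pandas as pd

# build a DataFrame: one row per dictionary, one column per key
convert = pd.DataFrame(list)
# write it to disk; index=False leaves out pandas' 0, 1, 2, ... row numbers
convert.to_csv("output.csv", index=False)
```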
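If pandas is not otherwise needed, the standard library's csv module can write the same list of dictionaries. A minimal sketch, assuming the seven keys filled in by Step 3; rows_to_csv and FIELDNAMES are hypothetical names:

```python
import csv
import io

# Column order for the CSV header: the keys the Step 3 loop fills in
FIELDNAMES = ["day", "date", "desc", "temp", "precip", "wind", "humidity"]


def rows_to_csv(rows, fileobj):
    """Write a list of row dictionaries as CSV (header first) to an open text file."""
    # keys missing from a row are written as empty fields (DictWriter's default restval="")
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Demo with an in-memory buffer; for a real file, pass
# open("output.csv", "w", newline="", encoding="utf-8") instead
buffer = io.StringIO()
rows_to_csv([{"day": "Mon", "temp": "30"}], buffer)
print(buffer.getvalue())
```

Opening the real file with newline="" is what the csv module documentation recommends, so that line endings are not doubled on Windows.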
Syntax: pandas.DataFrame(data=None, index: Optional[Collection] = None, columns: Optional[Collection] = None, dtype: Union[str, numpy.dtype, ExtensionDtype, None] = None, copy: bool = False)
Parameters:
data: ndarray, iterable, dict, or DataFrame. A dict can contain Series, arrays, constants, or list-like objects. A list of dicts, as used here, gives one row per dict.
index: index to use for the resulting frame. Defaults to RangeIndex if the input data has no indexing information and no index is provided.
columns: column labels to use for the resulting frame. Defaults to RangeIndex (0, 1, 2, …, n) if no column labels are provided; for a list of dicts, the dict keys become the column labels.
dtype: data type to force on the frame; only a single dtype is allowed. If None, the types are inferred.
copy: copy data from the inputs. Defaults to False.
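The data, columns and index parameters are easiest to see on a tiny input. The sketch below uses two hypothetical rows shaped like the scraper's dictionaries, the second one missing its "temp" key:

```python
import pandas as pd

# Two hypothetical rows; the second lacks "temp"
rows = [
    {"day": "Mon", "desc": "Cloudy", "temp": "30°"},
    {"day": "Tue", "desc": "Rain"},
]

# data only: column labels come from the dict keys, and a missing key becomes NaN
df = pd.DataFrame(rows)
print(list(df.columns))            # ['day', 'desc', 'temp']
print(pd.isna(df.loc[1, "temp"]))  # True

# columns= keeps only the listed columns, in the order given
subset = pd.DataFrame(rows, columns=["temp", "day"])
print(list(subset.columns))        # ['temp', 'day']

# index= replaces the default RangeIndex (0, 1, ...) with custom labels
labelled = pd.DataFrame(rows, index=["first", "second"])
print(list(labelled.index))        # ['first', 'second']
```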
```python
import pandas as pd

# read the CSV file back with pandas and print it
a = pd.read_csv("output.csv")
print(a)
```
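By default, to_csv() also writes the DataFrame's row index as an unnamed first column, and read_csv() then reports it as a column called "Unnamed: 0". The sketch below uses an in-memory buffer (io.StringIO) and two made-up rows in place of output.csv to show the two usual remedies:

```python
import io

import pandas as pd

df = pd.DataFrame([{"day": "Mon", "temp": "30°"}, {"day": "Tue", "temp": "28°"}])

# Default to_csv() writes the index as an unnamed first column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'day', 'temp']

# Remedy 1: skip the index when writing
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(no_index.columns))    # ['day', 'temp']

# Remedy 2: tell read_csv that column 0 holds the index
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
print(list(restored.columns))    # ['day', 'temp']
```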