📜  Scraping Weather Forecast Data Using Python and BS4

📅  Last modified: 2022-05-13 01:54:22.511000             🧑  Author: Mango

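This article covers scraping weather forecast data with Python and the bs4 library. The script uses three components: requests to download the page, bs4 (BeautifulSoup) to parse the HTML, and pandas to write the result to a CSV file.

We will scrape the data from https://weather.com/en-IN/weather/tenday/l/INKA0344:1:IN.

Step 1 – Run the following code to fetch the content of the URL into a response object (file):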

import requests
# to get data from website
file = requests.get("https://weather.com/en-IN/weather/tenday/l/INKA0344:1:IN")
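
A plain requests.get call has no timeout and does not check the HTTP status, so a slow or failing server can go unnoticed. The sketch below shows one way to guard the fetch. The helper name fetch_html, the 10-second timeout, and the User-Agent string are assumptions made for illustration and are not part of the original script.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the response body as bytes, raising on network or HTTP errors.

    fetch_html, the default timeout, and the User-Agent value are
    hypothetical choices for this sketch.
    """
    headers = {"User-Agent": "weather-tutorial-sketch/0.1"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # raise requests.exceptions.HTTPError for 4xx/5xx status codes
    response.raise_for_status()
    return response.content
```

The bytes returned by fetch_html can be passed to BeautifulSoup in the same way as file.content.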


Step 2 – Parse the HTML content:

# import Beautifulsoup for scraping the data 
from bs4 import BeautifulSoup
soup = BeautifulSoup(file.content, "html.parser")
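
To get a feel for the parsed object without touching the live page, here is a minimal sketch that runs find and find_all on a small inline HTML string. The markup, the class name page-title, and the variable soup_demo are invented for illustration and are not copied from weather.com.

```python
from bs4 import BeautifulSoup

# hypothetical markup, not copied from weather.com
sample_html = """
<div class="page-title"><h1>Ten-day forecast</h1></div>
<table class="twc-table">
  <tr><td class="temp">31/22</td></tr>
  <tr><td class="temp">30/21</td></tr>
</table>
"""

soup_demo = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag, or None if nothing matches
heading = soup_demo.find("div", {"class": "page-title"}).find("h1").text

# find_all() returns a list of every matching tag
temps = [td.text for td in soup_demo.find_all("td", {"class": "temp"})]

# a lookup that matches nothing yields None rather than raising
missing = soup_demo.find("div", {"class": "no-such-class"})

print(heading)
print(temps)
print(missing)
```

Because find returns None when nothing matches, chaining .find("h1") onto a failed lookup raises AttributeError, which is worth keeping in mind when the page layout changes.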


Step 3 – To scrape the data from the weather site, run the following code:

# create an empty list that collects one dictionary per forecast row
# (the name "list" shadows Python's built-in list type)
list = []
# text of the page heading
page_title = soup.find("div", {"class": "locations-title ten-day-page-title"}).find("h1").text

# find all tables with class "twc-table"
content = soup.find_all("table", {"class": "twc-table"})
for items in content:
    # one iteration per table row, minus one (presumably the header row)
    for i in range(len(items.find_all("tr")) - 1):
        # create an empty dictionary for this row
        row = {}
        try:
            # assign the i-th value of each column to its key
            row["day"] = items.find_all("span", {"class": "date-time"})[i].text
            row["date"] = items.find_all("span", {"class": "day-detail"})[i].text
            row["desc"] = items.find_all("td", {"class": "description"})[i].text
            row["temp"] = items.find_all("td", {"class": "temp"})[i].text
            row["precip"] = items.find_all("td", {"class": "precip"})[i].text
            row["wind"] = items.find_all("td", {"class": "wind"})[i].text
            row["humidity"] = items.find_all("td", {"class": "humidity"})[i].text
        except IndexError:
            # fall back to the string "None" if a column has no i-th item
            row["day"] = "None"
            row["date"] = "None"
            row["desc"] = "None"
            row["temp"] = "None"
            row["precip"] = "None"
            row["wind"] = "None"
            row["humidity"] = "None"

        # append the row dictionary to the list
        list.append(row)
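
The loop above looks up each column across the whole table and indexes into the result, so one missing cell resets the entire row to "None". An alternative sketch walks each <tr> on its own and leaves only the missing fields empty. The helper names cell_text and parse_rows, the COLUMNS mapping, and the sample markup are assumptions made for illustration; the class names are the ones used in the script above.

```python
from bs4 import BeautifulSoup

# (tag, class) for each output key, using the class names from the script
COLUMNS = {
    "day": ("span", "date-time"),
    "date": ("span", "day-detail"),
    "desc": ("td", "description"),
    "temp": ("td", "temp"),
    "precip": ("td", "precip"),
    "wind": ("td", "wind"),
    "humidity": ("td", "humidity"),
}


def cell_text(row, tag, css_class):
    """Return the stripped text of the first matching tag in row, or None."""
    cell = row.find(tag, {"class": css_class})
    return cell.get_text(strip=True) if cell is not None else None


def parse_rows(table):
    """Build one dictionary per <tr> that contains at least one known cell."""
    rows = []
    for tr in table.find_all("tr"):
        record = {key: cell_text(tr, tag, cls) for key, (tag, cls) in COLUMNS.items()}
        # rows without any known cell (such as a header row) are skipped
        if any(value is not None for value in record.values()):
            rows.append(record)
    return rows


# hypothetical markup for demonstration; the second data row has no wind cell
sample = """
<table class="twc-table">
  <tr><th>Day</th><th>Description</th></tr>
  <tr><td><span class="date-time">Today</span><span class="day-detail">MAY 13</span></td>
      <td class="description">Cloudy</td><td class="temp">31/22</td>
      <td class="precip">20%</td><td class="wind">W 10 km/h</td><td class="humidity">70%</td></tr>
  <tr><td><span class="date-time">Sat</span><span class="day-detail">MAY 14</span></td>
      <td class="description">Rain</td><td class="temp">29/21</td>
      <td class="precip">80%</td><td class="humidity">85%</td></tr>
</table>
"""
table = BeautifulSoup(sample, "html.parser").find("table", {"class": "twc-table"})
records = parse_rows(table)
print(records)
```

Scoping each lookup to a single row also keeps the columns aligned: a cell missing from one row cannot shift the values of the rows that follow.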

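Step 4 – Convert the list to a CSV file to view the weather forecast data in an organized form.

Use the following code to convert the list to a DataFrame and store it in the file output.csv: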

import pandas as pd
convert = pd.DataFrame(list)
convert.to_csv("output.csv")
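
By default, to_csv also writes the DataFrame index, so reading the file back yields an extra "Unnamed: 0" column. Below is a minimal sketch of the round trip with index=False. It uses an in-memory buffer instead of output.csv, and the rows are made up for illustration rather than scraped.

```python
import io

import pandas as pd

# made-up rows standing in for the scraped list
rows = [
    {"day": "Today", "temp": "31/22", "humidity": "70%"},
    {"day": "Sat", "temp": "29/21", "humidity": "85%"},
]

buffer = io.StringIO()
# index=False leaves the row index out of the CSV
pd.DataFrame(rows).to_csv(buffer, index=False)

# rewind the buffer and read the CSV text back into a DataFrame
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip.columns.tolist())
```

The same index=False argument works when writing to a file path such as "output.csv".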

# read csv file using pandas
a = pd.read_csv("output.csv")
print(a)

Output: