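Building a Fuel Price Tracker Using Python
Fuel has become a necessity of modern life and underpins much of how we live. In this article, we will write a Python script to track fuel prices.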
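Modules needed
- bs4: Beautiful Soup (bs4) is a Python library for extracting data from HTML and XML files. It is not built into Python. To install it, type the following command in the terminal.
pip install bs4
- requests: Requests lets you send HTTP/1.1 requests very easily. It is not built into Python either. To install it, type the following command in the terminal.
pip install requests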
- pandas: The script also uses pandas to display the results as a DataFrame. Like the modules above, it is not built into Python. To install it, type the following command in the terminal.
pip install pandas
Let's walk through the script step by step.
Step 1: Import all dependencies
Python3
# import module
import pandas as pd
import requests
from bs4 import BeautifulSoup
Step 2: Create a function to fetch the URL
Python3
# User-defined function that downloads a page
# and returns its HTML as text
def getdata(url):
    r = requests.get(url)
    return r.text
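A bare requests.get() call can wait indefinitely on a slow server, and it returns the body even when the server answers with an error page. The sketch below shows one possible way to harden the fetch step, assuming a 10-second timeout is acceptable. The name fetch_html and its timeout and session parameters are illustrative additions, not part of the original script.

```python
import requests


def fetch_html(url, timeout=10.0, session=None):
    """Download a page and return its HTML text.

    fetch_html, timeout, and session are hypothetical names for this sketch.
    """
    # Fall back to the module-level API when no session is supplied
    client = session if session is not None else requests
    response = client.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning an error page
    response.raise_for_status()
    return response.text
```

It can be used as a drop-in replacement for getdata(), for example fetch_html("https://www.goodreturns.in/petrol-price.html"). Passing a requests.Session as session reuses the connection across repeated fetches.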
Step 3: Pass the URL to the getdata() function and parse the returned HTML with BeautifulSoup
Python3
# Fetch the page and parse its HTML
htmldata = getdata("https://www.goodreturns.in/petrol-price.html")
soup = BeautifulSoup(htmldata, 'html.parser')
result = soup.find_all("div", class_="gold_silver_table")
print(result)
Output:
City | Today Price | Yesterday’s Price |
New Delhi | ₹ 82.08 | ₹ 82.03 |
Kolkata | ₹ 83.57 | ₹ 83.52 |
Mumbai | ₹ 88.73 | ₹ 88.68 |
Chennai | ₹ 85.04 | ₹ 85.00 |
Gurgaon | ₹ 79.92 | ₹ 79.84 |
Noida | ₹ 82.23 | ₹ 82.30 |
Bangalore | ₹ 84.75 | ₹ 84.70 |
Bhubaneswar | ₹ 82.47 | ₹ 82.59 |
Chandigarh | ₹ 78.96 | ₹ 78.92 |
Hyderabad | ₹ 85.30 | ₹ 85.25 |
Jaipur | ₹ 90.08 | ₹ 89.24 |
Lucknow | ₹ 82.20 | ₹ 82.09 |
Patna | ₹ 84.73 | ₹ 84.88 |
Trivandrum | ₹ 83.91 | ₹ 84.03 |
Note: This script only gives you the raw data as strings; you have to format and print the data as needed.
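Because the raw result is a blob of markup, it can help to read the table one cell at a time. The following is a minimal sketch on a small hand-written HTML snippet. The sample markup and the helper name table_rows are assumptions made for illustration, and the live page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the layout of the price table
SAMPLE_HTML = """
<div class="gold_silver_table">
  <table>
    <tr><th>City</th><th>Today Price</th><th>Yesterday's Price</th></tr>
    <tr><td>New Delhi</td><td>₹ 82.08</td><td>₹ 82.03</td></tr>
    <tr><td>Kolkata</td><td>₹ 83.57</td><td>₹ 83.52</td></tr>
  </table>
</div>
"""


def table_rows(html):
    """Return each table row as a list of stripped cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Header cells are <th>, data cells are <td>
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


for row in table_rows(SAMPLE_HTML):
    print(" | ".join(row))
```

Reading cells directly avoids depending on where newlines happen to fall in the page text, at the cost of assuming the data sits in ordinary tr, th, and td elements.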
Step 4: Now use soup.find_all() to search for the data you need and collect it into a string.
Python3
# String that accumulates the text of every table row
# and list that will hold the parsed rows
mydatastr = ''
result = []
# Find every <tr> in the HTML and append its text to the string
for table in soup.find_all('tr'):
    mydatastr += table.get_text()
# Trim and split to suit the page layout; adjust as required
mydatastr = mydatastr[1:]
itemlist = mydatastr.split("\n\n")
for item in itemlist[:-5]:
    result.append(item.split("\n"))
result
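To see what the slicing and splitting above do, it may help to run the same idea on a small string. This sketch assumes each row's get_text() yields its cells separated by single newlines with a newline on either side, which is a guess at the page's shape rather than something the source confirms. The sample text is hypothetical, and the [:-5] slice is left out because the sample has no trailing rows to discard.

```python
# Hypothetical text, shaped like the concatenated get_text() of three rows
sample = (
    "\nCity\nToday Price\nYesterday's Price\n"
    "\nNew Delhi\n₹ 82.08\n₹ 82.03\n"
    "\nKolkata\n₹ 83.57\n₹ 83.52\n"
)

# Drop the leading newline, then strip the trailing one so the
# last row does not end with an empty field
trimmed = sample[1:].rstrip("\n")

# A blank line (two newlines) separates rows; one newline separates cells
rows = [item.split("\n") for item in trimmed.split("\n\n")]

print(rows)
```

Each inner list holds one table row, with the header row first.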
Step 5: Build a DataFrame to display your results.
Python3
# Calling DataFrame constructor on list
df = pd.DataFrame(result[:-8])
df
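The DataFrame built this way holds every value as a string, including the header row. As a rough sketch of one way to make it easier to work with, the example below names the columns and converts the prices to numbers. The rows are hypothetical, shaped like the output of the previous step with values taken from the table above, and the Change column is an illustrative addition.

```python
import pandas as pd

# Hypothetical rows in the shape produced by the previous step:
# a header row followed by one row per city
rows = [
    ["City", "Today Price", "Yesterday's Price"],
    ["New Delhi", "₹ 82.08", "₹ 82.03"],
    ["Noida", "₹ 82.23", "₹ 82.30"],
]

# Use the first row as column names and the rest as data
df = pd.DataFrame(rows[1:], columns=rows[0])

# Strip the rupee sign and convert the price columns to floats
for col in ["Today Price", "Yesterday's Price"]:
    df[col] = df[col].str.replace("₹", "", regex=False).str.strip().astype(float)

# Day-over-day change, rounded to avoid floating-point noise
df["Change"] = (df["Today Price"] - df["Yesterday's Price"]).round(2)

print(df)
```

With numeric columns, sorting and filtering by price work as expected, which is not the case while the values are still strings.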
Complete code:
Python3
# Import the required modules
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Download a page and return its HTML as text
def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata("https://www.goodreturns.in/petrol-price.html")
soup = BeautifulSoup(htmldata, 'html.parser')

# String that accumulates the text of every table row
# and list that will hold the parsed rows
mydatastr = ''
result = []

# Find every <tr> in the HTML and append its text to the string
for table in soup.find_all('tr'):
    mydatastr += table.get_text()

# Trim and split to suit the page layout; adjust as required
mydatastr = mydatastr[1:]
itemlist = mydatastr.split("\n\n")
for item in itemlist[:-5]:
    result.append(item.split("\n"))

# Call the DataFrame constructor on the list
df = pd.DataFrame(result[:-8])
df
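The script above shows one day's prices. To track them over time, each run's readings could be appended to a file. The following is a minimal sketch using only the standard library. The function name append_snapshot, the column names, and the (city, price) input shape are all assumptions for illustration and are not part of the original script.

```python
import csv
from datetime import date
from pathlib import Path


def append_snapshot(rows, path, day=None):
    """Append (date, city, price) records to a CSV file, creating it if needed.

    rows is an iterable of (city, price) pairs; day defaults to today.
    """
    day = day or date.today()
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        # Write the header only when the file is first created
        if is_new:
            writer.writerow(["date", "city", "price"])
        for city, price in rows:
            writer.writerow([day.isoformat(), city, price])
```

For example, append_snapshot([("New Delhi", 82.08)], "petrol_prices.csv") adds one dated row, where the file name is a placeholder. Running the scraper once a day and calling this afterwards would build up a price history that pandas can read back with pd.read_csv().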