Predicting Air Quality Index using Python

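Let's see how to predict the Air Quality Index (AQI) using Python. The AQI is calculated from the quantities of chemical pollutants in the air. Using machine learning, we can predict the AQI from those quantities.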

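AQI: The Air Quality Index is an indicator for reporting daily air quality. In other words, it measures how air pollution affects a person's health over a short period. The AQI is calculated from the average concentration of specific pollutants measured over a standard time interval. Typically, the interval is 24 hours for most pollutants and 8 hours for carbon monoxide and ozone.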
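
As a rough illustration of the averaging step, the sketch below computes a 24-hour and an 8-hour mean from hourly readings. The readings are made up and `trailing_mean` is a hypothetical helper, not part of any library. How an agency converts these averages into an index value is not covered here.

```python
from statistics import mean

def trailing_mean(readings, window):
    """Mean of the last `window` hourly readings (hypothetical helper)."""
    if len(readings) < window:
        raise ValueError("not enough readings for this window")
    return mean(readings[-window:])

# Made-up hourly concentrations for one day (24 values each).
pm25_hourly = [40 + (h % 6) for h in range(24)]
ozone_hourly = [30 + 2 * (h % 4) for h in range(24)]

pm25_24h = trailing_mean(pm25_hourly, 24)   # 24-hour window, used for most pollutants
ozone_8h = trailing_mean(ozone_hourly, 8)   # 8-hour window, used for ozone and CO
print(pm25_24h, ozone_8h)
```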

We can get a sense of the air pollution level by looking at the AQI:

AQI Level	AQI Range
Good	0 – 50
Moderate	51 – 100
Unhealthy	101 – 150
Unhealthy for Strong People	151 – 200
Hazardous	201+
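
The table can be turned into a small lookup, sketched below. The function name `aqi_level` is hypothetical, and the bounds and labels are copied from the table as given. One assumption: a fractional value that falls between two bands (for example 100.4) is assigned to the higher band.

```python
from bisect import bisect_left

# Upper bounds and labels copied from the table above.
UPPER_BOUNDS = [50, 100, 150, 200]
LABELS = ["Good", "Moderate", "Unhealthy",
          "Unhealthy for Strong People", "Hazardous"]

def aqi_level(aqi):
    """Return the table's level name for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left finds the first band whose upper bound is >= aqi.
    return LABELS[bisect_left(UPPER_BOUNDS, aqi)]

for value in (42, 100, 123.71, 201):
    print(value, aqi_level(value))
```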

Let's use machine learning to estimate the AQI from the chemical pollutant measurements.

Note: To download the dataset, click here.

Dataset description

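The dataset contains 8 attributes: 7 are chemical pollutant quantities and 1 is the air quality index. The pollutant columns, including PM2.5-AVG, PM10-AVG, NO2-AVG, NH3-AVG, SO2-AG, and OZONE-AVG, are the independent attributes. air_quality_index is the dependent attribute, since it is calculated from the 7 pollutant attributes.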

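Because the data is numeric and has no missing values, no preprocessing is needed. Our goal is to predict the AQI, so the task is either classification or regression. Since the target value is continuous, we need a regression technique.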
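
The two claims above (numeric columns, no missing values) are easy to check before skipping preprocessing. The sketch below runs the check on a small made-up frame. The column values are invented for illustration, and the same calls would apply to the loaded dataset.

```python
import pandas as pd

# Small made-up frame standing in for the real dataset.
df = pd.DataFrame({
    "PM2.5-AVG": [120.0, 85.0, 60.0],
    "NO2-AVG": [40.0, 35.0, 22.0],
    "air_quality_index": [130.0, 95.0, 70.0],
})

# Count of missing values per column.
missing_per_column = df.isna().sum()

# True only if every column has a numeric dtype.
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)

print(missing_per_column)
print("all numeric:", all_numeric)
```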

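Regression is a supervised learning technique that fits a model to predict a continuous target value from input features. Some regression techniques available in Python: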

Random Forest Regressor
AdaBoost Regressor
Bagging Regressor
Linear Regression, etc.
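
One way to choose among the techniques listed above is to compare them with cross-validation. The sketch below does this with scikit-learn on synthetic data generated by `make_regression`, using 7 features only to mirror the 7 pollutant columns. The data and the resulting scores say nothing about the real dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data with 7 features (values are not real measurements).
X, y = make_regression(n_samples=200, n_features=7, noise=10.0, random_state=0)

models = {
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Bagging": BaggingRegressor(random_state=0),
    "Linear": LinearRegression(),
}

# Mean R^2 over 5 cross-validation folds for each model.
cv_scores = {name: cross_val_score(model, X, y, cv=5).mean()
             for name, model in models.items()}

for name, score in cv_scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

On this synthetic, linearly generated data the linear model is expected to do well. On real pollutant data the ranking may differ.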

Python3

# importing pandas module for data frame
import pandas as pd
 
# loading dataset and storing in train variable
train=pd.read_csv('AQI.csv')
 
# display top 5 data
train.head()

Python3

# importing the two ensemble regressors
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import RandomForestRegressor

# separating the target (air_quality_index) from the pollutant attributes
train1 = train.drop(['air_quality_index'], axis=1)
target = train['air_quality_index']

# Random forest model with default hyperparameters (n_estimators=100)
m1 = RandomForestRegressor()
m1.fit(train1, target)

# R^2 score on the training data, as a percentage
# reported value: 97.96360799890066
# (random_state is not set, so results vary between runs)
m1.score(train1, target) * 100

# predicting the AQI for a new sample of 7 pollutant values
# reported value: 123.71
m1.predict([[123, 45, 67, 34, 5, 0, 23]])

# AdaBoost model with default hyperparameters (n_estimators=50)
m2 = AdaBoostRegressor()
m2.fit(train1, target)

# R^2 score on the training data, as a percentage
# reported value: 96.15377360010211
m2.score(train1, target) * 100

# predicting the AQI for the same sample
# reported value: 94.42105263
m2.predict([[123, 45, 67, 34, 5, 0, 23]])

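For the given test sample, the random forest predicts an AQI of about 123.7 and AdaBoost predicts about 94.4. According to the table above, the random forest prediction falls in the Unhealthy range (101 to 150), while the AdaBoost prediction sits near the top of the Moderate range (51 to 100).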
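
The scores in the code above are computed on the same rows used for fitting, which tends to overstate accuracy. A common alternative is to hold out part of the data for evaluation. The sketch below shows that pattern with `train_test_split` on synthetic data. It is an illustration only, not the procedure used in this article, and the numbers it prints are unrelated to the AQI dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 7 pollutant columns and the AQI target.
X, y = make_regression(n_samples=300, n_features=7, noise=15.0, random_state=42)

# Keep 20% of the rows aside for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)  # R^2 on rows seen during fitting
test_r2 = model.score(X_test, y_test)     # R^2 on held-out rows
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

With the real dataset, the same split could be applied to `train1` and `target` before calling `fit`.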