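Predicting Air Quality Index using Python
Let's see how to predict the Air Quality Index (AQI) using Python. AQI is calculated from the quantities of chemical pollutants in the air, and with machine learning we can predict it from those quantities.
AQI: The Air Quality Index is an index for reporting air quality on a daily basis. In other words, it is a measure of how air pollution affects a person's health over a short period of time. AQI is calculated from the average concentration of a particular pollutant measured over a standard time interval. Generally, the interval is 24 hours for most pollutants and 8 hours for carbon monoxide and ozone.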
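To make the averaging step concrete, here is a minimal sketch of averaging hourly readings over a standard window. The function name `window_average` and the sample readings are hypothetical. The sketch covers only the averaging; turning an averaged concentration into an AQI value needs pollutant-specific breakpoints that are not given here.

```python
from statistics import mean


def window_average(hourly_readings, window_hours):
    """Return the mean of the most recent `window_hours` hourly readings."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    if len(hourly_readings) < window_hours:
        raise ValueError("not enough readings to fill the window")
    return mean(hourly_readings[-window_hours:])


# Hypothetical hourly concentrations, for illustration only.
pm25_hourly = [30.0] * 12 + [50.0] * 12                           # 24 readings
ozone_hourly = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0, 34.0]   # 8 readings

pm25_24h = window_average(pm25_hourly, 24)   # 24-hour window (most pollutants)
ozone_8h = window_average(ozone_hourly, 8)   # 8-hour window (ozone, carbon monoxide)
```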
We can gauge the level of air pollution by looking at the AQI:

AQI Level                      AQI Range
Good                           0-50
Moderate                       51-100
Unhealthy                      101-150
Unhealthy for Strong People    151-200
Hazardous                      201+
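The table above can be turned into a small lookup helper. This is a sketch rather than an official AQI implementation: the name `aqi_category` is hypothetical, and it assumes that a fractional value falling between two bands (for example 50.5) belongs to the higher band.

```python
def aqi_category(aqi):
    """Map an AQI value to the level names used in the table above."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # (upper bound, level) pairs in ascending order, taken from the table
    bands = [
        (50, "Good"),
        (100, "Moderate"),
        (150, "Unhealthy"),
        (200, "Unhealthy for Strong People"),
    ]
    for upper, level in bands:
        if aqi <= upper:
            return level
    # Anything above 200 falls in the last band
    return "Hazardous"
```

A model prediction such as 123.71 can then be passed straight to `aqi_category` to get a readable label.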
Let's use machine learning to predict the AQI from the chemical pollutant measurements.
Note: To download the dataset, click here.
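Dataset description
The dataset contains 8 attributes: 7 are chemical pollutant quantities and 1 is the Air Quality Index. The independent attributes include PM2.5-AVG, PM10-AVG, NO2-AVG, NH3-AVG, SO2-AG and OZONE-AVG. air_quality_index is the dependent attribute, since it is calculated from the 7 pollutant attributes.
Since the data is numeric and has no missing values, no preprocessing is required. Our goal is to predict the AQI, so the task is either classification or regression. Because the target, air_quality_index, is continuous, we need a regression technique.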
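The claim that no preprocessing is needed can be checked before modelling. The sketch below assumes pandas is available; the helper name `check_readiness` and the tiny sample frame are hypothetical stand-ins for the real dataset.

```python
import pandas as pd


def check_readiness(df):
    """Return (total missing values, list of non-numeric column names)."""
    total_missing = int(df.isna().sum().sum())
    non_numeric = [
        col for col in df.columns
        if not pd.api.types.is_numeric_dtype(df[col])
    ]
    return total_missing, non_numeric


# Hypothetical two-row sample, for illustration only.
sample = pd.DataFrame({
    "PM2.5-AVG": [35.0, 80.0],
    "PM10-AVG": [60.0, 120.0],
    "air_quality_index": [70, 130],
})

missing, non_numeric = check_readiness(sample)
```

If `missing` is 0 and `non_numeric` is empty, the frame can be passed to a regressor as it is.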
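Regression is a supervised learning technique that fits a model to the data in order to predict a continuous value. Examples of regression techniques available in Python:
- Random Forest Regressor
- AdaBoost Regressor
- Bagging Regressor
- Linear Regression, etc.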
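As a rough illustration of how the regressors listed above could be compared, the sketch below scores each one with 5-fold cross-validation. It assumes scikit-learn is installed and uses synthetic data from `make_regression` as a stand-in for the AQI dataset, so the numbers it prints say nothing about real air quality data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 200 rows, 7 numeric features, continuous target.
X, y = make_regression(n_samples=200, n_features=7, noise=10.0, random_state=0)

models = {
    "Random Forest": RandomForestRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Bagging": BaggingRegressor(random_state=0),
    "Linear Regression": LinearRegression(),
}

# Mean R^2 over 5 cross-validation folds for each model.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in cv_scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

Fixing `random_state` makes the ensemble results repeatable from run to run.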
Python3
# import pandas for working with data frames
import pandas as pd

# load the dataset into the `train` data frame
train = pd.read_csv('AQI.csv')

# display the first 5 rows
train.head()
Python3
# import the two ensemble regressors used below
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import RandomForestRegressor

# separate the target (class label) from the pollutant attributes
train1 = train.drop(['air_quality_index'], axis=1)
target = train['air_quality_index']

# Random Forest model with default settings
# (the original run reported n_estimators=100)
m1 = RandomForestRegressor()
m1.fit(train1, target)

# R^2 score on the training data, as a percentage
# (97.96360799890066 in the original run)
m1.score(train1, target) * 100

# predict the AQI for a new set of 7 pollutant values
# (123.71 in the original run)
m1.predict([[123, 45, 67, 34, 5, 0, 23]])

# AdaBoost model with default settings
# (the original run reported n_estimators=50, learning_rate=1.0, loss='linear')
m2 = AdaBoostRegressor()
m2.fit(train1, target)

# R^2 score on the training data, as a percentage
# (96.15377360010211 in the original run)
m2.score(train1, target) * 100

# predict the AQI for the same 7 pollutant values
# (94.42105263 in the original run)
m2.predict([[123, 45, 67, 34, 5, 0, 23]])
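For the given test input, the Random Forest model predicts an AQI of about 123.71 and the AdaBoost model about 94.42. According to the table above, the first falls in the Unhealthy range (101-150) and the second in the Moderate range (51-100), so the two models disagree on the category for this input.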
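The scores reported above are computed on the same rows the models were trained on, which tends to overstate how well a model generalizes. Below is a minimal sketch of a held-out evaluation, assuming scikit-learn is installed. The helper `evaluate_holdout` is hypothetical and the data is synthetic, standing in for the AQI dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def evaluate_holdout(model, X, y, test_size=0.25, seed=0):
    """Fit on a training split and report scores on rows the model never saw."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model.fit(X_train, y_train)
    return {
        "n_test": len(X_test),
        "train_r2": model.score(X_train, y_train),
        "test_r2": model.score(X_test, y_test),
        "test_mae": mean_absolute_error(y_test, model.predict(X_test)),
    }


# Synthetic stand-in data: 200 rows, 7 numeric features, continuous target.
X, y = make_regression(n_samples=200, n_features=7, noise=10.0, random_state=0)
report = evaluate_holdout(RandomForestRegressor(random_state=0), X, y)
print(report)
```

Comparing `train_r2` with `test_r2` gives a quick sense of how much the training score flatters the model.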