Predicting Vehicle Count Based on Sensor Data
Prerequisites: Regression and Classification | Supervised Machine Learning
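Sensors placed at road junctions collect data on the number of vehicles passing through different junctions and supply it to transport managers. Our task is to predict the total number of vehicles from this sensor data.
This article explains how to process timestamped sensor data and predict the vehicle count at a given time.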
Dataset description:
The dataset contains 2 attributes: DateTime and Vehicles. Vehicles is the label we want to predict.
Link to download this data: click here
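The label is numeric, so regression is well suited to this problem. Regression is a supervised learning technique that learns, from historical data, a function mapping the input attributes to a continuous numeric output, and then uses that function to predict new values. Regression applies when the target is numeric; here the label, the Vehicles attribute, is a numeric count, so we use regression.
The random forest regressor is an ensemble technique: it builds many decision trees, each trained (by default) on a bootstrap sample of the training rows, and predicts each row (tuple) as the average of the predictions of all the trees.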
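To see the averaging concretely, here is a minimal sketch on invented data (not the vehicles dataset). It fits scikit-learn's RandomForestRegressor and checks that the forest's prediction equals the mean of the individual trees' predictions, which are reachable through the fitted estimators_ attribute. The synthetic data and the choice of n_estimators=50 are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic regression problem (invented data, for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 24, size=(200, 1))          # e.g. an "hour" feature
y = 10 + 5 * np.sin(X[:, 0] / 24 * 2 * np.pi) + rng.normal(0, 1, 200)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

X_new = np.array([[8.0], [17.5]])

# Prediction of every individual tree, shape (n_trees, n_rows)
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])

# The forest's prediction is the per-row mean over all trees
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X_new)

print(forest_prediction)
print(np.allclose(manual_average, forest_prediction))
```

Because each tree sees a different bootstrap sample, individual trees disagree; averaging them smooths out much of that variance.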
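Syntax (scikit-learn 0.22 to 0.24): RandomForestRegressor(n_estimators=100, *, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, ccp_alpha=0.0, max_samples=None)
In scikit-learn 1.0 and later, min_impurity_split no longer exists and criterion='mse' is deprecated in favor of 'squared_error' (it was removed in 1.2); since 1.1, max_features defaults to 1.0 (all features) instead of 'auto'.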
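Approach:
- Import the necessary modules
- Load the dataset
- Analyze the data
- Convert the DateTime attribute (a timestamp) into week, day, hour, month, etc.
- Build the model
- Train the model
- Test the data
- Predict the result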
Step 1: Import the pandas module and load the dataset into a data frame.
Python3
# importing the pandas module for
# data frame
import pandas as pd
# load the data set into train variable.
train = pd.read_csv('vehicles.csv')
# display top 5 values of data set
train.head()
Output:
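pandas can also parse the DateTime column while reading, through the parse_dates argument of read_csv. The sketch below assumes vehicles.csv has exactly the two columns DateTime and Vehicles, with timestamps formatted like "2015-11-01 00:00:00"; the rows are invented and are read from an in-memory string so the example runs without the file.

```python
import io
import pandas as pd

# Invented sample rows in the assumed vehicles.csv layout (DateTime, Vehicles)
sample_csv = io.StringIO(
    "DateTime,Vehicles\n"
    "2015-11-01 00:00:00,15\n"
    "2015-11-01 01:00:00,13\n"
    "2015-11-01 02:00:00,10\n"
)

# parse_dates converts DateTime to a datetime64 column while reading
train = pd.read_csv(sample_csv, parse_dates=["DateTime"])
print(train.dtypes)
print(train.head())
```

With the real file, the same call is pd.read_csv('vehicles.csv', parse_dates=['DateTime']).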
Step 2: Define functions that extract the month, day, hour, and other fields from the timestamp (DateTime) and store each one in its own column.
Python3
# functions to extract calendar fields from a timestamp
# get day of month
def get_dom(dt):
    return dt.day
# get weekday (Monday=0 ... Sunday=6)
def get_weekday(dt):
    return dt.weekday()
# get hour
def get_hour(dt):
    return dt.hour
# get year
def get_year(dt):
    return dt.year
# get month
def get_month(dt):
    return dt.month
# get day of year
def get_dayofyear(dt):
    return dt.dayofyear
# get ISO week of year (Timestamp.weekofyear was removed in pandas 2.0)
def get_weekofyear(dt):
    return dt.isocalendar()[1]
# parse the DateTime strings into pandas Timestamps
train['DateTime'] = pd.to_datetime(train['DateTime'])
train['date'] = train['DateTime'].map(get_dom)
train['weekday'] = train['DateTime'].map(get_weekday)
train['hour'] = train['DateTime'].map(get_hour)
train['month'] = train['DateTime'].map(get_month)
train['year'] = train['DateTime'].map(get_year)
train['dayofyear'] = train['DateTime'].map(get_dayofyear)
train['weekofyear'] = train['DateTime'].map(get_weekofyear)
# display
train.head()
Output:
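The helper functions in this step make one Python call per row. A vectorized alternative, offered as an option rather than the article's code, uses the pandas .dt accessor; dt.isocalendar().week requires pandas 1.1 or later. The three timestamps are invented, and the first one reproduces the feature row used for testing in Step 4.

```python
import pandas as pd

# A few example timestamps (invented) in place of train['DateTime']
stamps = pd.Series(pd.to_datetime(["2015-01-11 00:00", "2015-11-01 13:00",
                                   "2016-06-30 23:00"]))
dt = stamps.dt  # vectorized datetime accessor

features = pd.DataFrame({
    "date": dt.day,
    "weekday": dt.weekday,          # Monday=0 ... Sunday=6
    "hour": dt.hour,
    "month": dt.month,
    "year": dt.year,
    "dayofyear": dt.dayofyear,
    "weekofyear": dt.isocalendar().week.astype(int),  # ISO week number
})
print(features)
```

On a real dataset the same pattern applies to train['DateTime'].dt after the conversion with pd.to_datetime.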
Step 3: Separate the label and store it in the target variable.
Python3
# the raw DateTime column is no longer needed
# once its parts are extracted, so remove it
train = train.drop(['DateTime'], axis=1)
# separate the label: train1 holds only the features
train1 = train.drop(['Vehicles'], axis=1)
# the label is stored in target
target = train['Vehicles']
print(train1.head())
target.head()
Output:
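The steps here train on every row and then test with a single hand-built row. One way to estimate accuracy, sketched as an assumption rather than the article's method, is a chronological hold-out: keep the most recent rows for testing so the model is never trained on data from after the period it is evaluated on, then report the mean absolute error. The hourly series below is invented and stands in for the real train1 and target.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Invented hourly data (8 weeks) standing in for the real train1/target
stamps = pd.date_range("2015-11-01", periods=24 * 56, freq=pd.offsets.Hour())
rng = np.random.default_rng(0)
hours = stamps.hour.to_numpy()
target = pd.Series(20 + 8 * np.sin(hours / 24 * 2 * np.pi)
                   + rng.poisson(3, len(stamps)), name="Vehicles")
train1 = pd.DataFrame({
    "date": stamps.day.to_numpy(),
    "weekday": stamps.weekday.to_numpy(),
    "hour": hours,
    "month": stamps.month.to_numpy(),
    "year": stamps.year.to_numpy(),
    "dayofyear": stamps.dayofyear.to_numpy(),
    "weekofyear": stamps.isocalendar().week.to_numpy().astype(int),
})

# Rows are already in time order: hold out the final 7 days as the test set
test_size = 24 * 7
X_train, X_test = train1.iloc[:-test_size], train1.iloc[-test_size:]
y_train, y_test = target.iloc[:-test_size], target.iloc[-test_size:]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE on the held-out week: {mae:.2f} vehicles")
```

A random shuffle split would let the model see hours from the test week during training, which tends to make the error look smaller than it would be on genuinely future data.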
Step 4: Create and train the model with a machine learning algorithm, then test it and predict the result.
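Python3
# import the random forest regressor
from sklearn.ensemble import RandomForestRegressor
# define the model (predictions vary slightly between runs unless random_state is set)
m1 = RandomForestRegressor()
m1.fit(train1, target)
# test with one row: date=11, weekday=6 (Sunday), hour=0, month=1, year=2015, dayofyear=11, weekofyear=2
m1.predict(pd.DataFrame([[11, 6, 0, 1, 2015, 11, 2]], columns=train1.columns))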
Output:
array([9.88021429])
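The test row [[11, 6, 0, 1, 2015, 11, 2]] encodes midnight on 11 January 2015, a Sunday, in the column order of train1. A small helper can build such a row from a timestamp instead of typing the numbers by hand; features_for and FEATURE_COLUMNS are hypothetical names introduced here, not part of the article.

```python
import pandas as pd

# Column order of train1 in this article (hypothetical constant)
FEATURE_COLUMNS = ["date", "weekday", "hour", "month",
                   "year", "dayofyear", "weekofyear"]

def features_for(timestamp):
    """Hypothetical helper: build one feature row for the given timestamp."""
    ts = pd.Timestamp(timestamp)
    row = [ts.day, ts.weekday(), ts.hour, ts.month, ts.year,
           ts.dayofyear, ts.isocalendar()[1]]  # [1] is the ISO week number
    return pd.DataFrame([row], columns=FEATURE_COLUMNS)

query = features_for("2015-01-11 00:00")
print(query.values.tolist())  # [[11, 6, 0, 1, 2015, 11, 2]]
```

With the model trained in Step 4, m1.predict(features_for("2015-01-11 00:00")) would then predict the count for that hour. Passing a DataFrame with the training column names also avoids scikit-learn's warning that X has no feature names.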