Image Classification with a Web App
Detecting Emergency Vehicles Using a CNN
Motivation: I recently took part in the JanataHack: Computer Vision Hackathon hosted by Analytics Vidhya. The goal of the competition was to build a binary image classifier that can distinguish non-emergency vehicles (e.g. privately owned vehicles) from emergency vehicles (police cars, ambulances, etc.).
Problem Statement:
We need to create a classifier that can distinguish emergency vehicles from non-emergency vehicles; emergency vehicles are labeled 1 and non-emergency vehicles are labeled 0. In this article I present the approach I used to build my model, which placed 147th out of roughly 10,000 participants.
The model presented in this article is a convolutional neural network. I have tried to keep the code as simple as possible. Readers are expected to have some knowledge of neural networks.
Steps to solve the problem:
- Loading and visualizing the data
- Data cleaning
- Modeling
- Transfer learning
- Parameter tuning
- Final model
Code: Loading and Visualizing the Data
Python3
# imports
import numpy as np
import os
import matplotlib.pyplot as plt
from PIL import Image, ImageOps, ImageFilter, ImageEnhance
import pandas as pd

# importing the pytorch library
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
from torch.utils.data import Dataset, random_split, DataLoader
We will use:
- numpy: to store the images in arrays,
- matplotlib: to visualize the images,
- Pillow (PIL): a library to load and transform the images,
- PyTorch: as our deep learning framework.
Loading the data:
The dataset provided to us keeps both the train and test images in the images folder, while the train and test CSV files contain the image names.
Code:
Python3
# name of the image folder
imagePaths = 'images'
# reading the train.csv file using pandas
trainImages = pd.read_csv('train.csv')
# reading the test.csv file using pandas
testImages = pd.read_csv('test.csv')
# reading the submission file using pandas
samples = pd.read_csv('sample_submission.csv')
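Before moving on, it is worth sanity-checking the column layout: the Dataset class defined later in this article reads an image_names column and, for the training set, an emergency_or_not label column. A quick optional check along these lines:
Python3
# peek at the first few rows to confirm the expected columns
# train.csv -> image_names, emergency_or_not ; test.csv -> image_names
print(trainImages.head())
print(testImages.head())

# class balance of the training labels (1 = emergency, 0 = non-emergency)
print(trainImages['emergency_or_not'].value_counts())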
Code: Loading the images into numpy arrays
Python3
# defining the train and labels lists to store the images and labels respectively
train = []
labels = []

for image, label in zip(trainImages.iloc[:, 0], trainImages.iloc[:, 1]):
    # create the image path and store it in imgPath
    imgPath = os.path.join(imagePaths, image)
    # use the PIL Image class to load the image
    img = Image.open(imgPath)
    # apply a median filter to the image; this helps reduce noise
    img = img.filter(ImageFilter.MedianFilter)
    # convert the image to a numpy array and store it in train
    train.append(np.asarray(img))
    # store the label in the labels list
    labels.append(label)
Code: Opening and displaying the images.
Python3
# create subplots using the plt.subplots function
# the number of subplots depends on n_rows and n_cols
# all the subplots are stored in the ax variable
_, ax = plt.subplots(nrows = 4, ncols = 7, figsize =(12, 12))

# iterate through ax by flattening it
for index, i in enumerate(ax.flatten()):
    # imshow is used to show the image
    i.imshow(train[index])
    # set the title
    i.set_title(index)
    # the lines below remove the axis ticks for a cleaner visualization
    i.set_xticks([])
    i.set_yticks([])
Output:
Now that we have the images stored in train and the output classes stored in labels, we can move on to the next step.
Data Cleaning
In this section we look at mislabeled and incorrect image samples. Removing these images improved my val_score by about 2%: it went from 94% to 96%, and sometimes even reached 97%.
Mislabeled images: the code used to visualize the data is the same as above.
Incorrect data: images of dashboards.
Removing these images made the accuracy more stable (less oscillation). Note that I was only able to remove the dashboard images because I did not find any similar images in the test data.
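The removal step itself is not shown in this article; below is a minimal sketch of how the flagged samples could be dropped, assuming badIndices is a hypothetical list of the indices spotted in the visualization grid above:
Python3
# hypothetical indices of mislabeled / dashboard images found by inspection
badIndices = []  # e.g. [12, 47, 103]

# keep only the samples whose index is not flagged
train = [img for i, img in enumerate(train) if i not in badIndices]
labels = [lbl for i, lbl in enumerate(labels) if i not in badIndices]

# drop the same rows from the csv that the VehicleDataset below will read
cleaned = trainImages.drop(index = badIndices).reset_index(drop = True)
cleaned.to_csv('images/train.csv', index = False)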
Defining the Dataset class: PyTorch provides a Dataset class for loading the data from disk as it is needed, so we do not have to keep the entire dataset in memory.
Code:
Python3
# Creating a VehicleDataset class for loading the images and labels.
# the class needs to extend the Dataset class provided by the pytorch
# framework and implement the __len__ and __getitem__ methods.
class VehicleDataset(Dataset):

    def __init__(self, csv_name, folder, transform = None, label = False):
        self.label = label
        self.folder = folder
        self.dataframe = pd.read_csv(self.folder + '/' + csv_name + '.csv')
        self.tms = transform

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, index):
        row = self.dataframe.iloc[index]
        imgName = row['image_names']
        imageFile = self.folder + '/' + imgName
        image = Image.open(imageFile)
        if self.label:
            target = row['emergency_or_not']
            if target == 0:
                encode = torch.FloatTensor([1, 0])
            else:
                encode = torch.FloatTensor([0, 1])
            return self.tms(image), encode
        return self.tms(image)


# creating objects of VehicleDataset
# the deep learning models accept images as tensors
# this is done using the transforms.ToTensor() method
transform = transforms.Compose([transforms.ToTensor(),
                                ])
'''
arguments:
csv_name - name of the csv file, in our case train.csv
folder - folder in which the images are stored
transform - transforms the image to a tensor
label - used to differentiate between the train and test set
'''
trainDataset = VehicleDataset('train', 'images', label = True, transform = transform)
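For the test set the same class can be reused with label = False, so that only the image tensor is returned. A small usage sketch, assuming a test.csv with the same layout sits in the same folder:
Python3
# the test set has no labels, so label stays False
testDataset = VehicleDataset('test', 'images', label = False, transform = transform)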
Now that our data pipeline is ready, we need to build the deep learning model.
The CNN Model:
This post assumes some familiarity with neural networks, as covering them is beyond its scope. I will be using a CNN (convolutional neural network). The model is built from three main layer types, namely Conv2d, BatchNorm2d and MaxPool2d, and the activation function used is ReLU:
Code:
Python
# the EmergencyCustomModel class defines our neural network
# It inherits from the ImageClassificationBase class, which has helper methods
# for printing the loss and accuracy at each epoch.
class EmergencyCustomModel(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(

            nn.Conv2d(3, 32, kernel_size = 3, padding = 1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(32, 64, kernel_size = 3, stride = 1, padding = 1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(64, 64, kernel_size = 3, stride = 1, padding = 1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(64, 128, kernel_size = 3, stride = 1, padding = 1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(128, 128, kernel_size = 3, stride = 1, padding = 1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(128, 256, kernel_size = 3, stride = 1, padding = 1),
            nn.BatchNorm2d(256),
            nn.ReLU(),

            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 2),
            # nn.Sigmoid(),
        )

    def forward(self, xb):
        return self.network(xb)
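The ImageClassificationBase class is not reproduced in this article. Below is a minimal sketch of what it might look like, based on the methods that fit() and evaluate() call later (training_step, validation_step, validation_epoch_end, epoch_end) and on the one-hot targets returned by VehicleDataset; it would need to be defined before the model classes.
Python3
# minimal sketch of the base class the models inherit from (assumed structure)
class ImageClassificationBase(nn.Module):

    def training_step(self, batch):
        images, targets = batch
        out = self(images)
        # the targets are one-hot encoded, so binary cross entropy on the logits works
        return F.binary_cross_entropy_with_logits(out, targets)

    def validation_step(self, batch):
        images, targets = batch
        out = self(images)
        loss = F.binary_cross_entropy_with_logits(out, targets)
        # accuracy: compare the predicted class with the true class
        preds = torch.argmax(out, dim = 1)
        truth = torch.argmax(targets, dim = 1)
        acc = torch.tensor(torch.sum(preds == truth).item() / len(preds))
        return {'val_loss': loss.detach(), 'val_score': acc}

    def validation_epoch_end(self, outputs):
        losses = [x['val_loss'] for x in outputs]
        scores = [x['val_score'] for x in outputs]
        return {'val_loss': torch.stack(losses).mean().item(),
                'val_score': torch.stack(scores).mean().item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], train_loss: {:.4f}, val_loss: {:.4f}, val_score: {:.4f}".format(
            epoch, result['train_loss'], result['val_loss'], result['val_score']))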
The entire model definition can be found in this notebook in my GitHub repository.
The training function:
Code: the following functions are used to train all the models in this post.
Python3
# defining the training methods.
# the evaluate method is used to calculate the validation accuracy.
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)


# The fit method is used to train the model
# parameters
'''
epochs: no. of epochs the model trains for
max_lr: maximum learning rate
train_loader: here we pass the train dataset
val_loader: here we pass the val dataset
opt_func: the learning algorithm that performs gradient descent
model: the neural network to train
'''
def fit(epochs, max_lr, model, train_loader, val_loader,
        weight_decay = 0, grad_clip = None, opt_func = torch.optim.SGD):

    torch.cuda.empty_cache()
    history = []

    # set up the optimizer with weight decay
    optimizer = opt_func(model.parameters(), max_lr, weight_decay = weight_decay)

    # the loop iterates from 0 to the number of epochs.
    # the model needs to be put in training mode by calling model.train().
    for epoch in range(epochs):
        # Training phase
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()

            # Gradient clipping
            if grad_clip:
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)

            optimizer.step()
            optimizer.zero_grad()

        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)

    return history
Before starting training we need to split the data into training and validation sets. This is done so that the model generalizes well to unseen data. We will do an 80-20 split: 80% for training and 20% for validation. After splitting the data, we pass the datasets to the DataLoader provided by PyTorch.
Code: Splitting the data and creating the dataloaders.
Python3
# batchSize is the number of images passed by the loader at a time.
# reduce this number if there's an out-of-memory error.
batchSize = 32
valPct = 0.2

# code for splitting the data
# the valPct variable is used to split the dataset
valSize = int(valPct * len(trainDataset))
trainSize = len(trainDataset) - valSize
trainDs, valDs = random_split(trainDataset, [trainSize, valSize])

# Creating the dataloaders.
train_loader = DataLoader(trainDs, batchSize)
val_loader = DataLoader(valDs, batchSize)
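The training code for the pretrained models further down also uses to_device and device helpers that are not defined in this article. Here is a minimal sketch of what they could look like, assuming the common pattern of moving the model and each batch to the GPU when one is available:
Python3
# pick the GPU when available, otherwise fall back to the CPU
def get_default_device():
    return torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# move a tensor, a module, or a list/tuple of tensors (e.g. an (image, label) batch) to a device
def to_device(data, device):
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking = True)

# wraps a DataLoader so every batch it yields is already on the chosen device
class DeviceDataLoader:
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device
    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)
    def __len__(self):
        return len(self.dl)

device = get_default_device()
train_loader = DeviceDataLoader(train_loader, device)
val_loader = DeviceDataLoader(val_loader, device)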
Now we are ready to start training by calling the fit() method.
Python3
customModel = EmergencyCustomModel()
epochs = 10
lr = 0.01

# save the history to visualize later
history = fit(epochs, lr, customModel, train_loader, val_loader)
Output of the above code:
The entire code is available in the GitHub repo linked at the end of the article.
Code: the plot function is used to generate the loss and accuracy graphs shown below.
Python3
'''
parameters:
epochs = number of epochs the model was trained for
hist = the history returned by the fit function
'''
def plot(hist, epochs = 10):
    trainLoss = []
    valLoss = []
    valScore = []
    for i in range(epochs):
        trainLoss.append(hist[i]['train_loss'])
        valLoss.append(hist[i]['val_loss'])
        valScore.append(hist[i]['val_score'])

    plt.plot(trainLoss, label ='train_loss')
    plt.plot(valLoss, label ='val_loss')
    plt.legend()
    plt.title('loss')

    plt.figure()
    plt.plot(valScore, label ='val_score')
    plt.legend()
    plt.title('accuracy')

# calling the function
plot(history)
Output: the loss and accuracy plots.
There is very little overfitting, and val_accuracy peaks at 90%. Here I would add that the highest val_score I managed to reach with a custom model in Keras was 83%, so changing frameworks gave us about a 7% improvement. Another point is model size: with PyTorch I was able to use a model with more than 3 Conv2d layers without overfitting, whereas in Keras I could only use 2 layers; adding more (or fewer) only increased the training cost without improving accuracy.
Transfer Learning:
Using pretrained models: I used two model architectures, ResNet and DenseNet. One thing to note is that the DenseNet model produces results almost similar to the ResNet model in fewer epochs, and, most importantly, the saved model takes up half the memory space.
Code:
Python3
# to use a pretrained model we make use of the torchvision.models library
class ResNet50(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        # the following line downloads the resnet50 model if it is not
        # already cached and stores it in pretrainedModel
        self.pretrainedModel = models.resnet50(pretrained = True)
        # resnet50 was trained on ImageNet, which has 1000 classes, but our
        # problem has only 2, so we need to modify the final layer of the model
        feature_in = self.pretrainedModel.fc.in_features
        self.pretrainedModel.fc = nn.Linear(feature_in, 2)

    def forward(self, x):
        return self.pretrainedModel(x)


# Training the model.
# final learning rate and other hyperparameters
lr = 1e-4
epochs = 5
optFunc = torch.optim.Adam

# Here I have made use of weight decay (wd) as a regularization parameter.
# It helps prevent overfitting and helps our model generalize.
bestWd = 1e-4

# to_device and device are the helper utilities sketched after the dataloader
# section; they move the model to the GPU when one is available
custom_model = to_device(ResNet50(), device)
hist = fit(epochs, lr, custom_model, train_loader, val_loader,
           weight_decay = bestWd, opt_func = optFunc)
Output: the loss and accuracy plots.
A lot of overfitting can be seen here, with no real improvement in val_score. I decided to try a cyclic-scheduler training strategy; the results are shown below. I still need to experiment more with this approach, but as you can see, it reduced the overfitting to some extent, although val_accuracy remained low.
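The cyclic schedule mentioned above is not part of the fit() function shown earlier. A minimal sketch of a variant with a one-cycle learning-rate schedule, assuming torch.optim.lr_scheduler.OneCycleLR, could look like this:
Python3
# a variant of fit() that adds a one-cycle learning rate schedule
def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay = 0, grad_clip = None, opt_func = torch.optim.Adam):
    torch.cuda.empty_cache()
    history = []
    optimizer = opt_func(model.parameters(), max_lr, weight_decay = weight_decay)
    # the scheduler ramps the LR up to max_lr and back down over the whole run
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs = epochs, steps_per_epoch = len(train_loader))
    for epoch in range(epochs):
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            if grad_clip:
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            # step once per batch so the LR follows the one-cycle curve
            sched.step()
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history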
Using Densenet169: a dense network is similar to ResNet, except that instead of adding the skip connections it concatenates them, which is why these blocks are called dense blocks.
Code:
Python3
class Densenet169(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        # the statement below downloads and stores the pretrained model
        self.pretrained_model = models.densenet169(pretrained = True)
        feature_in = self.pretrained_model.classifier.in_features
        self.pretrained_model.classifier = nn.Linear(feature_in, 2)

    def forward(self, x):
        return self.pretrained_model(x)


# Training the model
# final learning rate and other hyperparameters
lr = 1e-4
epochs = 5
optFunc = torch.optim.Adam
bestWd = 1e-4

# move the model to the same device as the data before training
customModel2 = to_device(Densenet169(), device)
hist = fit(epochs, lr, customModel2, train_loader, val_loader,
           weight_decay = bestWd, opt_func = optFunc)
If you look at the loss and accuracy plots, the overfitting has decreased and the val accuracy is better, and this was achieved without the cyclic scheduler.
Code: plotting the loss and accuracy plots.
Using early stopping, training could have been stopped at 5 epochs.
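Early stopping is not implemented in fit() either. A minimal sketch of a training loop that stops once val_score has not improved for a few epochs, reusing evaluate() and the step methods from ImageClassificationBase, might look like this:
Python3
# sketch: training loop with early stopping on the validation score
def fit_early_stopping(max_epochs, max_lr, model, train_loader, val_loader,
                       patience = 2, opt_func = torch.optim.Adam):
    optimizer = opt_func(model.parameters(), max_lr)
    best_score, stale_epochs, history = 0.0, 0, []
    for epoch in range(max_epochs):
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
        # stop when val_score has not improved for `patience` epochs
        if result['val_score'] > best_score:
            best_score, stale_epochs = result['val_score'], 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                print('early stopping at epoch', epoch)
                break
    return history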
The Web App:
Note: the web app only accepts jpg images.
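Under the hood the app only needs a single forward pass on the uploaded jpg. A minimal sketch of that inference step, assuming customModel2 is the trained model from above, the same ToTensor transform, and a hypothetical file name:
Python3
# classify a single uploaded jpg file with the trained model
def predict_image(path, model):
    model.eval()
    img = Image.open(path).convert('RGB')
    # same preprocessing as training: convert to a tensor and add a batch dimension,
    # then move the batch to the same device as the model
    batch = to_device(transform(img).unsqueeze(0), device)
    with torch.no_grad():
        out = model(batch)
    # index 1 is the emergency class in the one-hot encoding used above
    return 'emergency' if torch.argmax(out, dim = 1).item() == 1 else 'non-emergency'

# hypothetical file name, just for illustration
print(predict_image('some_upload.jpg', customModel2))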
Conclusion: I finished around 200th out of roughly 10,000 participants, i.e. within the top 2%, using the models described above. All the code is available in my GitHub repository: https://github.com/evilc3/EmergencyVehicleDetector
The entire notebook: https://colab.research.google.com/drive/13En-V2A-w2o4uXuDZk0ypktxzX9joXIY?usp=sharing
Web app link: https://emervehicledetector.herokuapp.com/