CNN – Image data pre-processing with generators

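This article shows how to pre-process input image data into meaningful floating-point tensors that can be fed to a convolutional neural network (CNN). Tensors are used to store data and can be thought of as multidimensional arrays. A tensor representing a 64 x 64 image with 3 channels has shape (64, 64, 3). Here the data is stored on disk as JPEG files, so let's look at the steps needed to get from those files to tensors.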
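To make the idea of a tensor as a multidimensional array concrete, here is a minimal sketch using NumPy (an assumption; the article itself does not name an array library). The all-zero pixel values are placeholders, not real image data.

```python
import numpy as np

# A 64 x 64 image with 3 colour channels, stored as a 3-D array (tensor).
# The pixel values are zeros, purely for illustration.
image = np.zeros((64, 64, 3), dtype=np.uint8)

print(image.ndim)   # number of axes
print(image.shape)  # (height, width, channels)
```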

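Algorithm:

Read the image files (stored in the data folder).
Decode the JPEG content into RGB grids of pixels with channels.
Convert these into floating-point tensors for input to the neural network.
Rescale the pixel values (between 0 and 255) to the [0, 1] interval, since neural networks train more effectively on inputs in this range.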
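The last two steps above can be sketched by hand with NumPy. This is only an illustration under stated assumptions: the helper name to_float_tensor is hypothetical, and a random 8-bit array stands in for a decoded JPEG, since reading and decoding a real file would need an image library.

```python
import numpy as np

def to_float_tensor(pixels):
    """Convert an 8-bit RGB pixel grid to a float32 tensor scaled to [0, 1]."""
    return pixels.astype(np.float32) / 255.0

# Stand-in for a decoded JPEG: random 8-bit RGB values (an assumption for
# illustration; a real pipeline would read and decode a file here).
rng = np.random.default_rng(0)
decoded = rng.integers(0, 256, size=(150, 150, 3), dtype=np.uint8)

tensor = to_float_tensor(decoded)
print(tensor.dtype, tensor.shape)
print(tensor.min() >= 0.0, tensor.max() <= 1.0)
```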

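This may seem like a lot of work, but Keras has utilities that take over the whole algorithm and do the heavy lifting for you. Keras has a module of image-processing helper tools, located at keras.preprocessing.image. It contains the class ImageDataGenerator, which lets you quickly set up Python generators that automatically turn image files on disk into batches of pre-processed tensors.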

Code: practical implementation:

# Import ImageDataGenerator for pre-processing
from keras.preprocessing.image import ImageDataGenerator

# Initialise the generators for train and test data.
# The rescale parameter multiplies every pixel value by 1/255,
# bringing the input range into [0, 1].
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

# Create the generators, each yielding batches of 20 images.
# train_dir and test_dir are assumed to be defined already: each is the
# path to a folder that contains one sub-folder per class.
# Here the classes are 'cat' and 'dog', so class_mode is 'binary'.
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),  # every image is resized to 150 x 150
    batch_size=20,
    class_mode='binary')

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')

Output:

Each generator yields batches of 150 × 150 RGB images of shape (20, 150, 150, 3)
and binary labels of shape (20,).
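The batch shapes described above can be reproduced without any image files by a toy stand-in. The sketch below is not Keras's implementation: toy_batch_generator is a hypothetical function that yields random NumPy arrays, used only to show what one batch looks like and why the loop needs an explicit break.

```python
import numpy as np

def toy_batch_generator(batch_size=20, image_size=(150, 150)):
    """Hypothetical stand-in for a directory generator: yields random batches forever."""
    rng = np.random.default_rng(0)
    while True:
        # Random values in [0, 1) play the role of rescaled pixels.
        images = rng.random((batch_size, *image_size, 3), dtype=np.float32)
        # Random 0/1 values play the role of binary class labels.
        labels = rng.integers(0, 2, size=batch_size).astype(np.float32)
        yield images, labels

for data_batch, labels_batch in toy_batch_generator():
    print("data batch shape:", data_batch.shape)
    print("labels batch shape:", labels_batch.shape)
    break  # the generator never stops on its own
```

The same loop-and-break pattern should work for peeking at one batch from a real generator such as train_generator.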

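Fitting the model:
Let's fit the model to the data using the generator. This is done with the fit_generator method, the equivalent of fit for data generators like this one. (In recent Keras versions fit_generator is deprecated, and fit accepts generators directly.) Its first argument is a Python generator that yields batches of inputs and targets indefinitely. Because the data is generated endlessly, the Keras model needs to know how many batches to draw from the generator before declaring an epoch over. This is the role of the steps_per_epoch argument.
To choose steps_per_epoch: we have 2000 training images in total and each batch holds 20 of them, so steps_per_epoch is 2000 / 20 = 100.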
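The arithmetic and the idea of an epoch drawn from an endless generator can be checked with a small standard-library sketch. The generator endless_batches is hypothetical and yields sample indices rather than images; rounding up with math.ceil is an added assumption for datasets whose size is not a multiple of the batch size.

```python
import itertools
import math

num_train_images = 2000  # figure from the text
batch_size = 20

# Round up so that leftover images are still covered when the
# dataset size is not a multiple of the batch size.
steps_per_epoch = math.ceil(num_train_images / batch_size)

def endless_batches(num_samples, batch_size):
    """Hypothetical generator: yields lists of sample indices forever."""
    indices = itertools.cycle(range(num_samples))
    while True:
        yield list(itertools.islice(indices, batch_size))

# One epoch = steps_per_epoch draws from the endless generator.
epoch = itertools.islice(endless_batches(num_train_images, batch_size),
                         steps_per_epoch)
samples_seen = sum(len(batch) for batch in epoch)

print(steps_per_epoch)
print(samples_seen)
```

Drawing 100 batches of 20 covers each of the 2000 training images once, which is what one epoch is meant to do.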
Code:

# Train your compiled model with fit_generator
history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=test_generator,
    validation_steps=50)

# Note: validation_steps is needed here because test_generator
# also yields batches indefinitely in a loop.
