ML | Unsupervised Face Clustering Pipeline in Python



Last modified: 2020-04-22 14:04:15 | Author: Mango

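Real-time face recognition is still a problem for automated security systems. With the advances in convolutional neural networks (CNNs), and especially in creative CNN-based architectures, it has been shown that with current technology we can choose supervised learning options such as FaceNet and YOLO for fast, real-time face recognition in real-world environments.
To train a supervised model, we need a dataset of labeled targets, and building one is still a tedious task. We need an efficient, automated solution for dataset generation that requires only minimal labeling effort from the user.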

Proposed Solution

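Introduction: We propose a dataset-generation pipeline that takes a video clip as its source, extracts all faces, and clusters them into a limited, accurate set of image groups, each representing a different person. Each group can then be labeled easily with manual input.
Technical details: We use the opencv library to extract one frame per second from the input video clip.
We use the face_recognition library (backed by dlib) to extract the faces from the frames and align them for feature extraction.
We then extract facial feature vectors (embeddings) and cluster them with the DBSCAN clustering provided by scikit-learn. Finally, we crop all faces, create labels, and group them into folders so that the user can turn them into a dataset for their training use case.
Implementation challenges: To reach a wider audience, we implement the solution to run on a CPU rather than on an NVIDIA GPU. Using an NVIDIA GPU would improve the pipeline's efficiency.
The CPU implementation of facial embedding extraction is very slow (30+ seconds per image). To address this, we run it in a parallel pipeline (about 13 seconds per image) and then merge the results for the subsequent clustering task. tqdm is used together with PyPiper for progress updates, and the frames extracted from the input video are resized so that the pipeline runs smoothly.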

Input: Footage.mp4
Output: (output image not available)

Required Python 3 modules:
os, cv2, numpy, tensorflow, json, re, shutil, time, pickle, pyPiper, tqdm, imutils, face_recognition, dlib, warnings, sklearn
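Before running anything, it can help to confirm which of these modules are importable. The sketch below is a hypothetical helper (missing_modules is not part of the article's code) built on importlib.util.find_spec; the import-name to pip-package mapping is an assumption based on the usual PyPI distributions and may need adjusting for your environment. tensorflow is left out because none of the code shown in this article imports it.

```python
import importlib.util

# Hypothetical mapping: import name -> usual PyPI package name (adjust as needed)
THIRD_PARTY_MODULES = {
    "cv2": "opencv-python",
    "numpy": "numpy",
    "dlib": "dlib",
    "face_recognition": "face_recognition",
    "sklearn": "scikit-learn",
    "imutils": "imutils",
    "tqdm": "tqdm",
    "pyPiper": "pyPiper",
}


def missing_modules(modules=THIRD_PARTY_MODULES):
    """Return the package names whose import name cannot be found."""
    return [package for name, package in modules.items()
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All third-party modules are importable")
```

find_spec only locates a module without importing it, so the check stays fast even for heavy packages such as dlib.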

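Code snippets:
Below is the content of the file FaceClusteringLibrary.py, which contains all the class definitions, together with an explanation of each snippet and how it works.
ResizeUtils provides the functions rescale_by_height and rescale_by_width.
rescale_by_width takes image and target_width as input. It upscales or downscales the image so that its width equals target_width, computing the height automatically so that the aspect ratio stays the same. rescale_by_height works the same way, but it targets the height instead of the width.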

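'''
ResizeUtils provides resizing functions that keep the aspect ratio intact.
Original author: AndyP at StackOverflow
'''
# Imports used throughout FaceClusteringLibrary.py
import os
import time
import shutil
import pickle
import cv2
import numpy as np
import face_recognition
from sklearn.cluster import DBSCAN
from imutils import build_montages
from pyPiper import Node, Pipeline
from tqdm import tqdm
class ResizeUtils:
    # Given a target height, resize the image by computing the matching width
    def rescale_by_height(self, image, target_height,
                          method = cv2.INTER_LANCZOS4):
        # Rescale `image` to `target_height` (preserving aspect ratio);
        # image.shape is (height, width, channels)
        w = int(round(target_height * image.shape[1] / image.shape[0]))
        # cv2.resize expects the new size as (width, height)
        return cv2.resize(image, (w, target_height), interpolation = method)
    # Given a target width, resize the image by computing the matching height
    def rescale_by_width(self, image, target_width,
                         method = cv2.INTER_LANCZOS4):
        # Rescale `image` to `target_width` (preserving aspect ratio)
        h = int(round(target_width * image.shape[0] / image.shape[1]))
        return cv2.resize(image, (target_width, h), interpolation = method)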
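The resize arithmetic can be checked without OpenCV. Note the axis order: image.shape is (height, width, channels), while cv2.resize takes the target size as (width, height), which is why rescale_by_height multiplies by shape[1] / shape[0]. The sketch below reproduces only the dimension formulas for a 1920x1080 frame; size_for_height and size_for_width are hypothetical helper names.

```python
def size_for_height(width, height, target_height):
    """(new_width, new_height) using the formula of ResizeUtils.rescale_by_height."""
    new_width = int(round(target_height * width / height))
    return new_width, target_height


def size_for_width(width, height, target_width):
    """(new_width, new_height) using the formula of ResizeUtils.rescale_by_width."""
    new_height = int(round(target_width * height / width))
    return target_width, new_height


print(size_for_height(1920, 1080, 500))  # (889, 500)
print(size_for_width(1920, 1080, 700))   # (700, 394)
```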

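Below is the definition of the FramesGenerator class. This class extracts jpg images by reading the video sequentially. Taking our input video file as an example, its frame rate is about 30 fps, so one second of video yields 30 images. Even a 2-minute video yields 2 * 60 * 30 = 3600 images to process. That is far too many, and the pipeline could take hours to finish.

However, the faces and people in a video are unlikely to change within a single second. So for a 2-minute video, generating 30 images per second is wasteful and repetitive; instead, we can take just 1 image per second. FramesGenerator dumps only 1 image per second from the video clip, which turns the 3600 frames of a 2-minute, 30 fps video into about 120 images.
Since the dumped images are processed by face_recognition/dlib for face extraction, we keep the height at no more than 500 pixels and the width at no more than 700 pixels. These limits are enforced by the AutoResize function, which calls rescale_by_height or rescale_by_width to shrink an image that exceeds them while preserving its aspect ratio.
In the following snippet, AutoResize applies these limits to the given image: if the height is greater than 500, the image is scaled down to a height of 500 while keeping the aspect ratio, and if the resulting width is still greater than 700, it is scaled down to a width of 700.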

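# FramesGenerator extracts image frames from the given video file
# and resizes them for face_recognition / dlib processing
class FramesGenerator:
    def __init__(self, VideoFootageSource):
        self.VideoFootageSource = VideoFootageSource
    # Resize the given frame to fit the size limits used for face-embedding extraction
    def AutoResize(self, frame):
        resizeUtils = ResizeUtils()
        height, width, _ = frame.shape
        # Limit the height to 500 px (the width follows the aspect ratio)
        if height > 500:
            frame = resizeUtils.rescale_by_height(frame, 500)
        # Re-read the width, because the height rescale may have changed it.
        # Shrinking the width can only reduce the height further, so a single
        # pass is enough (no recursive call needed)
        width = frame.shape[1]
        if width > 700:
            frame = resizeUtils.rescale_by_width(frame, 700)
        return frame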
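The sketch below reproduces only the dimension logic of AutoResize for a few frame sizes (auto_resize_dims is a hypothetical helper, and no image data is involved). Checking the height first and then the width of the already-resized frame is enough, because the width rescale can only shrink the height. The 1000x2000 case shows why the width must be re-read after the height rescale: comparing against the stale original width (1000) would upscale the frame back to 700 px wide and leave it taller than 500 px.

```python
def auto_resize_dims(width, height, max_height=500, max_width=700):
    """Return the (width, height) AutoResize would produce, keeping the aspect ratio."""
    if height > max_height:
        width = int(round(max_height * width / height))
        height = max_height
    # Compare the current width, which may already have shrunk above
    if width > max_width:
        height = int(round(max_width * height / width))
        width = max_width
    return width, height


for size in [(1920, 1080), (640, 480), (1000, 2000)]:
    print(size, "->", auto_resize_dims(*size))
# (1920, 1080) -> (700, 394)
# (640, 480) -> (640, 480)
# (1000, 2000) -> (250, 500)
```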

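Below is the GenerateFrames function, a method of FramesGenerator. It queries the frame rate (fps) to decide how many frames to skip between dumped images. It clears the output directory and then iterates through all frames. Before dumping an image, it resizes the image if it exceeds the limits enforced by AutoResize.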

    # (Method of FramesGenerator.) Extract 1 frame per second from the video
    # footage and save those frames to the given output folder
    def GenerateFrames(self, OutputDirectoryName):
        cap = cv2.VideoCapture(self.VideoFootageSource)
        _, frame = cap.read()
        fps = cap.get(cv2.CAP_PROP_FPS)
        TotalFrames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
        print("[INFO] Total Frames ", TotalFrames, " @ ", fps, " fps")
        print("[INFO] Calculating number of frames per second")
        CurrentDirectory = os.path.curdir
        OutputDirectoryPath = os.path.join(
            CurrentDirectory, OutputDirectoryName)
        # Start from a clean output directory
        if os.path.exists(OutputDirectoryPath):
            shutil.rmtree(OutputDirectoryPath)
            time.sleep(0.5)
        os.mkdir(OutputDirectoryPath)
        CurrentFrame = 1
        fpsCounter = 0
        FrameWrittenCount = 1
        while CurrentFrame < TotalFrames:
            ret, frame = cap.read()
            # Stop when no more frames can be read; a bare `continue` here
            # would loop forever, because CurrentFrame would never advance
            if not ret or frame is None:
                break
            # Dump one frame roughly once per second
            if fpsCounter > fps:
                fpsCounter = 0
                frame = self.AutoResize(frame)
                filename = "frame_" + str(FrameWrittenCount) + ".jpg"
                cv2.imwrite(os.path.join(
                    OutputDirectoryPath, filename), frame)
                FrameWrittenCount += 1
            fpsCounter += 1
            CurrentFrame += 1
        cap.release()
        print('[INFO] Frames extracted')
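In GenerateFrames, sampling is driven by fpsCounter: because the test is `fpsCounter > fps` and the counter is reset to 0 and then incremented, a frame is written every fps + 1 frames once sampling starts, i.e., slightly less often than once per second. If you want exactly one frame every round(fps) frames, a stride-based selection is a simple alternative. The hypothetical helper below only computes which frame indexes to keep (no OpenCV), so it can be checked on the 2-minute, 30 fps example from earlier; wiring it into the read loop (for instance, writing a frame only when its index is in the kept set) is an assumption left to the reader.

```python
def frames_to_keep(total_frames, fps, per_second=1):
    """Indexes of the frames to save when sampling `per_second` frames per second."""
    # cap.get() returns floats, so round fps (e.g. 29.97 -> 30) and cast counts to int
    step = max(1, int(round(fps / per_second)))
    return list(range(0, int(total_frames), step))


# A 2-minute clip at 30 fps has 2 * 60 * 30 = 3600 frames
kept = frames_to_keep(3600.0, 30.0)
print(len(kept), kept[:3])  # 120 [0, 30, 60]
```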

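Below is the FramesProvider class. It inherits from pyPiper's Node, so it can be used to build an image-processing pipeline. We implement the setup and run functions. Any parameters defined in setup can be passed as keyword arguments to the constructor when the object is created; here, that is how the sourcePath parameter reaches the FramesProvider object. setup runs only once. run is called repeatedly and keeps emitting data into the pipeline through emit until close is called.
In setup, we accept sourcePath as a parameter and iterate through all files in the given frames directory. Every file with the .jpg extension (as generated by the FramesGenerator class) is added to the filesList list.
On each run call, the next jpg image path in filesList is packed into an object with a unique id and an imagePath attribute and sent into the pipeline for processing.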

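# The following are the nodes used to construct the pipeline.
# Threads are created and executed asynchronously to read images, extract
# facial features and store them, each independently of the others.
# FramesProvider keeps emitting file names into the pipeline for processing
class FramesProvider(Node):
    def setup(self, sourcePath):
        self.sourcePath = sourcePath
        self.filesList = []
        for item in os.listdir(self.sourcePath):
            _, fileExt = os.path.splitext(item)
            if fileExt == '.jpg':
                self.filesList.append(item)
        # size and pos track overall progress for the progress callback
        self.TotalFilesCount = self.size = len(self.filesList)
        self.ProcessedFilesCount = self.pos = 0
    # Emit each file name into the pipeline for parallel processing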
    def run(self, data):
        if self.ProcessedFilesCount < self.TotalFilesCount:
            self.emit({'id': self.ProcessedFilesCount,
                'imagePath': os.path.join(self.sourcePath,
                              self.filesList[self.ProcessedFilesCount])})
            self.ProcessedFilesCount += 1
            self.pos = self.ProcessedFilesCount
        else:
            self.close()
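To make the setup/run/emit/close protocol described above concrete without installing pyPiper, the sketch below uses a hypothetical MiniSourceNode class that only imitates the behavior described in this article (constructor keyword arguments go to setup, and run is called until close is invoked); it is not pyPiper's implementation. It also swaps in a natural sort: os.listdir returns names in arbitrary order, and a plain string sort would put frame_10.jpg before frame_2.jpg, so sorting by the embedded number keeps the ids in frame order.

```python
import os
import re
import tempfile


def natural_key(name):
    # Split out digit runs so "frame_2.jpg" sorts before "frame_10.jpg"
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


class MiniSourceNode:
    """Hypothetical stand-in for a source Node: keyword arguments go to setup(),
    and run() is called repeatedly until close() is invoked."""

    def __init__(self, name, **kwargs):
        self.name = name
        self.emitted = []
        self.closed = False
        self.setup(**kwargs)

    def emit(self, data):
        self.emitted.append(data)

    def close(self):
        self.closed = True

    def drain(self):
        while not self.closed:
            self.run(None)
        return self.emitted


class SortedFramesProvider(MiniSourceNode):
    def setup(self, sourcePath):
        self.sourcePath = sourcePath
        self.filesList = sorted(
            (f for f in os.listdir(sourcePath) if os.path.splitext(f)[1] == ".jpg"),
            key=natural_key)
        self.count = 0

    def run(self, data):
        if self.count < len(self.filesList):
            self.emit({"id": self.count,
                       "imagePath": os.path.join(self.sourcePath, self.filesList[self.count])})
            self.count += 1
        else:
            self.close()


# Demo with empty placeholder files in a temporary directory
demo_dir = tempfile.mkdtemp()
for name in ["frame_10.jpg", "frame_2.jpg", "frame_1.jpg", "notes.txt"]:
    open(os.path.join(demo_dir, name), "w").close()
records = SortedFramesProvider("Files source", sourcePath=demo_dir).drain()
print([os.path.basename(r["imagePath"]) for r in records])
# ['frame_1.jpg', 'frame_2.jpg', 'frame_10.jpg']
```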

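Below is the implementation of the FaceEncoder class, which inherits from Node and can be pushed into the image-processing pipeline. In setup, we accept the detection_method value passed to the face_recognition/dlib face detector: either the CNN-based detector ('cnn') or the HOG-based detector ('hog'). The HOG detector is faster on a CPU, while the CNN detector is more accurate but much slower without a GPU.
run unpacks the input data into id and imagePath.

Next, it reads the image from imagePath, converts it from OpenCV's BGR channel order to the RGB order that face_recognition expects, and runs face_locations from the face_recognition/dlib library to locate the faces, which are our regions of interest. face_locations returns one bounding box per face as a (top, right, bottom, left) tuple. The faces are aligned in the next step: an aligned face image is a rectangular crop in which the eyes and lips sit at specific positions in the image (note: the implementation may differ from other libraries, e.g., OpenCV).
We then call face_encodings from face_recognition/dlib to extract a face embedding for each box. Each embedding is a vector of 128 floating-point values; embeddings of the same person lie close together in Euclidean distance, which is what makes them suitable for clustering.
We define the variable d as a list that pairs each box with its embedding (plus the image path). Finally, we pack the id and this list under the 'encodings' key of an object and emit it into the pipeline.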

# Encode the face embeddings, reference image path and location, then send them into the pipeline
class FaceEncoder(Node):
    def setup(self, detection_method = 'cnn'):
        self.detection_method = detection_method
        # detection_method can be 'cnn' or 'hog'
    def run(self, data):
        id = data['id']
        imagePath = data['imagePath']
        image = cv2.imread(imagePath)
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        boxes = face_recognition.face_locations(
               rgb, model = self.detection_method)
        encodings = face_recognition.face_encodings(rgb, boxes)
        d = [{"imagePath": imagePath, "loc": box, "encoding": enc}
                         for (box, enc) in zip(boxes, encodings)]
        self.emit({'id': id, 'encodings': d})
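Each encoding is just a NumPy vector, so the similarity of two faces reduces to a distance. The sketch below uses synthetic 128-dimensional vectors (not real encodings) to show the comparison face_recognition performs: its face_distance is the Euclidean norm of the difference, and compare_faces treats distances at or below its default tolerance of 0.6 as a match. The same Euclidean geometry is what DBSCAN clusters later in this pipeline.

```python
import numpy as np


def face_distance(known_encodings, candidate):
    # Euclidean (L2) distance from each known encoding to the candidate
    known = np.asarray(known_encodings, dtype=float)
    return np.linalg.norm(known - np.asarray(candidate, dtype=float), axis=1)


rng = np.random.default_rng(42)
# Synthetic 128-d vectors standing in for real face encodings
person_a = rng.normal(0.0, 0.1, 128)
same_person = person_a + rng.normal(0.0, 0.01, 128)  # small perturbation
other_person = rng.normal(0.0, 0.1, 128)             # unrelated vector

distances = face_distance([same_person, other_person], person_a)
matches = distances <= 0.6  # 0.6 is face_recognition's default compare tolerance
print(distances.round(2), matches)
```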

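Below is the implementation of DatastoreManager, which also inherits from Node and can be plugged into the image-processing pipeline. Its purpose is to dump the encodings array as a pickle file, using the id parameter to give each pickle file a unique name. We want the pipeline to run multithreaded.
To benefit from multithreading, we need to split the work into properly separated asynchronous tasks and avoid any need for synchronization. So, for best performance, each thread in the pipeline writes its data to its own file, without interfering with any other thread.
If you are wondering how much time this saves on second-hand development hardware: without multithreading, the average embedding extraction time was about 30 seconds. With the multithreaded pipeline (4 threads), it dropped to about 10 seconds, at the cost of high CPU usage. Because each thread spends about 10 seconds per image, disk writes are infrequent and do not hurt multithreaded performance.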
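The pattern of giving each worker its own output file can be sketched with the standard library alone. This is a swapped-in illustration using concurrent.futures.ThreadPoolExecutor, not pyPiper; store_encodings and the dummy records are hypothetical. Because every record carries a unique id, each thread writes a distinct file, so no lock is needed.

```python
import os
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor


def store_encodings(output_dir, record):
    # One file per record id, so concurrent writers never touch the same file
    path = os.path.join(output_dir, "encodings_" + str(record["id"]) + ".pickle")
    with open(path, "wb") as f:
        pickle.dump(record["encodings"], f)
    return path


# Dummy records shaped like the objects FaceEncoder emits
records = [{"id": i, "encodings": [{"encoding": [float(i)] * 4}]} for i in range(8)]
output_dir = tempfile.mkdtemp()

with ThreadPoolExecutor(max_workers=3) as pool:
    written = list(pool.map(lambda r: store_encodings(output_dir, r), records))

print(len(os.listdir(output_dir)))  # 8
```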

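You may also wonder why we use pickle instead of JSON. In fact, JSON is the better choice. Pickle is very insecure for data storage and communication: a pickle file can be maliciously crafted so that it executes arbitrary Python code when it is loaded. JSON files are human-readable and fast to encode and decode. The only thing pickle is good at is dumping Python objects (such as NumPy arrays) to binary files without errors.
Since we do not plan to store or distribute the pickle files, and we want error-free execution, we use pickle here. Otherwise, JSON or other alternatives are strongly recommended.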

# Receives face encodings for clustering, and uses the id to give each output file a distinct name
class DatastoreManager(Node):
    def setup(self, encodingsOutputPath):
        self.encodingsOutputPath = encodingsOutputPath
    def run(self, data):
        encodings = data['encodings']
        id = data['id']
        with open(os.path.join(self.encodingsOutputPath,
                   'encodings_' + str(id) + '.pickle'), 'wb') as f:
            f.write(pickle.dumps(encodings))
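If you follow the recommendation to prefer JSON, note that the encodings are NumPy arrays, which the json module cannot serialize directly. A minimal sketch, assuming the record layout built in FaceEncoder ('imagePath', 'loc', 'encoding'); to_json_record and from_json_record are hypothetical names. DatastoreManager and PicklesListCollator would then use json.dump/json.load on text-mode files instead of pickle.

```python
import json

import numpy as np


def to_json_record(record):
    # Convert the tuple and the NumPy array into plain lists for JSON
    return {
        "imagePath": record["imagePath"],
        "loc": [int(v) for v in record["loc"]],
        "encoding": np.asarray(record["encoding"], dtype=float).tolist(),
    }


def from_json_record(obj):
    # Restore the types the rest of the pipeline expects
    return {
        "imagePath": obj["imagePath"],
        "loc": tuple(obj["loc"]),
        "encoding": np.array(obj["encoding"]),
    }


# A record shaped like one entry of FaceEncoder's `d` list (values are made up)
record = {"imagePath": "Frames/frame_1.jpg", "loc": (10, 60, 70, 5),
          "encoding": np.linspace(-0.1, 0.1, 128)}
text = json.dumps([to_json_record(record)])
restored = [from_json_record(obj) for obj in json.loads(text)]
```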

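Below is the implementation of PicklesListCollator. It reads the object arrays from multiple pickle files, merges them into one array, and dumps the merged array as a single pickle file.
It has only one function, GeneratePickle, which accepts outputFilepath: the path of the single output pickle file that will contain the merged array.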

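# PicklesListCollator takes multiple pickle files as input and merges them together.
# It is designed specifically for merging different pickle files into one
class PicklesListCollator:
    def __init__(self, picklesInputDirectory):
        self.picklesInputDirectory = picklesInputDirectory
    # Here we list all pickle files generated by the multiple threads, read the
    # resulting lists, append them to a common list, and create another pickle
    # with the combined list as its content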
    def GeneratePickle(self, outputFilepath):
        datastore = []
        ListOfPickleFiles = []
        for item in os.listdir(self.picklesInputDirectory):
            _, fileExt = os.path.splitext(item)
            if fileExt == '.pickle':
                ListOfPickleFiles.append(os.path.join(
                    self.picklesInputDirectory, item))
        for picklePath in ListOfPickleFiles:
            with open(picklePath, "rb") as f:
                data = pickle.loads(f.read())
                datastore.extend(data)
        with open(outputFilepath, 'wb') as f:
            f.write(pickle.dumps(datastore))
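GeneratePickle merges the files in whatever order os.listdir returns them, which is arbitrary. The merged result is still complete, but a deterministic order makes runs easier to compare. The hypothetical collate_pickles below is a variant that sorts the files by the numeric id in the encodings_&lt;id&gt;.pickle names written by DatastoreManager and calls pickle.load/pickle.dump directly on the file objects. It assumes the output file is written outside the input directory (as Driver.py does), so a rerun does not pick it up as an input.

```python
import glob
import os
import pickle
import re
import tempfile


def file_id(path):
    # Numeric id from names like encodings_12.pickle; unknown names sort first
    match = re.search(r"encodings_(\d+)\.pickle$", os.path.basename(path))
    return int(match.group(1)) if match else -1


def collate_pickles(input_dir, output_path):
    merged = []
    paths = sorted(glob.glob(os.path.join(input_dir, "*.pickle")), key=file_id)
    for path in paths:
        with open(path, "rb") as f:
            merged.extend(pickle.load(f))
    with open(output_path, "wb") as f:
        pickle.dump(merged, f)
    return merged


# Demo: 12 small files; a plain string sort would read encodings_10 before encodings_2
demo_in = tempfile.mkdtemp()
for i in range(12):
    with open(os.path.join(demo_in, "encodings_%d.pickle" % i), "wb") as f:
        pickle.dump([i], f)
demo_out = os.path.join(tempfile.mkdtemp(), "encodings.pickle")
merged = collate_pickles(demo_in, demo_out)
print(merged[:4])  # [0, 1, 2, 3]
```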

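Below is the implementation of the FaceClusterUtility class. Its constructor takes EncodingFilePath, the path to the merged pickle file. We read the array from the pickle file and cluster it with the DBSCAN implementation in scikit-learn. Unlike k-means, DBSCAN does not need the number of clusters in advance: the clusters are determined automatically by the distance threshold (eps) and the minimum neighborhood size (min_samples). scikit-learn's DBSCAN also accepts the number of parallel jobs (n_jobs) to use for the computation.
The Cluster function reads the array from the pickle file, runs DBSCAN, prints the number of unique clusters as the number of unique faces, and returns the labels. A label is an integer cluster ID that identifies which person each face in the array (loaded from the pickle file) belongs to; DBSCAN assigns -1 to noise points that do not fall into any cluster.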

# Face clustering functionality
class FaceClusterUtility:
    def __init__(self, EncodingFilePath):
        self.EncodingFilePath = EncodingFilePath
    # Credits: Adrian's PyImageSearch for the clustering code.
    # Here we use sklearn's DBSCAN to cluster all face encodings
    # into clusters that represent different people
    def Cluster(self):
        InputEncodingFile = self.EncodingFilePath
        if not (os.path.isfile(InputEncodingFile) and
                os.access(InputEncodingFile, os.R_OK)):
            print('The input encoding file, ' +
                    str(InputEncodingFile) +
                    ' does not exist or is unreadable')
            raise SystemExit(1)
        # n_jobs = -1 lets DBSCAN use all available CPU cores
        NumberOfParallelJobs = -1
        # Load the serialized face encodings + bounding box locations from
        # disk, then extract the set of encodings so we can cluster them
        print("[INFO] Loading encodings")
        with open(InputEncodingFile, "rb") as f:
            data = pickle.load(f)
        data = np.array(data)
        encodings = [d["encoding"] for d in data]
        # Cluster the embeddings
        print("[INFO] Clustering")
        clt = DBSCAN(eps = 0.5, metric = "euclidean",
                      n_jobs = NumberOfParallelJobs)
        clt.fit(encodings)
        # Determine the total number of unique faces found in the dataset;
        # label -1 marks noise points, so it is not counted as a face
        labelIDs = np.unique(clt.labels_)
        numUniqueFaces = len(np.where(labelIDs > -1)[0])
        print("[INFO] # unique faces: {}".format(numUniqueFaces))
        return clt.labels_
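To see how DBSCAN turns encodings into labels, the sketch below clusters synthetic 128-dimensional points (two tight groups standing in for two people, plus one far-away point) with the same eps=0.5 Euclidean setup as Cluster, keeping scikit-learn's default min_samples=5. Points that belong to no dense region receive the label -1, which is why Cluster counts only labels greater than -1, and why GenerateImages produces a Face_-1 folder for such noise faces. The data is made up; with real face encodings, eps may need tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two tight groups of synthetic 128-d "encodings" (stand-ins for two people)
person_a = rng.normal(0.0, 0.02, size=(10, 128))
person_b = 0.5 + rng.normal(0.0, 0.02, size=(10, 128))
# One isolated point far from both groups
stray = np.full((1, 128), 3.0)
encodings = np.vstack([person_a, person_b, stray])

# Same eps/metric as Cluster; min_samples is left at its default of 5
clt = DBSCAN(eps=0.5, metric="euclidean", n_jobs=-1)
clt.fit(encodings)
labels = clt.labels_

label_ids = np.unique(labels)
num_unique_faces = len(np.where(label_ids > -1)[0])
print("unique faces:", num_unique_faces)          # 2
print("noise points:", int(np.sum(labels == -1)))  # 1
```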

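Below is the implementation of the TqdmUpdate class, which inherits from tqdm. tqdm is a Python library that displays progress bars in console interfaces.
tqdm uses the attributes n and total to compute progress.
When the update function is bound to the update event of the pipeline framework PyPiper, PyPiper supplies its done and total_size arguments. super().refresh() calls tqdm's own refresh implementation, which redraws the progress bar in the console.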

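# Subclass tqdm to visualize progress
class TqdmUpdate(tqdm):
    # Passed to the pipeline as the progress callback; sets tqdm's predefined
    # attributes (total, n) so the bar updates automatically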
    def update(self, done, total_size = None):
        if total_size is not None:
            self.total = total_size
        self.n = done
        super().refresh()
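The callback contract can be exercised without the real pipeline. In the sketch below, fake_pipeline_run is a hypothetical stand-in that calls update_callback(done, total) the way this article describes PyPiper doing; the progress bar writes to an in-memory buffer only so that the example stays quiet.

```python
import io

from tqdm import tqdm


class TqdmUpdate(tqdm):
    # Same override as above: set total/n directly, then redraw the bar
    def update(self, done, total_size=None):
        if total_size is not None:
            self.total = total_size
        self.n = done
        super().refresh()


def fake_pipeline_run(total, update_callback):
    # Hypothetical stand-in for the pipeline reporting progress
    for done in range(1, total + 1):
        update_callback(done, total)


buffer = io.StringIO()
pbar = TqdmUpdate(file=buffer)
fake_pipeline_run(10, pbar.update)
pbar.close()
```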

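Below is the implementation of the FaceImageGenerator class. It uses the labels produced by clustering to generate montages, cropped portrait images, and annotations for future training (e.g., with Darknet YOLO).
The constructor expects EncodingFilePath, the path of the merged pickle file, from which all face encodings are loaded. At this stage we only need each face's imagePath and coordinates to generate images.
Calling GenerateImages does the actual work. We load the array from the merged pickle file, find the unique labels, and loop over them. For each unique label, we list all array indices that carry that label, and then iterate over those indices to process each face: the index gives us the path of the image file and the face coordinates.
The image file is loaded from that path. The face coordinates are expanded into a portrait-shaped box (making sure the expansion does not go beyond the image boundaries), and that region is cropped and saved as a portrait image.
We then start again from the original coordinates and expand them slightly to create an annotation for future supervised training, which can improve recognition.
The annotation layout was designed with Darknet YOLO in mind, but it can be adapted to any other framework. Note that it stores the label followed by the box corners (left, top, right, bottom) in pixels of the original frame, so Darknet's normalized center/size format requires a conversion. Finally, we build a montage and write it out to an image file.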

class FaceImageGenerator:
    def __init__(self, EncodingFilePath):
        self.EncodingFilePath = EncodingFilePath
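    # Here we build a montage of the first 25 faces for each distinct face.
    # We also generate images for all distinct faces, using the labels from
    # clustering and the encodings and image paths from the pickle file.
    # Face bounding boxes are expanded for training purposes, and we also
    # create an annotation for each face image (in a YOLO-like layout) that can
    # easily be modified for future use in supervised training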
    def GenerateImages(self, labels, OutputFolderName = "ClusteredFaces",
                                            MontageOutputFolder = "Montage"):
        output_directory = os.getcwd()
        OutputFolder = os.path.join(output_directory, OutputFolderName)
        if not os.path.exists(OutputFolder):
            os.makedirs(OutputFolder)
        else:
            shutil.rmtree(OutputFolder)
            time.sleep(0.5)
            os.makedirs(OutputFolder)
        MontageFolderPath = os.path.join(OutputFolder, MontageOutputFolder)
        os.makedirs(MontageFolderPath)
        data = pickle.loads(open(self.EncodingFilePath, "rb").read())
        data = np.array(data)
        labelIDs = np.unique(labels)
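        # Loop over the unique face integers (label -1 collects DBSCAN noise)
        for labelID in labelIDs:
            # Find all indexes into the data array that belong to the current
            # label ID; the first 25 of them go into the montage
            print("[INFO] faces for face ID: {}".format(labelID))
            FaceFolder = os.path.join(OutputFolder, "Face_" + str(labelID))
            os.makedirs(FaceFolder)
            idxs = np.where(labels == labelID)[0]
            # Initialize the list of faces to include in the montage
            portraits = []
            # Loop over the indexes of this face
            counter = 1
            for i in idxs:
                # Load the input image and extract the face ROI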
                image = cv2.imread(data[i]["imagePath"])
                (o_top, o_right, o_bottom, o_left) = data[i]["loc"]
                height, width, channel = image.shape
                widthMargin = 100
                heightMargin = 150
                top = o_top - heightMargin
                if top < 0: top = 0
                bottom = o_bottom + heightMargin
                if bottom > height: bottom = height
                left = o_left - widthMargin
                if left < 0: left = 0
                right = o_right + widthMargin
                if right > width: right = width
                portrait = image[top:bottom, left:right]
                if len(portraits) < 25:
                    portraits.append(portrait)
                resizeUtils = ResizeUtils()
                portrait = resizeUtils.rescale_by_width(portrait, 400)
                FaceFilename = "face_" + str(counter) + ".jpg"
                FaceImagePath = os.path.join(FaceFolder, FaceFilename)
                cv2.imwrite(FaceImagePath, portrait)
                widthMargin = 20
                heightMargin = 20
                top = o_top - heightMargin
                if top < 0: top = 0
                bottom = o_bottom + heightMargin
                if bottom > height: bottom = height
                left = o_left - widthMargin
                if left < 0: left = 0
                right = o_right + widthMargin
                if right > width:
                    right = width
                AnnotationFilename = "face_" + str(counter) + ".txt"
                AnnotationFilePath = os.path.join(FaceFolder, AnnotationFilename)
                # Format: <labelID> <left> <top> <right> <bottom>, in pixels
                with open(AnnotationFilePath, 'w') as f:
                    f.write(str(labelID) + ' ' +
                            str(left) + ' ' + str(top) + ' ' +
                            str(right) + ' ' + str(bottom) + "\n")
                counter += 1
            montage = build_montages(portraits, (96, 120), (5, 5))[0]
            MontageFilenamePath = os.path.join(
               MontageFolderPath, "Face_" + str(labelID) + ".jpg")
            cv2.imwrite(MontageFilenamePath, montage)
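The annotation GenerateImages writes is `labelID left top right bottom` in absolute pixels of the original frame (imagePath), not of the saved portrait crop. Darknet YOLO label files instead expect `class x_center y_center width height`, each normalized by the image width or height. A minimal sketch of the margin-and-clamp step plus that conversion follows; both helpers are hypothetical. Darknet class ids are non-negative integers, so the DBSCAN noise label -1 would have to be skipped or remapped before writing training labels.

```python
def expand_box(loc, margin_w, margin_h, img_w, img_h):
    """Grow a (top, right, bottom, left) box by the margins, clamped to the image."""
    top, right, bottom, left = loc
    return (max(0, top - margin_h), min(img_w, right + margin_w),
            min(img_h, bottom + margin_h), max(0, left - margin_w))


def to_yolo_line(class_id, box, img_w, img_h):
    """Darknet-style line: class x_center y_center width height, normalized to [0, 1]."""
    top, right, bottom, left = box
    x_center = (left + right) / 2 / img_w
    y_center = (top + bottom) / 2 / img_h
    width = (right - left) / img_w
    height = (bottom - top) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example: a 700x394 frame with a face at (top, right, bottom, left) = (100, 400, 200, 300)
box = expand_box((100, 400, 200, 300), 20, 20, 700, 394)
print(box)                             # (80, 420, 220, 280)
print(to_yolo_line(0, box, 700, 394))  # 0 0.500000 0.380711 0.200000 0.355330
```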

Save this file as FaceClusteringLibrary.py; it contains all the class definitions.
Below is the file Driver.py, which uses these classes to build and run the pipeline.

# Import all classes from the Python file above
from FaceClusteringLibrary import *
if __name__ == "__main__":
    # Generate frames from the given video footage
    framesGenerator = FramesGenerator("Footage.mp4")
    framesGenerator.GenerateFrames("Frames")
    # Design and run the face clustering pipeline
    CurrentPath = os.getcwd()
    FramesDirectory = "Frames"
    FramesDirectoryPath = os.path.join(CurrentPath, FramesDirectory)
    EncodingsFolder = "Encodings"
    EncodingsFolderPath = os.path.join(CurrentPath, EncodingsFolder)
    if os.path.exists(EncodingsFolderPath):
        shutil.rmtree(EncodingsFolderPath, ignore_errors = True)
        time.sleep(0.5)
    os.makedirs(EncodingsFolderPath)
    pipeline = Pipeline(
                    FramesProvider("Files source", sourcePath = FramesDirectoryPath) |
                    FaceEncoder("Encode faces") |
                    DatastoreManager("Store encoding",
                    encodingsOutputPath = EncodingsFolderPath),
                    n_threads = 3, quiet = True)
    pbar = TqdmUpdate()
    pipeline.run(update_callback = pbar.update)
    pbar.close()
    print()
    print('[INFO] Encodings extracted')
    # Merge all encoding pickle files into one
    CurrentPath = os.getcwd()
    EncodingsInputDirectory = "Encodings"
    EncodingsInputDirectoryPath = os.path.join(
          CurrentPath, EncodingsInputDirectory)
    OutputEncodingPickleFilename = "encodings.pickle"
    if os.path.exists(OutputEncodingPickleFilename):
        os.remove(OutputEncodingPickleFilename)
    picklesListCollator = PicklesListCollator(
                    EncodingsInputDirectoryPath)
    picklesListCollator.GeneratePickle(
           OutputEncodingPickleFilename)
    # Allow for any delay in file writes
    time.sleep(0.5)
    # Start the clustering process and generate output images with annotations
    EncodingPickleFilePath = "encodings.pickle"
    faceClusterUtility = FaceClusterUtility(EncodingPickleFilePath)
    faceImageGenerator = FaceImageGenerator(EncodingPickleFilePath)
    labelIDs = faceClusterUtility.Cluster()
    faceImageGenerator.GenerateImages(
      labelIDs, "ClusteredFaces", "Montage")

Output: (output image not available)

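Troubleshooting
Issue 1: The whole PC freezes while extracting face embeddings.
Solution: Lower the size limits in the frame-resizing function (AutoResize) used when extracting frames from the input video clip. Keep in mind that reducing them too much leads to inaccurate face clustering. Besides resizing frames, we could also introduce frontal-face detection and crop only frontal faces to improve accuracy.

Issue 2: The computer slows down while the pipeline is running.
Solution: The pipeline uses the CPU to the maximum. To limit its usage, reduce the number of threads (n_threads) specified in the Pipeline constructor.

Issue 3: The output clusters are too inaccurate.
Solution: The most likely reasons are that the frames extracted from the input video clip contain faces at very low resolution, or that there are very few frames (around 7-8). In the first case, use a video clip with bright, clear images of faces; in the second, use a video of about 2 minutes, or modify the frame-extraction source code.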