How to Use Google Colaboratory for Video Processing

Last modified: 2021-04-16 08:40:23             Author: Mango


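In this article, we discuss how to configure Google Colaboratory to solve video processing tasks with machine learning. You will learn how to use this Google service and its free NVIDIA Tesla K80 GPU to train neural networks. The article will be useful for readers who are familiar with machine learning and are considering work with image recognition and video processing.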







import os, sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
os.chdir("/content/drive/My Drive/Colab Notebooks/MRCNN_pure")
sys.path.append("/content/drive/My Drive/Colab Notebooks/MRCNN_pure")
# Root directory of the project
ROOT_DIR = os.path.abspath(".")
# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")
class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
config = InferenceConfig()
# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)
# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']
# Load a random image from the images folder
file_names = next(os.walk(IMAGE_DIR))[2]
image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))
# Run detection
results = model.detect([image], verbose=1)
# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], 
                            r['class_ids'],class_names, r['scores'])

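In this example, /content/drive/My Drive/Colab Notebooks/MRCNN_pure is the path to our repository with Mask_R-CNN. Running the script displays a randomly chosen image with the recognized objects highlighted.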


# Run detection on every image in the images folder
file_names = next(os.walk(IMAGE_DIR))[2]
for file_name in file_names:
    image = skimage.io.imread(os.path.join(IMAGE_DIR, file_name))
    # Run detection
    results = model.detect([image], verbose = 1)
    # Visualize results
    r = results[0]
    visualize.display_instances(image, r['rois'], r['masks'],
                         r['class_ids'],class_names, r['scores'])


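Initially, we ran the demo code on a computer with an Intel Core i5, 8GB of RAM, and no discrete graphics card. The code crashed in a different place each time, most often inside the TensorFlow framework during memory allocation. Moreover, any attempt to run other software during image recognition slowed the computer to the point of being unusable.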


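We decided to extend our hardware resources by using Google's Colaboratory service (also known as Colab). Google Colab is a free cloud service that provides CPUs and GPUs along with preconfigured virtual machine instances. Specifically, Google offers an NVIDIA Tesla K80 GPU with 12GB of dedicated video memory, which makes Colab well suited to neural network experiments.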



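Google Colab offers the following:
– Support for Python 2.7 and Python 3.6, so you can improve your coding skills;
– Jupyter Notebook functionality, so you can create, edit, and share .ipynb files;
– Many preinstalled libraries, including TensorFlow, Keras, and OpenCV, plus the ability to work with custom libraries in Google Colaboratory;
– The ability to store Google Colab notebooks in your Google Drive.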

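To start working with a Google Colab GPU, you only need to provide access to your .ipynb script, which runs in a Docker container. The Docker container is assigned to you for only 12 hours. By default, all scripts you create are stored in the Colab Notebooks section of your Google Drive, which is created automatically when you connect to Colaboratory. After 12 hours, all data in the container is deleted. You can avoid losing it by mounting your Google Drive in the container and working from there. Otherwise, the file system of the Docker image is available only for a limited time.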
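
Because the container's local file system is discarded, it can help to decide up front where results are written. The following is a minimal sketch and not part of the original scripts: the helper name pick_output_dir and the "colab_results" folder are hypothetical, and DRIVE_ROOT assumes the default Drive mount point used in this article.

```python
import os
import tempfile

# Assumed default location of a mounted Google Drive in Colab.
DRIVE_ROOT = "/content/drive/My Drive"


def pick_output_dir(subdir, drive_root=DRIVE_ROOT):
    """Return a writable directory, preferring mounted Drive over local disk."""
    if os.path.isdir(drive_root):
        base = drive_root  # survives after the container is recycled
    else:
        base = tempfile.gettempdir()  # lost when the container is recycled
    path = os.path.join(base, subdir)
    os.makedirs(path, exist_ok=True)
    return path


out_dir = pick_output_dir("colab_results")
print("Saving results to:", out_dir)
```

If the Drive folder is not present, the helper falls back to the local temporary directory, so the same cell runs both inside and outside Colab.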

Configuring Google Colab

First, let's explain how to create your .ipynb notebook. Open Google Colaboratory, select the Google Drive section, and click NEW PYTHON 3 NOTEBOOK.




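Now you can mount your Google Drive in this container so that you can move your source code there and save the results of your work. To do this, copy the code below into the first cell and press the Play button (or Shift + Enter).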


from google.colab import drive
drive.mount('/content/drive')

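You will receive an authorization request. Click the link, authorize, copy the verification code, paste it into the text box in the .ipynb script, and press Enter. If authorization succeeds, your Google Drive is mounted at /content/drive/My Drive. To browse the file tree, select Files in the left-hand menu.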

You now have a Docker container with a Tesla K80 GPU, your Google Drive as file storage, and an .ipynb notebook for running scripts.
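
Before starting a long job, it may be worth confirming that a GPU is actually visible to the notebook. The sketch below is an illustration rather than part of the original tutorial: the helper name describe_gpu is hypothetical, and it assumes the nvidia-smi utility is on the PATH of a GPU-backed runtime. On a machine without it, the function simply returns None.

```python
import shutil
import subprocess


def describe_gpu():
    """Return a short description of visible NVIDIA GPUs, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on this machine
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


info = describe_gpu()
print(info if info else "No GPU visible; check the notebook's runtime type.")
```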

Object Recognition with Google Colab

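Now we'll describe how to run the Mask_R-CNN sample for object recognition in Google Colab. We uploaded the Mask_RCNN repository to our Google Drive under the /content/drive/My Drive/Colab Notebooks/ path. The sample script then has to be pointed at the repository folder (MRCNN_pure in our case):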


os.chdir("/content/drive/My Drive/Colab Notebooks/MRCNN_pure")
sys.path.append("/content/drive/My Drive/Colab Notebooks/MRCNN_pure")



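With object detection running in Google Colab, we quickly receive results with the recognized objects, and our own computer keeps working as usual even during image recognition.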
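
The dictionary returned by model.detect() in the demo code carries r['class_ids'] and r['scores'] alongside the boxes and masks, so the detections can also be summarized as text instead of being drawn. The following is a rough sketch under stated assumptions: summarize_detections and the min_score threshold are hypothetical, and the demo values are made up to stand in for a real result.

```python
from collections import Counter


def summarize_detections(class_ids, scores, class_names, min_score=0.5):
    """Count detected objects per class name, ignoring low-confidence ones."""
    counts = Counter()
    for class_id, score in zip(class_ids, scores):
        if score >= min_score:
            counts[class_names[int(class_id)]] += 1
    return dict(counts)


# Made-up values standing in for r['class_ids'] and r['scores'].
demo_names = ['BG', 'person', 'bicycle', 'car']
demo_ids = [1, 1, 3, 2]
demo_scores = [0.99, 0.42, 0.87, 0.91]
print(summarize_detections(demo_ids, demo_scores, demo_names))
```

With a real result, the call would look like summarize_detections(r['class_ids'], r['scores'], class_names).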

Video Processing with Google Colab



import cv2
VIDEO_STREAM = "/content/drive/My Drive/Colab Notebooks/Millery.avi"
VIDEO_STREAM_OUT = "/content/drive/My Drive/Colab Notebooks/Result.avi"
# initialize the video stream and pointer to output video file
vs = cv2.VideoCapture(VIDEO_STREAM)
writer = None
# skip ahead to frame 1000 of the input video
vs.set(cv2.CAP_PROP_POS_FRAMES, 1000)

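Next, we process 20,000 frames with our neural network model. The OpenCV VideoCapture object lets us fetch images from the video file frame by frame with the read() method. Each frame is passed to the model.detect() method, and the results are visualized with a display_instances() function; the version below draws the masks, boxes, and labels directly onto the frame with OpenCV.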
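
The frame-by-frame reading described here can be wrapped in a small generator that stops either at a frame limit or when read() reports that no frame was grabbed. This is a minimal sketch, not the script's own loop: iter_frames and FakeCapture are hypothetical names, and FakeCapture only imitates the (grabbed, frame) return value of cv2.VideoCapture.read() so the example runs without a video file.

```python
def iter_frames(capture, max_frames):
    """Yield up to max_frames frames from an object with an OpenCV-style read()."""
    for _ in range(max_frames):
        grabbed, frame = capture.read()
        if not grabbed:  # end of stream or read error
            break
        yield frame


class FakeCapture:
    """Stand-in for cv2.VideoCapture so the sketch runs without a video file."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


frames = list(iter_frames(FakeCapture(["f0", "f1", "f2"]), max_frames=20000))
print(len(frames))
```

In the notebook, the same generator could be fed the real capture object, for example: for frame in iter_frames(vs, 20000).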


def display_instances(image, boxes, masks, ids, names, scores):
    """
        take the image and results and apply the mask, box, and Label
    """
    n_instances = boxes.shape[0]
    colors = visualize.random_colors(n_instances)
    if not n_instances:
        print('NO INSTANCES TO DISPLAY')
    else:
        assert boxes.shape[0] == masks.shape[-1] == ids.shape[0]
    for i, color in enumerate(colors):
        # Skip empty boxes
        if not np.any(boxes[i]):
            continue
        y1, x1, y2, x2 = boxes[i]
        label = names[ids[i]]
        score = scores[i] if scores is not None else None
        caption = '{} {:.2f}'.format(label, score) if score else label
        mask = masks[:, :, i]
        image = visualize.apply_mask(image, mask, color)
        image = cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)
        image = cv2.putText(
            image, caption, (x1, y1), cv2.FONT_HERSHEY_COMPLEX, 0.7, color, 2
        )
    return image


fourcc = cv2.VideoWriter_fourcc(*"XVID")
writer = cv2.VideoWriter(VIDEO_STREAM_OUT, fourcc, 30, 
             (masked_frame.shape[1], masked_frame.shape[0]), True)
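
The writer takes its frame size from masked_frame.shape with the first two entries swapped, because an image array is shaped (height, width, channels) while cv2.VideoWriter expects (width, height). The sketch below spells out that conversion and the playback length implied by the numbers in this article (20,000 frames written at 30 frames per second). The helper names and the 720x1280 shape are hypothetical and used only for illustration.

```python
def writer_frame_size(shape):
    """Convert an image array shape (height, width[, channels]) to (width, height)."""
    height, width = shape[0], shape[1]
    return (width, height)


def clip_duration_seconds(n_frames, fps):
    """Playback length of n_frames written at a constant fps."""
    return n_frames / fps


# Made-up frame shape: 720 rows, 1280 columns, 3 color channels.
print(writer_frame_size((720, 1280, 3)))
# Frame count and frame rate taken from the article's script.
print(round(clip_duration_seconds(20000, 30) / 60, 1), "minutes")
```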





from google.colab import drive
drive.mount('/content/drive')
import os, sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
import cv2
from matplotlib.patches import Polygon
os.chdir("/content/drive/My Drive/Colab Notebooks/MRCNN_pure")
sys.path.append("/content/drive/My Drive/Colab Notebooks/MRCNN_pure")
VIDEO_STREAM = "/content/drive/My Drive/Colab Notebooks/Millery.avi"
VIDEO_STREAM_OUT = "/content/drive/My Drive/Colab Notebooks/Result.avi"
# Root directory of the project
ROOT_DIR = os.path.abspath(".")
# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")
class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
def display_instances(image, boxes, masks, ids, names, scores):
    """
        take the image and results and apply the mask, box, and Label
    """
    n_instances = boxes.shape[0]
    colors = visualize.random_colors(n_instances)
    if not n_instances:
        print('NO INSTANCES TO DISPLAY')
    else:
        assert boxes.shape[0] == masks.shape[-1] == ids.shape[0]
    for i, color in enumerate(colors):
        # Skip empty boxes
        if not np.any(boxes[i]):
            continue
        y1, x1, y2, x2 = boxes[i]
        label = names[ids[i]]
        score = scores[i] if scores is not None else None
        caption = '{} {:.2f}'.format(label, score) if score else label
        mask = masks[:, :, i]
        image = visualize.apply_mask(image, mask, color)
        image = cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)
        image = cv2.putText(
            image, caption, (x1, y1), cv2.FONT_HERSHEY_COMPLEX, 0.7, color, 2
        )
    return image
config = InferenceConfig()
# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)
# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']
# Initialize the video stream and pointer to output video file
vs = cv2.VideoCapture(VIDEO_STREAM)
writer = None
# Skip ahead to frame 1000 of the input video
vs.set(cv2.CAP_PROP_POS_FRAMES, 1000)
i = 0
while i < 20000:
  # read the next frame from the file
  (grabbed, frame) = vs.read()
  i += 1
  # If the frame was not grabbed, then we have reached the end
  # of the stream
  if not grabbed:
    print("Not grabbed.")
    break
  # Run detection
  results = model.detect([frame], verbose=1)
  # Visualize results
  r = results[0]
  masked_frame = display_instances(frame, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])
  # Check if the video writer is None
  if writer is None:
    # Initialize our video writer
    fourcc = cv2.VideoWriter_fourcc(*"XVID")
    writer = cv2.VideoWriter(VIDEO_STREAM_OUT, fourcc, 30,
      (masked_frame.shape[1], masked_frame.shape[0]), True)
  # Write the output frame to disk
  writer.write(masked_frame)
# Release the file pointers
print("[INFO] cleaning up...")
if writer is not None:
  writer.release()
vs.release()


In this article, we showed how to take advantage of Google Colab and explained how to do the following:

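– Use the free Tesla K80 GPU provided by Google Colab
– Classify images using the Mask_RCNN neural network and Google Colab
– Classify objects in a video stream using Mask_RCNN, Google Colab, and the OpenCV library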