License Plate Recognition Using OpenCV and Tesseract OCR

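In this article you will learn about automatic license plate recognition. We will use Tesseract OCR, an Optical Character Recognition engine (OCR engine), to automatically recognize the text on license plates.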

Python-tesseract:
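Python-tesseract (pytesseract) is an optical character recognition (OCR) tool for Python. That is, it recognizes and "reads" the text embedded in images. Python-tesseract is a wrapper for Google's Tesseract-OCR engine. It can also be used as a stand-alone script, since it can read all common image types such as jpeg, png, gif, bmp, tiff, and others. When used as a script, Python-tesseract prints the recognized text instead of writing it to a file. It can recognize more than 100 languages.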

Installation:

```
pip install pytesseract
```

OpenCV:
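OpenCV is an open-source computer vision library. It contains more than 2500 optimized algorithms, which are commonly used to detect and recognize faces, identify objects, recognize scenery, and generate markers for overlaying images in augmented reality, among other tasks.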

Installation:

```
pip install opencv-python
```

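Note: Make sure the pytesseract and opencv-python modules are installed correctly. pytesseract is only a wrapper, so the Tesseract OCR engine itself must also be installed on your system and available on your PATH.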
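Note: Have the dataset ready before you start. For best performance, all images should look like the sample license plate shown in the image processing section below. The dataset folder should be in the same folder as this Python script; otherwise, you must specify the path to the dataset manually wherever it is needed.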
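Because pytesseract needs a separate Tesseract installation, and the code below expects a license-plates folder next to the script, it can help to check both before running anything. The following is a minimal sketch using only the standard library; the function name check_environment is hypothetical, and the default folder name is taken from the paths used in this article.

```python
import os
import shutil


def check_environment(dataset_dir="license-plates"):
    """Report whether the Tesseract binary and the dataset folder can be found.

    dataset_dir is an assumption based on the folder name used in this article.
    """
    # pytesseract only wraps the tesseract executable, so look for it on PATH
    tesseract_path = shutil.which("tesseract")
    # The dataset folder is expected in the current working directory
    dataset_path = os.path.join(os.getcwd(), dataset_dir)
    return {
        "tesseract_found": tesseract_path is not None,
        "tesseract_path": tesseract_path,
        "dataset_found": os.path.isdir(dataset_path),
        "dataset_path": dataset_path,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If Tesseract is installed but not on PATH, pytesseract lets you point to the executable by setting pytesseract.pytesseract.tesseract_cmd to its full path.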

Program:

```
# Load the required Python modules
import pytesseract  # Python wrapper for the Tesseract OCR engine
import matplotlib.pyplot as plt
import cv2  # OpenCV
import glob
import os
```

Note: each image file must be named after the exact number on its license plate. For example, if an image shows a license plate with the number "FTY349U", name the file "FTY349U.jpg".
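Since the labels come straight from the file names, a misnamed file silently becomes a wrong ground truth. A small check like the one below can catch such files early. It assumes, as an illustration, that valid plate numbers contain only uppercase letters and digits (the same characters used in the Tesseract whitelist later on); label_from_filename is a hypothetical helper, not part of the original code.

```python
import os
import re

# Assumption: a valid plate label uses only uppercase letters and digits
PLATE_PATTERN = re.compile(r"[A-Z0-9]+")


def label_from_filename(path):
    """Return the plate number encoded in a file name, or None if it is not valid."""
    # Drop the folder part and the extension, keeping e.g. "FTY349U"
    stem, _ = os.path.splitext(os.path.basename(path))
    return stem if PLATE_PATTERN.fullmatch(stem) else None


for name in ["license-plates/FTY349U.jpg", "license-plates/fty 349u.jpg"]:
    print(name, "->", label_from_filename(name))
```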


Code: Performing OCR on the license plates with the Tesseract engine

```
# Specify the path to the license plate images folder as shown below
path_for_license_plates = os.path.join(os.getcwd(), "license-plates", "**", "*.jpg")
list_license_plates = []
predicted_license_plates = []

for path_to_license_plate in glob.glob(path_for_license_plates, recursive=True):

    # The file name without its extension is the actual license plate number
    license_plate_file = os.path.basename(path_to_license_plate)
    license_plate, _ = os.path.splitext(license_plate_file)

    # Read each license plate image file using OpenCV; skip unreadable files
    img = cv2.imread(path_to_license_plate)
    if img is None:
        continue

    # Append the actual license plate to a list
    list_license_plates.append(license_plate)

    # Pass the image to the Tesseract OCR engine through the pytesseract wrapper.
    # The whitelist limits the output to uppercase letters and digits; it must be
    # written as -c VAR=VALUE, with no spaces around "=".
    predicted_result = pytesseract.image_to_string(
        img, lang='eng',
        config='--oem 3 --psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')

    # Remove whitespace, ':' and '-', then store the prediction so it can be
    # compared with the actual license plate
    filter_predicted_result = "".join(predicted_result.split()).replace(":", "").replace("-", "")
    predicted_license_plates.append(filter_predicted_result)
```
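The whitelist option has to reach Tesseract as a single VAR=VALUE argument. Recent versions of pytesseract split the config string into command-line arguments with Python's shlex module (an assumption worth checking against your installed version), so the sketch below calls shlex directly to show why spaces around = break the option.

```python
import shlex

whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

# Spaces around "=" turn one assignment into three separate arguments
bad_config = f"--oem 3 --psm 6 -c tessedit_char_whitelist = {whitelist}"
# Tesseract expects "-c VAR=VALUE" as a single argument after -c
good_config = f"--oem 3 --psm 6 -c tessedit_char_whitelist={whitelist}"

print(shlex.split(bad_config))
print(shlex.split(good_config))
```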

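We have now predicted the license plates, but we have not yet looked at the predictions. To compare the actual plates with the predictions, we print them side by side in a table, as shown below. We also calculate the accuracy of each prediction ourselves instead of using a library function: an exact match scores 100 %, a prediction of the same length scores the percentage of characters that match position by position, and a prediction of a different length scores 0 %.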

```
print("Actual License Plate", "\t", "Predicted License Plate", "\t", "Accuracy")
print("--------------------", "\t", "-----------------------", "\t", "--------")

def calculate_predicted_accuracy(actual_list, predicted_list):
    for actual_plate, predict_plate in zip(actual_list, predicted_list):
        accuracy = "0 %"
        num_matches = 0
        if actual_plate == predict_plate:
            accuracy = "100 %"
        elif len(actual_plate) == len(predict_plate):
            # Count the characters that match position by position
            for a, p in zip(actual_plate, predict_plate):
                if a == p:
                    num_matches += 1
            # Multiply by 100 before rounding to avoid floating-point noise
            accuracy = str(round(num_matches / len(actual_plate) * 100, 2)) + " %"
        print("     ", actual_plate, "\t\t\t", predict_plate, "\t\t  ", accuracy)


calculate_predicted_accuracy(list_license_plates, predicted_license_plates)
```
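The function above prints its results, which makes them hard to reuse or test. The sketch below restates the same accuracy rule as a function that returns a number; plate_accuracy is a hypothetical name, and the plate strings are made-up examples rather than results from the dataset.

```python
def plate_accuracy(actual_plate, predict_plate):
    """Return the accuracy (in percent) using the same rule as the table above:
    100 for an exact match, the share of position-by-position matches when
    the lengths are equal, and 0 when the lengths differ.
    """
    if actual_plate == predict_plate:
        return 100.0
    if len(actual_plate) != len(predict_plate):
        return 0.0
    num_matches = sum(1 for a, p in zip(actual_plate, predict_plate) if a == p)
    # Multiply before rounding; round(0.57, 2) * 100 would give 56.99999999999999
    return round(num_matches / len(actual_plate) * 100, 2)


print(plate_accuracy("FTY349U", "FTY349U"))  # exact match
print(plate_accuracy("FTY349U", "FTY349V"))  # one wrong character out of seven
print(plate_accuracy("FTY349U", "FTY349"))   # different length
```

With these inputs the three calls print 100.0, 85.71 (6 of 7 characters) and 0.0.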

Output:

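We can see that the Tesseract OCR engine predicts most of the license plates correctly, with 100% accuracy. For the license plates that Tesseract got wrong (GWT2180, OKV8004 and JSQ1413), we will apply image processing techniques to those files and pass them to Tesseract OCR again, which should improve its accuracy on these three plates.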
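Instead of picking the misread plates out of the printed table by hand, they can be collected from the two lists built earlier. This is a minimal sketch; plates_to_reprocess is a hypothetical helper, and the sample values are invented for illustration, not taken from the dataset.

```python
def plates_to_reprocess(actual_list, predicted_list):
    """Return (actual, predicted) pairs where the OCR output is not an exact match."""
    return [
        (actual, predicted)
        for actual, predicted in zip(actual_list, predicted_list)
        if actual != predicted
    ]


# Hypothetical example values
actual = ["FTY349U", "GWT2180", "OKV8004"]
predicted = ["FTY349U", "GWT21B0", "OKV8O04"]
print(plates_to_reprocess(actual, predicted))
```

With the real data, call plates_to_reprocess(list_license_plates, predicted_license_plates) and run the image processing steps below on each returned plate.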

Code: Image processing techniques

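```
# Read the license plate file and display it
test_license_plate = cv2.imread(os.path.join(os.getcwd(), "license-plates", "GWT2180.jpg"))
# OpenCV loads images in BGR order; convert to RGB so Matplotlib shows the true colors
plt.imshow(cv2.cvtColor(test_license_plate, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.title('GWT2180 license plate')
plt.show()
```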

Output:

Image resizing:

Use cv2.resize to scale the image by a factor of 2 in both the horizontal and vertical directions.

```
# Enlarge the image 2x in both directions using bicubic interpolation
resize_test_license_plate = cv2.resize(
    test_license_plate, None, fx=2, fy=2,
    interpolation=cv2.INTER_CUBIC)
```
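To make fx=2, fy=2 concrete, the sketch below doubles a tiny NumPy array. It deliberately uses nearest-neighbour scaling (each pixel is simply repeated) rather than the bicubic interpolation selected by cv2.INTER_CUBIC, so only the output size matches cv2.resize, not the pixel values; upscale_nearest is a hypothetical helper.

```python
import numpy as np


def upscale_nearest(img, factor=2):
    """Enlarge an image by an integer factor by repeating each pixel.

    This is nearest-neighbour scaling, not bicubic interpolation; it only
    illustrates what fx=fy=factor does to the image dimensions.
    """
    # Repeat rows (axis 0), then columns (axis 1); colour channels are untouched
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


small = np.arange(6, dtype=np.uint8).reshape(2, 3)
big = upscale_nearest(small)
print(small.shape, "->", big.shape)
print(big)
```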

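Converting to grayscale: Next, we convert the resized image to grayscale. This reduces the three color channels to a single intensity channel, which simplifies the image and makes the characters on the license plate easier to detect.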
```
grayscale_resize_test_license_plate = cv2.cvtColor(
    resize_test_license_plate, cv2.COLOR_BGR2GRAY)
```
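For reference, the OpenCV documentation gives the grayscale conversion as a weighted sum of the color channels, Y = 0.299 R + 0.587 G + 0.114 B. The NumPy sketch below applies those weights to a BGR image; it is meant to show the arithmetic, and OpenCV's own fixed-point implementation may round some pixels differently. bgr_to_gray is a hypothetical name.

```python
import numpy as np


def bgr_to_gray(img_bgr):
    """Convert a BGR uint8 image to grayscale with Y = 0.299 R + 0.587 G + 0.114 B."""
    # Channels are stored in B, G, R order, as returned by cv2.imread
    b = img_bgr[..., 0].astype(np.float64)
    g = img_bgr[..., 1].astype(np.float64)
    r = img_bgr[..., 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # Round and clip back to the 0-255 range of an 8-bit image
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# One row of three pixels: pure blue, pure green, pure red (BGR order)
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
print(bgr_to_gray(pixels))
```

Green contributes most to perceived brightness, so a pure green pixel maps to a much lighter gray (150) than a pure blue one (29).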

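Denoising the image:
Gaussian blur is a technique for denoising images. It replaces each pixel with a weighted average of its neighbors, with weights taken from a Gaussian kernel, which smooths out small specks of noise. Edges become slightly softer rather than sharper, but the cleaner image can make the characters easier for Tesseract to read.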

```
# Blur with a 5x5 Gaussian kernel; sigma 0 lets OpenCV derive it from the kernel size
gaussian_blur_license_plate = cv2.GaussianBlur(
    grayscale_resize_test_license_plate, (5, 5), 0)
```
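The (5, 5) argument is the kernel size and 0 is the standard deviation sigma. The OpenCV documentation for getGaussianKernel states that a non-positive sigma is computed from the kernel size as 0.3*((ksize-1)*0.5 - 1) + 0.8. The pure-Python sketch below builds such a kernel to show which weights the blur averages with; the helper names are hypothetical, and OpenCV may use a precomputed approximation for small kernels, so its exact weights can differ slightly.

```python
import math


def gaussian_kernel_1d(ksize, sigma=0.0):
    """Return normalized 1-D Gaussian weights for an odd kernel size."""
    if ksize < 1 or ksize % 2 == 0:
        raise ValueError("ksize must be a positive odd number")
    if sigma <= 0:
        # Formula from the OpenCV docs for a non-positive sigma
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    center = ksize // 2
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(ksize)]
    total = sum(weights)
    # Normalize so the weights sum to 1 and the blur keeps overall brightness
    return [w / total for w in weights]


def gaussian_kernel_2d(ksize, sigma=0.0):
    """Build the 2-D kernel as the outer product of two 1-D kernels (the filter is separable)."""
    k = gaussian_kernel_1d(ksize, sigma)
    return [[a * b for b in k] for a in k]


kernel = gaussian_kernel_2d(5)
for row in kernel:
    print(" ".join(f"{w:.4f}" for w in row))
```

The center weight is the largest and the weights fall off toward the edges, so nearby pixels influence the result more than distant ones.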

Now pass the processed license plate image to the Tesseract OCR engine and check the prediction.

```
# Same OCR settings as before (lang already selects English, so no extra -l flag)
new_predicted_result_GWT2180 = pytesseract.image_to_string(
    gaussian_blur_license_plate, lang='eng',
    config='--oem 3 --psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
filter_new_predicted_result_GWT2180 = "".join(new_predicted_result_GWT2180.split()).replace(":", "").replace("-", "")
print(filter_new_predicted_result_GWT2180)
```
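The same clean-up expression appears twice, once in the OCR loop and once here. A small helper keeps the two in sync; clean_ocr_text is a hypothetical name, and it performs exactly the same steps as the original expression: remove all whitespace, then strip ':' and '-'.

```python
def clean_ocr_text(raw_text):
    """Remove all whitespace (spaces, line breaks, etc.), then ':' and '-'."""
    return "".join(raw_text.split()).replace(":", "").replace("-", "")


print(clean_ocr_text("GWT-2180\n"))
```

In the OCR loop, predicted_license_plates.append(clean_ocr_text(predicted_result)) would replace the two filtering lines.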

Output:

GWT2180

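Apply the same image processing to every other license plate that did not reach 100% accuracy. With that, the license plate recognition pipeline is complete.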
