Real-Time AI Person Detection on Edge Devices: Detecting Persons in Images Using Pre-trained SSD Models (Translation)
By S.F.
Article link: https://www.kyfws.com/news/real-time-ai-person-detection-on-edge-devices-dete/
Copyright notice: Unless otherwise stated, all articles on this blog are released under the BY-NC-SA license. Please credit the source when reposting!
Original article: https://www.codeproject.com/Articles/5281994/Real-time-AI-Person-Detection-on-Edge-Devices-Dete
Original author: Sergey L. Gladkiy
Translated by this site
Preface
In this article, we'll showcase the Python code for launching these models and detecting humans in images. Here we'll write the Python code for detecting persons in images using SSD models.
- Download Data - 19.3 MB
- Download Models - 43.5 MB
- Download Results - 36.66 MB
In the previous article of this series, we selected two SSD models for further work: one based on MobileNet and another based on SqueezeNet. In this article, we'll develop some Python code that will enable us to detect humans in images using these models.
The selected DNNs are realized as Caffe models. A Caffe model consists of two parts: the model structure (.prototxt) file and the trained model (.caffemodel). The Caffe model structure is written in a format similar to JSON. The trained model is a binary serialization of the CNN kernels and other trained data. In the first article of the series, we mentioned that we'll use the Python OpenCV library with Caffe. What does it mean? Should we install both frameworks, OpenCV and Caffe? Fortunately, no: only the OpenCV library. This framework includes the DNN module, which directly supports network models developed with TensorFlow, Caffe, Torch, Darknet, and some others. So – lucky us! – the OpenCV framework allows working with both the computer vision algorithms and the deep neural networks. And this is all we need.
Let’s start our Python code with two utility classes:
import cv2
import numpy as np
import os

class CaffeModelLoader:
    @staticmethod
    def load(proto, model):
        net = cv2.dnn.readNetFromCaffe(proto, model)
        return net

class FrameProcessor:
    def __init__(self, size, scale, mean):
        self.size = size
        self.scale = scale
        self.mean = mean

    def get_blob(self, frame):
        img = frame
        (h, w, c) = frame.shape
        # Trim a landscape frame to its center square
        if w > h:
            dx = int((w - h) / 2)
            img = frame[0:h, dx:dx + h]
        # Note: interpolation must be passed as a keyword argument;
        # passing it positionally would fill the dst parameter instead
        resized = cv2.resize(img, (self.size, self.size), interpolation=cv2.INTER_AREA)
        blob = cv2.dnn.blobFromImage(resized, self.scale, (self.size, self.size), self.mean)
        return blob
The CaffeModelLoader class has one static method to load the Caffe model from disk. The FrameProcessor class converts image data to the specific format intended for the DNN. The constructor of the class receives three parameters. The size parameter defines the size of the input data for DNN processing. Convolutional networks for image processing almost always use square images as input, so we specify only one value for both width and height. The scale and mean parameters are used for scaling data to the value range that was used for training the SSD. The only method of the class is get_blob, which receives an image and returns a blob – a special structure for neural network processing. To get the blob, the image is first resized to the specified square. Then the blob is created from the resized image using the blobFromImage method from OpenCV's DNN module with the specified scale, size, and mean values.
Note the code at the beginning of the get_blob method. This code implements a little "trick": we trim non-square images to get only the center square part of the image, as shown in the picture below:
This trick is intended to keep the aspect ratio of the image constant. If the width/height ratio changed, the image would get distorted, and the precision of object detection would decrease. One disadvantage of this trick is that we'll detect persons only in the central square part of the image (shown in blue in the above picture).
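The center-square trim can be sketched with plain NumPy, independent of OpenCV (the helper name below is illustrative, not from the article's code):

```python
import numpy as np

def center_square_crop(frame):
    # Keep only the central square of a landscape frame,
    # mirroring the trick at the start of FrameProcessor.get_blob
    h, w = frame.shape[:2]
    if w > h:
        dx = (w - h) // 2
        return frame[0:h, dx:dx + h]
    return frame

# A mock 480x640 BGR frame (height x width x channels)
frame = np.zeros((480, 640, 3), dtype=np.uint8)
square = center_square_crop(frame)
print(square.shape)  # (480, 480, 3)
```

For a 640x480 frame, 80 pixels are discarded on each side, leaving the central 480x480 region.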
Let’s now have a look at the main class for person detection with SSD models:
class SSD:
    def __init__(self, frame_proc, ssd_net):
        self.proc = frame_proc
        self.net = ssd_net

    def detect(self, frame):
        blob = self.proc.get_blob(frame)
        self.net.setInput(blob)
        detections = self.net.forward()
        # detected object count
        k = detections.shape[2]
        obj_data = []
        for i in np.arange(0, k):
            obj = detections[0, 0, i, :]
            obj_data.append(obj)
        return obj_data

    def get_object(self, frame, data):
        confidence = int(data[2] * 100.0)
        (h, w, c) = frame.shape
        # Coordinates are relative to the center square, so both
        # axes are scaled by the frame height h
        r_x = int(data[3] * h)
        r_y = int(data[4] * h)
        r_w = int((data[5] - data[3]) * h)
        r_h = int((data[6] - data[4]) * h)
        if w > h:
            dx = int((w - h) / 2)
            r_x = r_x + dx
        obj_rect = (r_x, r_y, r_w, r_h)
        return (confidence, obj_rect)

    def get_objects(self, frame, obj_data, class_num, min_confidence):
        objects = []
        for (i, data) in enumerate(obj_data):
            obj_class = int(data[1])
            obj_confidence = data[2]
            if obj_class == class_num and obj_confidence >= min_confidence:
                obj = self.get_object(frame, data)
                objects.append(obj)
        return objects
The constructor of the above class has two arguments: frame_proc for converting images to blobs and ssd_net for detecting objects. The main method, detect, receives a frame (image) as input and gets a blob from the frame using the specified frame processor. The blob is used as input for the network, and we get the detections with the forward method. These detections are presented as a 4-rank array (tensor). We won't analyze the entire tensor; we only need the seven-element rows detections[0, 0, i, :], one per detected object. We'll extract them from the detections and return the result – a list of object data.
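The layout of the detections tensor can be sketched with mock data (the numbers here are made up for illustration; a real net.forward() call returns an array of the same shape):

```python
import numpy as np

# A mock 'detections' tensor shaped like the output of net.forward():
# (1, 1, number_of_detections, 7)
detections = np.array([[[
    [0., 15., 0.91, 0.57, 0.60, 0.69, 0.94],   # a confident "person" detection
    [0.,  7., 0.40, 0.10, 0.10, 0.30, 0.30],   # some other, low-confidence class
]]], dtype=np.float32)

k = detections.shape[2]  # detected object count
obj_data = [detections[0, 0, i, :] for i in range(k)]

print(len(obj_data))         # 2
print(int(obj_data[0][1]))   # 15 (the person class)
```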
The object data is an array of real numbers. Here is an example:
[array([ 0. , 15. , 0.90723044, 0.56916684, 0.6017439 ,
0.68543154, 0.93739873], dtype=float32)]
The array contains seven numbers: the image index in the batch, the class of the detected object, the detection confidence, and the relative coordinates of the bounding rectangle (x_min, y_min, x_max, y_max).
The second method of the class converts the detection data to a simpler format for further use. It converts the relative confidence to the percentage value and the relative ROI coordinates to the integer data – pixel coordinates in the original image. This method takes into account the fact that the blob data was extracted from the center square of the original frame.
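Using the example detection above on a hypothetical 640x480 frame, the conversion works out as follows (plain arithmetic mirroring get_object; the frame size is assumed for illustration):

```python
# Relative detection data from the example above (rounded)
data = [0., 15., 0.907, 0.569, 0.602, 0.685, 0.937]

h, w = 480, 640                     # assumed original frame size
confidence = int(data[2] * 100.0)   # 90 (percent)

# Coordinates are relative to the 480x480 center square,
# so both axes are scaled by the frame height h
r_x = int(data[3] * h)              # 273
r_y = int(data[4] * h)              # 288
r_w = int((data[5] - data[3]) * h)  # 55
r_h = int((data[6] - data[4]) * h)  # 160

if w > h:                           # shift x back into the full frame
    dx = int((w - h) / 2)           # 80
    r_x = r_x + dx                  # 353

print(confidence, (r_x, r_y, r_w, r_h))  # 90 (353, 288, 55, 160)
```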
And finally, the get_objects method extracts from the detection data only the objects with the specified class and sufficient confidence. Because the DNN models can detect objects of twenty different classes, we must filter the detections for the person class; to be sure that a detected object is really a human, we also specify a high confidence threshold.
One more utility class – for drawing detected objects into images to visualize the results:
class Utils:
    @staticmethod
    def draw_object(obj, label, color, frame):
        (confidence, (x1, y1, w, h)) = obj
        x2 = x1 + w
        y2 = y1 + h
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        y3 = y1 - 12
        text = label + " " + str(confidence) + "%"
        cv2.putText(frame, text, (x1, y3), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 1, cv2.LINE_AA)

    @staticmethod
    def draw_objects(objects, label, color, frame):
        for (i, obj) in enumerate(objects):
            Utils.draw_object(obj, label, color, frame)
Now we can write the code to launch the person detection algorithm:
proto_file = r"C:\PI_RPD\mobilenet.prototxt"
model_file = r"C:\PI_RPD\mobilenet.caffemodel"
ssd_net = CaffeModelLoader.load(proto_file, model_file)
print("Caffe model loaded from: "+model_file)
proc_frame_size = 300
# frame processor for MobileNet
ssd_proc = FrameProcessor(proc_frame_size, 1.0/127.5, 127.5)
person_class = 15
ssd = SSD(ssd_proc, ssd_net)
im_dir = r"C:\PI_RPD\test_images"
im_name = "woman_640x480_01.png"
im_path = os.path.join(im_dir, im_name)
image = cv2.imread(im_path)
print("Image read from: "+im_path)
obj_data = ssd.detect(image)
persons = ssd.get_objects(image, obj_data, person_class, 0.5)
person_count = len(persons)
print("Person count on the image: "+str(person_count))
Utils.draw_objects(persons, "PERSON", (0, 0, 255), image)
res_dir = r"C:\PI_RPD\results"
res_path = os.path.join(res_dir, im_name)
cv2.imwrite(res_path, image)
print("Result written to: "+res_path)
The code implements a frame processor with size=300 because the models we use work with images sized 300 x 300 pixels. The scale and mean parameters have the same values that were used for the MobileNet model training. These values must always match the model's training values; otherwise, the precision of the model decreases. The person_class value is 15 because the person class is the 15th class in the model context.
Running the code on sample images produces the results below:
We used very simple cases to detect persons. The goal was just to check that our code worked correctly and the DNN model could predict a person’s presence in an image. And it worked!
Next Steps
The next step is to launch our code on a Raspberry Pi device. In the next article, we’ll see how you can install Python-OpenCV on the device and run the code.
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).
Python AI neural News Translation