Real-Time AI Person Detection on Edge Devices: Selecting Deep Learning Models (Translation)
By S.F.
Article link: https://www.kyfws.com/news/real-time-ai-person-detection-on-edge-devices-sele/
Copyright notice: Unless otherwise stated, all articles on this blog are published under the BY-NC-SA license. Please credit the source when reposting!
- 6 minute read - 2624 words
Original article: https://www.codeproject.com/Articles/5281993/Real-time-AI-Person-Detection-on-Edge-Devices-Sele
Original author: Sergey L. Gladkiy
Translated by this site
Preface
In this article, we’ll discuss the pros and cons of the existing DNN approaches and select a pre-trained model for further experimentation. Here we’ll select (for further work) two SSD models, one based on MobileNet and another one based on SqueezeNet.
- Download Data - 19.3 MB
- Download Models - 43.5 MB
- Download Results - 36.66 MB
In the introductory article of this series, we discussed a simple way of creating a DL person detector for edge devices, which was finding an appropriate DNN model and writing the code for launching it on a device. In this article, we’ll discuss the pros and cons of the existing DNN approaches and select a pre-trained model for further experimentation.
We’ve mentioned three modern DL techniques for object detection in images: Faster-RCNN, Single-Shot Detector (SSD), and You Only Look Once (YOLO). Each of these techniques has advantages and drawbacks we should take into account in order to select the one that best suits our specific purpose.
Faster-RCNN uses a convolutional neural network along with the Region-Proposal block and fully connected (FC) layers. CNN is the first block of the network; its job is to extract features from the image. The following block – Region-Proposal network – is responsible for suggesting Regions-of-Interest (ROI) for possible object locations. The final block – the FC layers – is intended for bounding box (BB) regression and object classification for each of the boxes. Here is a simple scheme of the Faster-RCNN algorithm:
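The figure from the original article is not reproduced here, but the three-block flow can be sketched in Python with placeholder stages. Everything below is illustrative only (dummy math, made-up function names), not a real Faster-RCNN implementation:

```python
import numpy as np

def cnn_backbone(image):
    """Stage 1: extract a feature map from the image (placeholder).
    A real backbone (e.g. VGG or ResNet) would produce hundreds of
    channels; here we just average-pool the image into a coarse grid."""
    h, w = image.shape[:2]
    return image[: h // 16 * 16, : w // 16 * 16].reshape(
        h // 16, 16, w // 16, 16, -1).mean(axis=(1, 3))

def region_proposal_network(features, num_proposals=5):
    """Stage 2: propose candidate regions of interest as (x, y, w, h)
    boxes over the feature map (random here, learned in reality)."""
    rng = np.random.default_rng(0)
    fh, fw = features.shape[:2]
    return rng.uniform(0, 1, size=(num_proposals, 4)) * np.array([fw, fh, fw, fh])

def fc_head(features, proposals):
    """Stage 3: FC layers classify each proposal and refine its box."""
    scores = np.full(len(proposals), 0.5)   # dummy class scores
    refined = proposals + 0.1               # dummy bounding-box regression
    return refined, scores

image = np.zeros((128, 128, 3))
feats = cnn_backbone(image)
rois = region_proposal_network(feats)
boxes, scores = fc_head(feats, rois)
print(boxes.shape, scores.shape)
```

The point of the sketch is the data flow: every proposal from stage 2 is passed through the FC head in stage 3, which is what makes Faster-RCNN comparatively slow.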
The SSD method is similar to Faster-RCNN. After the CNN feature extractor, it contains a Multi-Box detector, which allows bounding box detection and object classification in a single forward pass. This is why it is considered to be faster than Faster-RCNN.
The YOLO technique is based on the Darknet framework. Instead of scanning an image over the different locations and scales, it divides the entire image into a grid of cells and analyses each cell, scoring each cell’s probability of belonging to a certain class. This makes the YOLO algorithm very fast.
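The cell-grid idea can be illustrated in a few lines of Python. The 7×7 grid below is just an example (real YOLO versions use different grid sizes); each object is assigned to the cell containing its bounding-box center:

```python
def yolo_cell(cx, cy, img_w, img_h, grid=7):
    """Return the (row, col) of the grid cell responsible for an object
    whose bounding-box center is at pixel (cx, cy)."""
    col = min(int(cx * grid / img_w), grid - 1)
    row = min(int(cy * grid / img_h), grid - 1)
    return row, col

# An object centered at (320, 240) in a 640x480 image falls in the middle cell:
print(yolo_cell(320, 240, 640, 480))  # (3, 3)
```

Because each cell is scored in one pass over the image, there is no per-region computation as in Faster-RCNN, which is the source of YOLO's speed.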
Consideration of the speed alone would make YOLO the obvious choice. But there is one more thing to consider before we make the decision – the algorithm precision. The concept of precision for object detection is more complex than that for object classification. Here, we must evaluate not only the classification error, but also the error of the object’s bounding box location.
The main precision measure for object detection is Intersection over Union (IoU) – the ratio of intersection and union of ground-truth and the detected bounding boxes. Because we can have many classes to locate and classify, the mean Average Precision (mAP) is used to compute the accuracy for the entire dataset. The mAP value is commonly evaluated for IoU=0.5, and is denoted as "mAP@0.5".
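As a concrete illustration, IoU for two axis-aligned boxes given as (x1, y1, x2, y2) can be computed like this (a minimal sketch, not tied to any particular framework):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A detection shifted by half the box width against the ground truth:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 1/3, about 0.333
```

At the common IoU=0.5 threshold, this detection would not count as a true positive, even though it overlaps half of the ground-truth box.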
We won’t go too deep into the theory of measurements of the object detection precision. We’ll just compare the three competing DL methods in terms of mAP@0.5: the higher this value is, the more precise is the model.
Search the Internet for precision values of the various Faster-RCNN, SSD, and YOLO models – and you will find a lot of drastically different results. This is because there are many pre-trained DNN models for each detection method. For example, the SSD technology can use the different CNN models for feature extraction, and each of these models can be trained with the different datasets: ImageNet, COCO, VOC, and others.
A complete comparison of all existing models’ precision metrics is not our goal. Our testing of the three methods with the same dataset showed that the best precision could be achieved with the Faster-RCNN method, slightly lower precision – with the SSD models, and the least precise results – with the YOLO network.
It appears that the faster the object detection method is, the less precision it provides. Thus we choose the happy medium – the SSD models, which can provide enough speed with sufficient precision.
Next, we looked for a ready-to-use SSD model that would be:
- fast enough for near real-time inference on an edge device;
- precise enough to detect people reliably;
- small enough to fit into an edge device's limited memory.
We’ve found two suitable SSD models. The first one was based on the MobileNet CNN, and the second used a pre-trained [SqueezeNet](https://github.com/chuanqi305/SqueezeNet-SSD) as the feature extractor. Both models have been trained using the COCO dataset and could detect objects of twenty different classes, including humans, cars, dogs, and others. Each model was about 20 MB in size – very small compared to the common size of 200-500 MB for models used on high-performance processors. So these two looked like the right choice for our purposes.
Next Steps
In the next article, we’ll showcase the Python code for launching these models and detecting humans in images.
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).
Python AI Raspberry News Translation