[Translation] Image Tagger: A Convolutional Neural Network Based Image Classifier
By robot-v1.0
Link to this article: https://www.kyfws.com/ai/image-tagger-a-convolutional-neural-network-based-zh/
Original article: https://www.codeproject.com/Articles/1360649/Image-Tagger-A-Convolutional-Neural-Network-Based
Original author: Huseyin Atasoy
Preface
An image classifier / tagger based on convolutional neural networks. Now more than 10 times faster with Intel MKL support.
ImageTagger is an application that allows searching images by keywords. It determines the contents of images using CeNiN.dll, a pure C# implementation of deep convolutional neural networks. It is now more than 10 times faster when the Intel MKL libraries are available.
- Download ImageTagger - 899.3 KB
- Download ImageTagger_Release - 894 KB
- Download model file (imagenet-matconvnet-vgg-f.cenin) - 232 MB
- CeNiN.dll GitHub page

We will write an application that will allow us to search images by keywords. I hate library dependencies or "blackbox"es, so we will not use any 3rd party API or library. Everything will be in pure C# and simple. (With CeNiN v0.2, it is now more than 10 times faster when Intel MKL support is available.)
Introduction
Deep convolutional neural networks are one of the hot topics in the image processing community. There are different implementations in various languages, but if you are trying to get the logic behind the ideas, large implementations are not always helpful. So I have implemented the feed-forward phase of a convolutional neural network in its minimal form as a .NET library: CeNiN.dll.
We will use CeNiN to classify images and tag them with keywords so that we can search for an object or scene in a set of images. We will be able to, for instance, search and find images that contain cats, cars, or whatever we want, in a folder that we choose.
CeNiN doesn't contain an implementation of back-propagation, which is required to train a neural network model, so we will use a pretrained model. The original model that we will use (imagenet-matconvnet-vgg-f) and the same model in a format compatible with CeNiN can be found here and here, respectively. The model contains 19+2 (input and output) layers and 60,824,256 weights, and has been trained on 1000 classes of images.
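As an aside, the 60,824,256 figure can be sanity-checked from the layer shapes of the VGG-F network. The shapes below are an assumption taken from the published matconvnet VGG-F configuration, not from this article:

```python
# Filter shapes (height, width, in_channels, out_channels) for VGG-F.
# NOTE: these shapes are an assumption based on the published
# matconvnet VGG-F configuration, not taken from this article.
vggf_filter_shapes = [
    (11, 11, 3, 64),     # conv1
    (5, 5, 64, 256),     # conv2
    (3, 3, 256, 256),    # conv3
    (3, 3, 256, 256),    # conv4
    (3, 3, 256, 256),    # conv5
    (6, 6, 256, 4096),   # fc6 (a fully connected layer stored as a convolution)
    (1, 1, 4096, 4096),  # fc7
    (1, 1, 4096, 1000),  # fc8 -> 1000 output classes
]

# Total number of filter weights (biases not included)
total_weights = sum(h * w * c * n for (h, w, c, n) in vggf_filter_shapes)
print(total_weights)  # -> 60824256
```

The sum comes out to exactly the 60,824,256 weights quoted above, which suggests the count refers to filter weights without biases.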
Preparing the Model
First, we load the model using the constructor. Since it may take a while to load millions of parameters from the model file, we call the constructor in a separate thread so as not to block the UI:
Thread t = new Thread(() =>
{
    try
    {
        cnn = new CNN("imagenet-matconvnet-vgg-f.cenin");
        ddLabel.Invoke((MethodInvoker)delegate ()
        {
            cbClasses.Items.AddRange(cnn.outputLayer.classes);
            dropToStart();
        });
    }
    catch (Exception exp)
    {
        ddLabel.Invoke((MethodInvoker)delegate ()
        {
            ddLabel.Text = "Missing model file!";
            if (MessageBox.Show(this, "Couldn't find model file.\n" +
                "Do you want to be redirected to download page?", "Missing Model File",
                MessageBoxButtons.YesNo, MessageBoxIcon.Error) == DialogResult.Yes)
                Process.Start("http://huseyinatasoy.com/y.php?bid=71");
        });
    }
});
t.Start();
Classifying Images
We need a structure to keep the results:
private struct Match
{
    public int ImageIndex { set; get; }
    public string Keywords { set; get; }
    public float Probability { set; get; }
    public string ImageName { set; get; }

    public Match(int imageIndex, string keywords, float probability, string imageName)
    {
        ImageIndex = imageIndex;
        Keywords = keywords;
        Probability = probability;
        ImageName = imageName;
    }
}
CeNiN loads layers into memory as a layer chain. The chain is a linked list whose first and last nodes are the Input and Output layers. To classify an image, the image is set as input and the layers are iterated, calling the feedNext() function to feed the next layer in each step. When data arrives at the Output layer, it is in the form of a probability vector. Calling getDecision() sorts the probabilities from highest to lowest, and then we can consider each probability as a Match. It is important to make those calls inside a thread, again so as not to block the UI. Also, since a worker thread cannot modify UI elements, code that modifies UI elements (adding new rows to lv_KeywordList, updating ddLabel.Text) should be invoked by the GUI thread.
Thread t = new Thread(() =>
{
    int imCount = imageFullPaths.Length;
    for (int j = 0; j < imCount; j++)
    {
        Bitmap b = (Bitmap)Image.FromFile(imageFullPaths[j]);
        ddLabel.Invoke((Action<int, int>)delegate (int y, int n)
        {
            ddLabel.Text = "Processing [" + (y + 1) + "/" + n + "]...\n\n" +
                getImageName(imageFullPaths[y]);
        }, j, imCount);
        Application.DoEvents();

        cnn.inputLayer.setInput(b, Input.ResizingMethod.ZeroPad);
        b.Dispose();

        Layer currentLayer = cnn.inputLayer;
        while (currentLayer.nextLayer != null)
        {
            currentLayer.feedNext();
            currentLayer = currentLayer.nextLayer;
        }

        Output outputLayer = (Output)currentLayer;
        outputLayer.getDecision();

        lv_KeywordList.Invoke((MethodInvoker)delegate ()
        {
            int k = 0;
            while (outputLayer.probabilities[k] > 0.05)
            {
                Match m = new Match(
                    j,
                    outputLayer.sortedClasses[k],
                    (float)Math.Round(outputLayer.probabilities[k], 3),
                    getImageName(imageFullPaths[j])
                );
                matches.Add(m);
                k++;
            }
        });
    }
    lv_KeywordList.Invoke((MethodInvoker)delegate ()
    {
        groupBox2.Enabled = true;
        btnFilter.PerformClick();
        int k;
        for (k = 0; k < lv_KeywordList.Columns.Count - 1; k++)
            if (k != 1)
                lv_KeywordList.Columns[k].Width = -2;
        lv_KeywordList.Columns[k].Width = -1;
        dropToStart();
    });
});
t.Start();
Now all the images are tagged with keywords, which are actually the class descriptions of the model we are using. Finally, we iterate over the Matches to find each Match that contains the keyword written by the user.
float probThresh = (float)numericUpDown1.Value;
string str = cbClasses.Text.ToLower();
lv_KeywordList.Items.Clear();
pictureBox1.Image = null;
List<int> imagesToShow = new List<int>();
int j = 0;
bool stringFilter = (str != "");
for (int i = 0; i < matches.Count; i++)
{
    bool cond = (matches[i].Probability >= probThresh);
    if (stringFilter)
        cond = cond && matches[i].Keywords.Contains(str);
    if (cond)
    {
        addMatchToList(j, matches[i]);
        int ind = matches[i].ImageIndex;
        if (!imagesToShow.Contains(ind))
            imagesToShow.Add(ind);
        j++;
    }
}
if (lv_KeywordList.Items.Count > 0)
    lv_KeywordList.Items[0].Selected = true;
It is that simple!
Training Your Own Models for ImageTagger
You can train your own neural network using a tool like matconvnet and convert it to the CeNiN format to use it in ImageTagger. Here is a MATLAB script that converts vgg nets to a format compatible with CeNiN:
function vgg2cenin(vggMatFile) % vgg2cenin('imagenet-matconvnet-vgg-f.mat')
    fprintf('Loading mat file...\n');
    net = load(vggMatFile);
    lc = size(net.layers, 2);
    vggMatFile(find(vggMatFile=='.', 1, 'last'):end) = []; % remove extension
    f = fopen(strcat(vggMatFile, '.cenin'), 'w'); % Open an empty file with the same name
    fprintf(f, 'CeNiN NEURAL NETWORK FILE'); % Header
    fwrite(f, lc, 'int'); % Layer count
    if isfield(net.meta, 'inputSize')
        s = net.meta.inputSize;
    else
        s = net.meta.inputs.size(1:3);
    end
    for i = 1:length(s)
        fwrite(f, s(i), 'int'); % Input dimensions (height, width and number of channels (depth))
    end
    for i = 1:3
        fwrite(f, net.meta.normalization.averageImage(i), 'single');
    end
    for i = 1:lc % For each layer
        l = net.layers{i};
        t = l.type;
        s = length(t);
        fwrite(f, s, 'int8'); % String length
        fprintf(f, t); % Layer type (string)
        fprintf('Writing layer %d (%s)...\n', i, l.type);
        if strcmp(t, 'conv') % Convolution layers
            st = l.stride;
            p = l.pad;
            % We need 4 padding values for CeNiN (top, bottom, left, right).
            % In vgg format, if there is one value, all padding values are
            % the same, and if there are two values, these are the top-bottom
            % and left-right paddings.
            if size(st,2)<2, st(2)=st(1); end
            if size(p,2)<2, p(2)=p(1); end
            if size(p,2)<3, p(3:4)=[p(1) p(2)]; end
            % Four padding values
            fwrite(f, p(1), 'int8');
            fwrite(f, p(2), 'int8');
            fwrite(f, p(3), 'int8');
            fwrite(f, p(4), 'int8');
            % Dimensions (height, width, number of channels (depth), number of filters)
            s = size(l.weights{1});
            for j = 1:length(s)
                fwrite(f, s(j), 'int');
            end
            % Vertical and horizontal stride values (StrideY and StrideX)
            fwrite(f, st(1), 'int8');
            fwrite(f, st(2), 'int8');
            % Weight values
            % Writing each value one by one takes a long time because there are many of them:
            % for j=1:numel(l.weights{1})
            %     fwrite(f,l.weights{1}(j),'single');
            % end
            % This is faster:
            fwrite(f, l.weights{1}(:), 'single');
            % And biases
            % for j=1:numel(l.weights{2})
            %     fwrite(f,l.weights{2}(j),'single');
            % end
            fwrite(f, l.weights{2}(:), 'single');
        elseif strcmp(t, 'relu') % ReLu layers
            % Layer type ('relu') has been written above. There are no extra
            % parameters to be written for this layer.
        elseif strcmp(t, 'pool') % Pooling layers
            st = l.stride;
            p = l.pad;
            po = l.pool;
            if size(st,2)<2, st(2)=st(1); end
            if size(p,2)<2, p(2)=p(1); end
            if size(p,2)<3, p(3:4)=[p(1) p(2)]; end
            if size(po,2)<2, po(2)=po(1); end
            % Four padding values (top, bottom, left, right)
            fwrite(f, p(1), 'int8');
            fwrite(f, p(2), 'int8');
            fwrite(f, p(3), 'int8');
            fwrite(f, p(4), 'int8');
            % Vertical and horizontal pooling values (PoolY and PoolX)
            fwrite(f, po(1), 'int8');
            fwrite(f, po(2), 'int8');
            % Vertical and horizontal stride values (StrideY and StrideX)
            fwrite(f, st(1), 'int8');
            fwrite(f, st(2), 'int8');
        elseif strcmp(t, 'softmax') % SoftMax layer (this is the last layer)
            s = size(net.meta.classes.description, 2);
            fwrite(f, s, 'int'); % Number of classes
            for j = 1:size(net.meta.classes.description, 2) % For each class description
                s = size(net.meta.classes.description{j}, 2);
                fwrite(f, s, 'int8'); % String length
                fprintf(f, '%s', net.meta.classes.description{j}); % Class description (string)
            end
        end
    end
    fwrite(f, 3, 'int8'); % Length of "EOF" as if it were a layer type
    fprintf(f, 'EOF'); % And the "EOF" string itself...
    fclose(f);
end
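As a sanity check on the file layout, here is a minimal Python sketch (an illustration, not part of the original article; the function name `read_cenin_header` is hypothetical) that reads back just the fixed-size header the script writes: the magic string, the layer count, the three input dimensions, and the three average-image values. Little-endian 4-byte ints and floats are assumed, which is what MATLAB's fwrite produces by default on x86:

```python
import struct

def read_cenin_header(f):
    """Read the .cenin header emitted by vgg2cenin from a binary
    file-like object. Field order mirrors the fwrite calls above;
    little-endian 4-byte ints/floats are assumed."""
    magic = f.read(25).decode('ascii')            # 'CeNiN NEURAL NETWORK FILE'
    layer_count = struct.unpack('<i', f.read(4))[0]
    height, width, channels = struct.unpack('<3i', f.read(12))
    avg_rgb = struct.unpack('<3f', f.read(12))    # average-image channel values
    return magic, layer_count, (height, width, channels), avg_rgb
```

After the header, the per-layer records follow (a length-prefixed type string, then the layer-specific fields written by the script).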
Useful Links
- cuDNN: Efficient Primitives for Deep Learning
- Pretrained models (they are not directly compatible with CeNiN)
History
- 3rd April, 2019: Initial version
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).
C# .NET MatLab image-processing neural-nets AI