[Translation] Multiple Convolution Neural Networks Approach for Online Handwriting Recognition

By robot-v1.0

Article link: https://www.kyfws.com/ai/multiple-convolution-neural-networks-approach-for-zh/



Original article: https://www.codeproject.com/Articles/571462/Multiple-convolution-neural-networks-approach-for

Original author: Vietdungiitb

Translated by this site's robot-v1.0

Preface

This research presents a word recognition technique for an online handwriting recognition system that uses multiple component neural networks (MCNN) as the exchangeable parts of the classifier.

Abstract

The research described in this paper presents a word recognition technique for an online handwriting recognition system that uses multiple component neural networks (MCNN) as the exchangeable parts of the classifier. Like most recent approaches, the system segments handwritten words into smaller pieces (usually characters), which are recognized separately. The recognition results are then composed from the individually recognized parts. The composed candidates are passed in turn to a word recognition module, which chooses the best one by applying dictionary search algorithms. The proposed classifier overcomes the difficulties that traditional classifiers have with large character classes. It is also expandable: it can recognize additional character classes when component networks and built-in dictionaries are added or changed dynamically.

Introduction

Touch user interfaces (TUI) are becoming increasingly popular and will play an important role in human-computer interaction. Tablets, smartphones, and TUI computers that accept finger or pen input are becoming indispensable to many people. Fingers or a pen, used as an input device, take over many functions of the conventional mouse and keyboard. One major advantage of the pen over the mouse is that a pen is a natural writing tool, while the mouse is very cumbersome when used for writing. However, pen input requires a reliable transformation of handwritten text into a coding that a computer can process directly, e.g., ASCII. A traditional transformation model usually includes a preprocessor that extracts each word from the image or input screen and divides it into segments. A neural network classifier then finds the likelihood of each possible character class given the segments. These likelihoods are used as the input to a special algorithm that recognizes the entire word. In recent years, research in handwriting recognition has advanced to a level that makes commercial applications possible. Nevertheless, such single neural network classifiers have significant disadvantages: the complexity of organizing a big network and limited expandability.

A neural network with a highly reliable recognition rate is easy to build for a small character class, but not for a large one. Larger inputs and outputs increase the number of layers, neurons, and connections in the network. This makes the training process more difficult and, in particular, tends to reduce the recognition rate significantly. Furthermore, a single neural network classifier only works for one particular character class. It cannot be exchanged or expanded to recognize additional character classes without recreating or retraining the neural network.

This paper presents a new online handwriting recognition system based on multiple convolutional neural networks (CNNs). Unlike traditional single neural network classifiers, the new classifier is a collection of component CNNs with very high recognition rates that work together. Each CNN correctly recognizes only a part of the big character class (digits, letters, etc.), but when these networks are combined programmatically they form a flexible classifier that can recognize different big character classes simply by adding or removing component CNNs and language dictionaries.

Convolution neural network

Convolutional neural networks (CNNs) are a special kind of multi-layer neural network. Like almost every other neural network, they are trained with a version of the back-propagation algorithm. Where they differ is in the architecture. Convolutional neural networks are designed to recognize visual patterns directly from pixel images with minimal preprocessing. They can recognize patterns with extreme variability (such as handwritten characters) and are robust to distortions and simple geometric transformations.

Fig. 1. A typical convolutional neural network (LeNet-5) [1]

The convolutional neural network LeNet-5 for handwritten digit recognition achieves a reliable recognition rate of up to 99% on the MNIST dataset. The input layer is of size 32x32 and receives the gray-level image containing the digit to recognize. The pixel intensities are normalized between -1 and +1. The first hidden layer, C1, consists of six feature maps, each having 25 weights that constitute a 5x5 trainable kernel, plus a bias. The values of a feature map are computed by convolving the input layer with the respective kernel and applying an activation function to the result. All values of a feature map are constrained to share the same trainable kernel, that is, the same weight values. Because of border effects, the feature maps are of size 28x28, smaller than the input layer.

Each convolution layer is followed by a sub-sampling layer that reduces the dimension of the respective convolution layer's feature maps by a factor of two. Hence the sub-sampling maps of the hidden layer S2 are of size 14x14. Similarly, layer C3 has 16 convolution maps of size 10x10 and layer S4 has 16 sub-sampling maps of size 5x5. These layers are implemented in exactly the same way as layers C1 and S2. The S4 layer's feature maps are of size 5x5, which is too small for a third convolution layer. Layers C1 to S4 of this neural network can be viewed as a trainable feature extractor. A trainable classifier, in the form of 3 fully connected layers (a universal classifier), is then added to the feature extractor.
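The map sizes quoted above follow from simple arithmetic: a 5x5 kernel applied without padding shrinks each side by 4, and sub-sampling by a factor of two halves it. The following C# sketch reproduces the numbers. The class and method names are hypothetical, and the 5x5 kernel for C3 is an assumption taken from the standard LeNet-5 design rather than from the text above.

```csharp
using System;

public static class LayerSizes
{
    // Output width of a convolution with no padding and stride 1.
    public static int ConvOutput(int input, int kernel) => input - kernel + 1;

    // Output width after sub-sampling by the given factor.
    public static int SubsampleOutput(int input, int factor) => input / factor;

    // Widths of the input, C1, S2, C3 and S4 maps for a 32x32 input.
    // Assumption: both convolution layers use 5x5 kernels.
    public static int[] LeNet5Widths()
    {
        int input = 32;
        int c1 = ConvOutput(input, 5);    // 32 - 5 + 1
        int s2 = SubsampleOutput(c1, 2);  // halved
        int c3 = ConvOutput(s2, 5);       // 14 - 5 + 1
        int s4 = SubsampleOutput(c3, 2);  // halved
        return new[] { input, c1, s2, c3, s4 };
    }
}
```

Calling `LayerSizes.LeNet5Widths()` yields the widths 32, 28, 14, 10 and 5, matching the sizes given for the input, C1, S2, C3 and S4.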

Fig. 2. A convolution network based on Dr. Patrice Simard's model

The first two layers of this neural network can be viewed as a trainable feature extractor. A trainable classifier, in the form of 2 fully connected layers, is then added to the feature extractor.

Multiple component neural networks classifiers

The recognition rate of a convolution neural network is very high for small character classes such as the digits or the English alphabet (26 characters). However, creating a larger neural network that can reliably recognize a bigger collection (62 characters) is still a challenge. Finding an optimized and sufficiently large network becomes more difficult, and training the network on large input patterns takes much longer. The network converges more slowly and, in particular, the accuracy drops significantly because a larger set contains more badly written, similar, and easily confused characters.

The proposed solution to these problems is to replace a single complex neural network with multiple smaller networks, each of which has a high recognition rate on its own output set. Each component network has an additional unknown output (unknown character) beside its official output set (digits, letters, ...). This means that if the input pattern is not recognized as one of the official output characters, it is treated as an unknown character.
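A minimal C# sketch of how a component network's output vector might be decoded with such an extra unknown output. The names are hypothetical, and placing the unknown output at the last index and using the null character for it are assumptions made for illustration, not details confirmed by the article.

```csharp
using System;

public static class ComponentDecoder
{
    // Assumption: the null character stands for "unknown".
    public const char Unknown = '\0';

    // scores holds one value per official label plus one final "unknown" output.
    public static char Decode(double[] scores, char[] labels)
    {
        if (scores.Length != labels.Length + 1)
            throw new ArgumentException("Expected one score per label plus one unknown output.");

        // Pick the output with the highest score.
        int best = 0;
        for (int i = 1; i < scores.Length; i++)
        {
            if (scores[i] > scores[best]) best = i;
        }

        // The last index is the assumed unknown output.
        return best == labels.Length ? Unknown : labels[best];
    }
}
```

With labels `{ 'a', 'b' }`, the scores `{ 0.1, 0.8, 0.1 }` decode to `'b'`, while `{ 0.1, 0.2, 0.7 }` decode to the unknown character because the extra output wins.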

Fig. 3. An MCNNs online handwriting recognition system

The character recognition module of the classifier is a collection of component neural networks that work simultaneously on the input patterns. A handwritten word is pre-processed by segmenting it into isolated character visual patterns []. These patterns are then given to the inputs of all component neural networks, each of which estimates the likelihoods for its own character class. A visual pattern can be recognized by one, some, or all component networks, because different classes contain several similar characters. If a network cannot recognize the pattern as a member of its own character class, it returns an unknown character (null character). The module's output is a table of possible characters, which is composed into possible words such as "Exper1, Expert, ExperJ, EXper1, EXpert, EXperJ" in the above example. Unknown characters (null characters) are not used in word composition. These words are then passed in turn to the word recognition module, which chooses the most correct one as the output of the overall classifier. In this example, the word "Expert" is chosen.

Fig. 4. Output of the MCNNs classifier module

Global variables:

charMatrix = List<List<char>> {{E},{x,X},{p},{e},{r},{1,t,J}} // character table
words = List<String> // list of composed words
startIndex: default is 0
baseWord: default is ""

void GetWords(int startIndex, String baseWord)
      {
          String newWord = "";
          if (startIndex == charMatrix.Count - 1)
          {
              for (int i = 0; i < charMatrix[startIndex].Count; i++)
              {
                  newWord = String.Format("{0}{1}", baseWord, charMatrix[startIndex][i].ToString());
                  words.Add(newWord);
              }
          }
          else
          {
              for (int i = 0; i < charMatrix[startIndex].Count; i++)
              {
                  newWord = String.Format("{0}{1}", baseWord, charMatrix[startIndex][i].ToString());
                  int newIndex = startIndex + 1;
                  GetWords(newIndex, newWord);
              }
          }
      }
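The listing above depends on the global `charMatrix` and `words` variables. As a self-contained illustration of the same word composition, the following C# sketch builds the candidate words iteratively instead of recursively. The class and method names are hypothetical, and the jagged `char[][]` input type is an assumption chosen to keep the example short.

```csharp
using System;
using System.Collections.Generic;

public static class WordComposer
{
    // Builds every word that takes one character from each column of charMatrix, in order.
    public static List<string> Compose(char[][] charMatrix)
    {
        // Start from a single empty prefix and extend it column by column.
        var words = new List<string> { "" };
        foreach (char[] candidates in charMatrix)
        {
            var extended = new List<string>();
            foreach (string prefix in words)
            {
                foreach (char c in candidates)
                {
                    extended.Add(prefix + c);
                }
            }
            words = extended;
        }
        return words;
    }
}
```

For the character table `{{E},{x,X},{p},{e},{r},{1,t,J}}` this produces the six candidates listed earlier, in the same order: Exper1, Expert, ExperJ, EXper1, EXpert, EXperJ.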

The word recognition module is in fact a spell checker that uses several dictionary search algorithms and word correction techniques to find the most meaningful word. All possible words from the character recognition module are given to the dictionary search sequentially. If one of the words is found in the built-in dictionaries, it becomes the output word of the classifier. Otherwise, word correction techniques are applied to choose the most correct word in automatic mode, or to show the user a list of similar words in manual mode. Some of these techniques are:

Swap out each character one by one and try all the replacement characters in its place to see if that makes a good word.

private bool ReplaceChars(String word, out String result)
      {
          result = "";
          bool isFoundWord = false;
          foreach (WordDictionary dictionary in Dictionaries)
          {
              ArrayList replacementChars = dictionary.ReplaceCharacters;
              for (int i = 0; i < replacementChars.Count; i++)
              {
                  int split = ((string)replacementChars[i]).IndexOf(' ');
                  string key = ((string)replacementChars[i]).Substring(0, split);
                  string replacement = ((string)replacementChars[i]).Substring(split + 1);
                  int pos = word.IndexOf(key);
                  while (pos > -1)
                  {
                      string tempWord = word.Substring(0, pos);
                      tempWord += replacement;
                      tempWord += word.Substring(pos + key.Length);
                      if (this.TestWord(tempWord))
                      {
                          result = tempWord.ToString();
                          isFoundWord = true;
                          return isFoundWord;
                      }
                      pos = word.IndexOf(key, pos + 1);
                  }
              }
          }
          return isFoundWord;
      }

Try swapping adjacent characters one by one.

private bool SwapChar(String word, out String result)
     {
         result = "";
         bool isFoundWord = false;
         foreach (WordDictionary dictionary in Dictionaries)
         {
             for (int i = 0; i < word.Length - 1; i++)
             {
                 StringBuilder tempWord = new StringBuilder(word);
                 char swap = tempWord[i];
                 tempWord[i] = tempWord[i + 1];
                 tempWord[i + 1] = swap;
                 if (this.TestWord(tempWord.ToString()))
                 {
                     result = tempWord.ToString();
                     isFoundWord = true;
                     return isFoundWord;
                 }
             }
         }
         return isFoundWord;
     }

Try omitting one character of the word at a time.
Try inserting a new character before every letter.

private bool ForgotChar(String word, out String result)
       {
           result = "";
           bool isFoundWord = false;
           foreach (WordDictionary dictionary in Dictionaries)
           {
               char[] tryme = dictionary.TryCharacters.ToCharArray();
               for (int i = 0; i <= word.Length; i++)
               {
                   for (int x = 0; x < tryme.Length; x++)
                   {
                       StringBuilder tempWord = new StringBuilder(word);
                       tempWord.Insert(i, tryme[x]);
                       if (this.TestWord(tempWord.ToString()))
                       {
                           result = tempWord.ToString();
                           isFoundWord = true;
                           return isFoundWord;
                       }
                   }
               }
           }
           return isFoundWord;
       }
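The "omit one character at a time" technique is listed without accompanying code. A minimal C# sketch of it is given below. The names are hypothetical, and the `isKnownWord` delegate is an assumption that stands in for the article's `TestWord` method so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

public static class OmitCharSketch
{
    // Removes each character in turn and returns the first variant accepted by isKnownWord.
    public static bool TryOmitChar(string word, Func<string, bool> isKnownWord, out string result)
    {
        result = "";
        for (int i = 0; i < word.Length; i++)
        {
            // Drop the single character at position i.
            string candidate = word.Remove(i, 1);
            if (isKnownWord(candidate))
            {
                result = candidate;
                return true;
            }
        }
        return false;
    }
}
```

With a dictionary that contains only "expert", `OmitCharSketch.TryOmitChar("exxpert", dictionary.Contains, out string fixedWord)` returns true and sets `fixedWord` to "expert".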

Split the string into two pieces after every character. If both pieces are good words, offer them as a suggestion.

private bool TwoWords(String word, out String result)
      {
          result = "";
          bool isFoundWord = false;
          for (int i = 1; i < word.Length - 1; i++)
          {
              string firstWord = word.Substring(0, i);
              string secondWord = word.Substring(i);
              if (this.TestWord(firstWord) && this.TestWord(secondWord))
              {
                  string tempWord = firstWord + " " + secondWord;
                  result = tempWord;
                  isFoundWord = true;
                  return isFoundWord;
              }
          }
          return isFoundWord;
      }

By using dictionaries for several different languages simultaneously in the spell checker, the proposed classifier can correctly recognize different languages, provided there are component neural networks that can recognize those languages' character classes.

public NNTestingControl()
       {
           InitializeComponent();
           bitmap = null;
           networks = null;
           textSpellControl1.SpellChecker = this.multipleSpelling;
           //English dictionary
           multipleSpelling.Dictionaries.Add(this.wordDictionary1);
           //French dictionary
           //multipleSpelling.Dictionaries.Add(this.wordDictionary2);
           //Italian dictionary
           //multipleSpelling.Dictionaries.Add(this.wordDictionary3);

       }
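The dictionary search step described earlier, in which each candidate word is checked against every loaded dictionary and the first match wins, can be sketched in C# as follows. This is an illustration under assumptions, not the article's implementation: the names are hypothetical and each dictionary is modeled as a plain set of strings rather than the `WordDictionary` type used above.

```csharp
using System;
using System.Collections.Generic;

public static class WordSelector
{
    // Returns the first candidate that appears in any of the dictionaries.
    public static bool TrySelectWord(
        IEnumerable<string> candidates,
        IEnumerable<ISet<string>> dictionaries,
        out string result)
    {
        result = "";
        foreach (string candidate in candidates)
        {
            foreach (ISet<string> dictionary in dictionaries)
            {
                if (dictionary.Contains(candidate))
                {
                    result = candidate;
                    return true;
                }
            }
        }
        // No exact match: a caller would fall back to the correction techniques.
        return false;
    }
}
```

If the only dictionary is a `HashSet<string>` built with `StringComparer.OrdinalIgnoreCase` that contains "expert", the candidates Exper1, Expert, ExperJ, EXper1, EXpert, EXperJ resolve to "Expert". Adding a second set for another language extends the search without changing the method.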

Experiments and results

The demo uses three component CNNs to recognize a class of 62 English characters. It achieves a high recognition rate on my own hand-drawn word samples. I hope this project helps anyone who wants to study handwriting recognition. At present I do not have time to continue it, but I hope someone will develop it into a good open-source project. This is the full source code for all my previous articles. All information about this project can be found here.

History

01/04/2013: Updated some pictures

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).
