DARL and Whitebox Machine Learning

By robot-v1.0

Original article: https://www.codeproject.com/Articles/1248572/DARL-and-Whitebox-Machine-Learning

Original author: AndyEdmonds

Translated by this site's robot-v1.0

Foreword

Use the free online system to create machine learning models you can understand

Introduction

This is a continuation of a previous article that described the DARL language and its basis in fuzzy logic. In that article, I talked about the problem of "blackbox" models in machine learning, and how the DARL technology was initially developed to give you a model that can be machine learned and that you can also understand. Such algorithms are sometimes labelled "whitebox" for obvious reasons.

The DARL.AI website now gives you API access to machine learning using a supervised learning, fuzzy logic rule induction algorithm. This interface is free. The REST interface is described here.

Machine learning can be processor intensive, depending on the amount of training data. To spread out the load, the processing is performed by an Azure function that responds to a queue and returns results via email.

I've constructed a very simple project that accesses this web service and fires off three benchmark machine learning examples.

Background

Machine learning falls into several types. The most common is supervised learning. This is where you have collected multiple examples of the inputs and outputs of some process you want to learn, recorded in a database or in some coded representation like XML or JSON, and the machine learning algorithm tries to create a model that reproduces the outputs when presented with the inputs.

If you remember from the previous article, DARL inputs and outputs can be textual, categorical, numeric or temporal. Machine learning here is limited to categorical and numeric inputs and outputs. Learning is limited to a single output at a time. If that output is categorical, then classification is performed; if it is numeric, then prediction.

The data used to train the model is the training set, and some of the data may be put aside to form a test set. With this machine learning algorithm, you specify the percentage of the data to train on and the system will randomly split the data into two groups.
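To make the idea of a percentage split concrete, here is a minimal sketch of one way such a split could be done. It is not the service's implementation, which is not shown in this article; the `SplitSketch` class, the `Split` method and the rounding rule are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SplitSketch
{
    // Hypothetical helper: shuffles the patterns (Fisher-Yates) and puts the
    // first percentTrain percent into the training set; the rest is the test set.
    public static (List<T> Train, List<T> Test) Split<T>(
        IEnumerable<T> patterns, int percentTrain, Random rng)
    {
        if (percentTrain < 1 || percentTrain > 100)
            throw new ArgumentOutOfRangeException(nameof(percentTrain));

        var shuffled = patterns.ToList();
        for (int i = shuffled.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
        }

        // Assumed rounding rule; the real service may round differently.
        int trainCount = (int)Math.Round(shuffled.Count * percentTrain / 100.0);
        return (shuffled.Take(trainCount).ToList(), shuffled.Skip(trainCount).ToList());
    }

    static void Main()
    {
        var (train, test) = Split(Enumerable.Range(0, 150), 80, new Random(1));
        Console.WriteLine($"train: {train.Count}, test: {test.Count}");
    }
}
```

With 150 patterns and 80 percent, this sketch puts 120 patterns in the training set and 30 in the test set. Which patterns land in which set depends on the random seed.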

Although problems that have an existing analytic solution are sometimes used to test ML algorithms, for instance getting a model to copy some logical relationship, in the real world no one in their right mind would use a machine learning algorithm to learn something for which an analytic model, like an equation, exists. Machine learning algorithms are used when nothing else will work. This is typically when the problem to be solved is noisy, poorly specified or ephemeral.

Machine learning is seldom absolutely correct. In all real world situations, you will have to deal with some inaccuracy. This might be misclassification or some prediction error. It is also entirely possible that your inputs are not related in any discernible way to your outputs, in which case the model performance will be poor.

To use the DARL machine learning service, you need several things:

A source of data with one or more input values per pattern and one output value to be classified or predicted. The number of patterns required is problem dependent, but is normally much greater than 50.
A ruleset skeleton created in DARL that specifies the inputs and output and how to find them in the data. For XML (XPath) and JSON (JsonPath), this consists of an expression to find the patterns, and expressions relative to the pattern that find each data item.
A choice of fuzzy set count from the set 3, 5, 7, 9, where larger numbers result in more complex models if numeric inputs or output are present.
A choice in the range 1-100 of the percentage of the data to train on. If a number less than 100 is chosen, then that percentage of the data is randomly chosen as the training set, and the rest becomes the test set. In this case, the performance on both sets is reported in the results.
An email address that the results should be sent to. Short-term and spam email addresses are filtered out.

Using the Code

The project on GitHub contains the example code for accessing the web service.

There are three examples provided, each consisting of a data file in XML and a DARL skeleton, embedded in the executable.

using System.IO;
using System.Net.Http;
using System.Reflection;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

class Program
{
    static string destEmail = "support@darl.ai"; // put your email address here!

    static void Main(string[] args)
    {
        DarlML("yingyang").Wait();
        DarlML("iris").Wait();
        DarlML("cleveheart").Wait();
    }

    // Reads an embedded resource (a .darl skeleton or an .xml data file) as text.
    static string ReadResource(string name)
    {
        using (var reader = new StreamReader(
            Assembly.GetExecutingAssembly().GetManifestResourceStream(name)))
        {
            return reader.ReadToEnd();
        }
    }

    // Posts one example (skeleton + data) to the machine learning service.
    static async Task DarlML(string examplename)
    {
        var source = ReadResource($"DarlMLRestExample.{examplename}.darl");
        var data = ReadResource($"DarlMLRestExample.{examplename}.xml");
        // Use your own choice of training percent (1-100) and sets (3, 5, 7 or 9).
        var spec = new DarlMLData
        {
            code = source, data = data, email = destEmail,
            percentTrain = 100, sets = 5, jobName = examplename
        };
        var valueString = JsonConvert.SerializeObject(spec);
        using (var client = new HttpClient())
        {
            var response = await client.PostAsync("https://darl.ai/api/Linter/DarlML",
                new StringContent(valueString, Encoding.UTF8, "application/json"));
            // check for errors here...
        }
    }
}

Don't forget to replace the email address with your own!
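The "check for errors here..." comment in DarlML is left open in the example. One possible way to fill it in is sketched below. The `ResponseCheck` class and `Describe` method are hypothetical names, and the sketch assumes only that a non-success HTTP status means the job was not queued; the article does not document the service's actual error responses. The responses here are constructed locally, so no network call is made.

```csharp
using System;
using System.Net;
using System.Net.Http;

static class ResponseCheck
{
    // Hypothetical helper: turns an HTTP response into a short status line.
    // A success status is assumed to mean only that the job was accepted;
    // the learning results themselves arrive later by email.
    public static string Describe(HttpResponseMessage response, string jobName)
    {
        return response.IsSuccessStatusCode
            ? $"{jobName}: accepted"
            : $"{jobName}: failed with HTTP {(int)response.StatusCode} {response.ReasonPhrase}";
    }

    static void Main()
    {
        // Locally constructed responses stand in for replies from the service.
        Console.WriteLine(Describe(new HttpResponseMessage(HttpStatusCode.OK), "iris"));
        Console.WriteLine(Describe(new HttpResponseMessage(HttpStatusCode.BadRequest), "iris"));
    }
}
```

In DarlML, the same check could be applied to the `response` returned by `PostAsync`, for example by writing the result of `Describe(response, examplename)` to the console.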

The class encapsulating the ML specification looks like this:

using System.ComponentModel.DataAnnotations;

public class DarlMLData
{
    /// <summary>
    /// Your DARL code
    /// </summary>
    /// <remarks>Should contain a single ruleset decorated with "supervised"
    /// containing only I/O. Outside of the ruleset the "pattern" parameter should be specified,
    /// along with mapinputs,mapoutputs and wires. MAP I/O should have paths.</remarks>
    public string code { get; set; }

    /// <summary>
    /// The training data
    /// </summary>
    /// <remarks>this can be in XML or Json. If the former XPath should be used
    /// to specify paths in the ruleset. If the latter, JsonPath</remarks>
    public string data { get; set; }

    /// <summary>
    /// Number of sets to use for numeric variables. Only values 3,5,7 and 9 can be specified.
    /// </summary>
    [Range(3,9)]
    public int sets { get; set; }

    /// <summary>
    /// The percent to train on
    /// </summary>
    /// <remarks>Must be between 1 and 100</remarks>
    [Range(1, 100)]
    public int percentTrain { get; set; }

    /// <summary>
    /// email to send results
    /// </summary>
    /// <remarks>Because Machine learning can be CPU intensive training is performed
    /// via a queue in a secondary process.
    /// Results and the mined DARL will be emailed to this address.
    /// </remarks>
    [DataType(DataType.EmailAddress)]
    public string email { get; set; }

    /// <summary>
    /// A name to identify the job in the returned email
    /// </summary>
    public string jobName { get; set; }
}
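Note that a [Range(3,9)] attribute on its own would also admit 4, 6 and 8, although only 3, 5, 7 and 9 are documented as valid. A client-side check before posting could look like the following sketch. `SpecCheck` and `Validate` are hypothetical names, not part of the example project, and how the service itself reacts to other values is not described here.

```csharp
using System;
using System.Collections.Generic;

static class SpecCheck
{
    // The only fuzzy set counts the article lists as valid.
    static readonly int[] AllowedSets = { 3, 5, 7, 9 };

    // Hypothetical helper: returns a list of problems, empty when the
    // parameters match the documented constraints.
    public static List<string> Validate(int sets, int percentTrain)
    {
        var problems = new List<string>();
        if (Array.IndexOf(AllowedSets, sets) < 0)
            problems.Add($"sets must be 3, 5, 7 or 9 (got {sets})");
        if (percentTrain < 1 || percentTrain > 100)
            problems.Add($"percentTrain must be between 1 and 100 (got {percentTrain})");
        return problems;
    }

    static void Main()
    {
        // sets = 4 passes [Range(3,9)] but is not one of the documented values.
        foreach (var problem in Validate(sets: 4, percentTrain: 100))
            Console.WriteLine(problem);
    }
}
```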

The Example Data Sets

The three data sets provided demonstrate classification of different kinds of data. The system can handle numeric outputs too.

Iris is Fisher's Iris data set, a real world data set used frequently in machine learning. It contains the measurements of 3 kinds of Iris flower, with 50 examples each. The task is to learn the kind (cultivar) from the measurements.
CleveHeart is the Cleveland heart database, containing measurements of patients who arrived at an A&E unit in a hospital with a heart attack. The task is to predict the outcome (survival) based on the various measurements.
YingYang is a synthetic data set containing two categories formed from the entwined Yin Yang symbol. The coordinates of points within the shapes are provided along with the category, and the system has to learn to separate them. Use 7 or 9 fuzzy sets.

To illustrate how to construct a DARL skeleton, we'll look at the data of the Iris example.

<?xml version = "1.0"?>
<irisdata>
 <Iris>
  <sepal_length>5.10</sepal_length>
  <sepal_width>3.50</sepal_width>
  <petal_length>1.40</petal_length>
  <petal_width>0.20</petal_width>
  <class>Iris-setosa</class>
 </Iris>

This is one pattern out of 150.

The Iris DARL skeleton looks like this:

pattern "//Iris";

ruleset iris supervised
{
 input numeric petal_length;
 input numeric sepal_length;
 input numeric petal_width;
 input numeric sepal_width;

 output categorical class;
}
 
mapinput petal_length "petal_length";
mapinput petal_width "petal_width"; 
mapinput sepal_length "sepal_length";
mapinput sepal_width "sepal_width";

mapoutput class "class";

wire petal_length iris.petal_length;
wire petal_width iris.petal_width;
wire sepal_length iris.sepal_length;
wire sepal_width iris.sepal_width;
wire iris.class class;

The section pattern "//Iris"; defines the XPath (since the data is XML) to find all the patterns in the data.

Within the ruleset, which is annotated with supervised, the inputs and output are specified.

Finally, the mapinput and mapoutput definitions are tied to the XPath that finds each data item relative to the pattern. In this case, that happens to be just the name of the data item in the XML.
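To see how a pattern path and relative item paths work together, the following sketch applies them to the Iris fragment using the standard .NET XPath extensions for LINQ to XML. It only illustrates the XPath mechanics; it is not how the DARL service reads data, and `PatternSketch` and `Extract` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class PatternSketch
{
    // Hypothetical helper: selects every pattern with patternPath, then
    // evaluates each item path relative to that pattern element.
    public static List<Dictionary<string, string>> Extract(
        string xml, string patternPath, params string[] itemPaths)
    {
        var doc = XDocument.Parse(xml);
        return doc.XPathSelectElements(patternPath)
            .Select(pattern => itemPaths.ToDictionary(
                path => path,
                path => pattern.XPathSelectElement(path)?.Value))
            .ToList();
    }

    static void Main()
    {
        const string xml = @"<irisdata>
 <Iris>
  <sepal_length>5.10</sepal_length>
  <sepal_width>3.50</sepal_width>
  <petal_length>1.40</petal_length>
  <petal_width>0.20</petal_width>
  <class>Iris-setosa</class>
 </Iris>
</irisdata>";

        var rows = Extract(xml, "//Iris",
            "petal_length", "petal_width", "sepal_length", "sepal_width", "class");
        foreach (var row in rows)
            Console.WriteLine(string.Join(", ", row.Select(kv => $"{kv.Key}={kv.Value}")));
    }
}
```

Each dictionary corresponds to one pattern, keyed by the relative path of the data item. An item path that matches nothing yields a null value in this sketch.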

The wire elements link the mapinput and mapoutput elements to the ruleset.
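Because the skeleton is so regular, it can be generated rather than typed. The sketch below builds a skeleton with the same layout as the Iris one, for numeric inputs and a single categorical output. It assumes that each data item's relative path is simply its element name, as in the Iris data. `SkeletonSketch` and `Build` are hypothetical names, and the output has only been modelled on the skeleton shown above, not checked against the DARL parser.

```csharp
using System;
using System.Text;

static class SkeletonSketch
{
    // Hypothetical helper: emits pattern, ruleset, mapinput/mapoutput and wire
    // sections in the same order as the Iris skeleton.
    public static string Build(string rulesetName, string patternPath,
        string[] numericInputs, string categoricalOutput)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"pattern \"{patternPath}\";");
        sb.AppendLine();
        sb.AppendLine($"ruleset {rulesetName} supervised");
        sb.AppendLine("{");
        foreach (var name in numericInputs)
            sb.AppendLine($" input numeric {name};");
        sb.AppendLine();
        sb.AppendLine($" output categorical {categoricalOutput};");
        sb.AppendLine("}");
        sb.AppendLine();
        // Assumption: the relative path of each item equals its name.
        foreach (var name in numericInputs)
            sb.AppendLine($"mapinput {name} \"{name}\";");
        sb.AppendLine();
        sb.AppendLine($"mapoutput {categoricalOutput} \"{categoricalOutput}\";");
        sb.AppendLine();
        foreach (var name in numericInputs)
            sb.AppendLine($"wire {name} {rulesetName}.{name};");
        sb.AppendLine($"wire {rulesetName}.{categoricalOutput} {categoricalOutput};");
        return sb.ToString();
    }

    static void Main()
    {
        Console.Write(Build("iris", "//Iris",
            new[] { "petal_length", "sepal_length", "petal_width", "sepal_width" },
            "class"));
    }
}
```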

When you run the example, you will receive 3 emails back. The Iris email will look like this, assuming you've kept the same parameters:

Darl Machine Learning results at 6/15/2018 10:31:32 AM for iris

Job id: 7f29976a-b094-426d-b8f0-763f4abf1bb5
Training on 100%
Train performance 96.0526315789474(%/RMS Error)
Unknown responses 0%
Thanks for using DARL Machine Learning. DARL.AI Support.
If you would like to unsubscribe and stop receiving these emails, click here

and the included DARL code will look like this:

pattern "//Iris";

ruleset iris supervised
{
 // Generated by DARL rule induction on  6/15/2018 10:31:34 AM.
// Train correct:  96.05% on 152 patterns.
// Percentage of unknown responses over all patterns: 0.00
input numeric petal_length { {very_small, -Infinity,1,1.6},{small, 1,1.6,4.4},
{medium, 1.6,4.4,5.1},{large, 4.4,5.1,6.9},{very_large, 5.1,6.9,Infinity}};
input numeric petal_width { {very_small, -Infinity,0.1,0.3},{small, 0.1,0.3,1.3},
{medium, 0.3,1.3,1.8},{large, 1.3,1.8,2.5},{very_large, 1.8,2.5,Infinity}};
input numeric sepal_length { {very_small, -Infinity,4.3,5.1},{small, 4.3,5.1,5.8},
{medium, 5.1,5.8,6.4},{large, 5.8,6.4,7.9},{very_large, 6.4,7.9,Infinity}};
input numeric sepal_width { {very_small, -Infinity,2,2.8},{small, 2,2.8,3},
{medium, 2.8,3,3.3},{large, 3,3.3,4.4},{very_large, 3.3,4.4,Infinity}};

output categorical class {"Iris-setosa","Iris-versicolor","Iris-virginica"};

if petal_length is very_small  then class will be "Iris-setosa" confidence 1; // examples: 4
if petal_length is small  then class will be "Iris-setosa" confidence 1; // examples: 39
if petal_length is medium  then class will be "Iris-versicolor" 
                     confidence 0.977272727272727; // examples: 44
if petal_length is large  and petal_width is medium  and sepal_length is medium 
                     then class will be "Iris-virginica" confidence 1; // examples: 1
if petal_length is large  and petal_width is medium  and sepal_length is large  
                     then class will be "Iris-versicolor" confidence 0.75; // examples: 4
if petal_length is large  and petal_width is large  then class will be "Iris-virginica" 
                     confidence 0.888888888888889; // examples: 27
if petal_length is large  and petal_width is very_large  then class will be 
                     "Iris-virginica" confidence 1; // examples: 13
if petal_length is very_large  then class will be "Iris-virginica" confidence 1; // examples: 9
}
 
mapinput petal_length "petal_length";
mapinput petal_width "petal_width"; 
mapinput sepal_length "sepal_length";
mapinput sepal_width "sepal_width";

mapoutput class "class";

wire petal_length iris.petal_length;
wire petal_width iris.petal_width;
wire sepal_length iris.sepal_length;
wire sepal_width iris.sepal_width;
wire iris.class class;

Note that the inputs and outputs are now annotated with fuzzy sets and categories. These are discovered in the data and inserted automatically.
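One way to read the generated fuzzy sets is sketched below. It assumes that the three numbers in a set such as {small, 1,1.6,4.4} are the left foot, peak and right foot of a triangular membership function, and that an infinite end value means an open shoulder. This article does not spell that out, so treat it as an assumption; `FuzzySketch` and `Membership` are hypothetical names.

```csharp
using System;

static class FuzzySketch
{
    // Assumed interpretation: (left, peak, right) define a triangle with
    // membership 1 at the peak and 0 at or beyond the feet. An infinite
    // left or right value is treated as a shoulder that stays at 1.
    public static double Membership(double x, double left, double peak, double right)
    {
        if (x <= peak)
        {
            if (double.IsNegativeInfinity(left)) return 1.0; // open left shoulder
            if (x <= left) return 0.0;
            return (x - left) / (peak - left);
        }
        if (double.IsPositiveInfinity(right)) return 1.0;    // open right shoulder
        if (x >= right) return 0.0;
        return (right - x) / (right - peak);
    }

    static void Main()
    {
        // petal_length of the sample pattern, against the sets from the mined ruleset.
        double x = 1.40;
        Console.WriteLine($"very_small: {Membership(x, double.NegativeInfinity, 1, 1.6)}");
        Console.WriteLine($"small:      {Membership(x, 1, 1.6, 4.4)}");
        Console.WriteLine($"medium:     {Membership(x, 1.6, 4.4, 5.1)}");
    }
}
```

Under this reading, a petal length of 1.40 belongs to very_small with a degree of roughly one third and to small with roughly two thirds, and not at all to medium. Both matching rules in the mined ruleset point to "Iris-setosa", which agrees with the class of the sample pattern.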

The ruleset now contains a set of DARL rules that categorize irises.

Extra information about the rule induction process and the degree of support for each rule is included as comments.

The DARL rulesets you get back can be used with the online inference REST API as specified in the previous article, so that's how you reuse learned rulesets.

Machine learning is a very big subject. I can't hope to tell you everything here. Please look at the DARL machine learning help for more advice.

Please report any bugs, especially exceptions, through the DARL support button on the DARL.AI pages.

History

06/15/2018: First version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).
