Build Simple AI .NET Library - Part 6 - ML Algorithms
Original article: https://www.codeproject.com/Articles/1207728/Build-Simple-AI-NET-Library-Part-ML-Algorithms
Original author: Gamil Yassin
Foreword
This is a series of articles demonstrating a .NET AI library built from scratch.
Series Introduction
Here are the links to the previous articles:
- Build Simple AI .NET Library - Part 1 - Basics First
- Build Simple AI .NET Library - Part 2 - Machine Learning Introduction
- Build Simple AI .NET Library - Part 3 - Perceptron
- Build Simple AI .NET Library - Part 4 - Perceptron & Beyond
- Build Simple AI .NET Library - Part 5 - Artificial Neural Networks
My objective is to create a simple AI library that covers a couple of advanced AI topics such as genetic algorithms, ANN, fuzzy logic and other evolutionary algorithms. The only challenge in completing this series is having enough time to work on the code and articles.
Having the code itself might not be the main target; understanding these algorithms is. I hope it will be useful to someone someday.
Please feel free to comment, ask for any clarification, or, hopefully, suggest better approaches.
Article Introduction - Part 6 "ML Algorithms"
We discussed the types of ML (Machine Learning) in article #2 (20): supervised, unsupervised and reinforcement learning. That classification is based mainly on the learning approach. However, it is not enough to define the essence of how exactly learning is conducted.
Another classification of ML algorithms is based on function. So far, we have introduced a couple of ML areas such as regression and classification. In this article, we target the main ML algorithms based on the function or the type of problem they solve.
As a note, for simplicity and due to time constraints, only the concepts and final equations of the algorithms are shown; no statistical or mathematical proofs are provided.
What is an ML Algorithm?
ML algorithms are the tools and techniques used to actually extract information from a given data set and/or detect patterns in it.
Why do we need these algorithms? Clearly, they satisfy the learning portion of AI as a concept.
The following sections introduce Regression, Classification and Clustering.
1- Regression Algorithms
Quote: Regression (10) is simply finding the best-fit mapping (or relationship) between inputs and output(s).
It represents a very old statistical field of study, with many analytical approaches developed based on the type of regression.
Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent variable (target or label) and independent variables (inputs or predictors). This technique is used for forecasting, time-series modelling and finding causal relationships between variables.
Regression analysis is an important tool for modelling and analyzing data. Here, we fit a curve/line to the data points in such a manner that the sum of the distances of the data points from the curve or line is minimized.
Types of Regression Techniques
The relationship between a group of variables can fit different models. The simplest is the linear model, where the best fit is a straight line (in 2D space), a plane (in 3D space) or a hyperplane (in higher-dimensional spaces).
Linear regression (2D)
Non-linear regression (2D)
Linear regression (3D)
The most popular regression algorithms are:
- Linear Regression
- Logistic Regression
- Ordinary Least Squares Regression (OLSR)
- Stepwise Regression
- Multivariate Adaptive Regression Splines (MARS)
- Locally Estimated Scatterplot Smoothing (LOESS)
2. Classification
Classification is also a mapping function between inputs X and output(s) Y, where Y is a set of possible classes.
If regression is finding the best-fit line between inputs and output, classification is finding the best discrimination line that separates the inputs into one or more classes.
If the number of classes is two, it is called binary classification; with more classes, it is called multiclass classification.
Like regression, it provides a means of prediction based on a training data set, so classification belongs to the supervised ML approach.
Examples of classification algorithms:
- Linear classifiers
- Logistic regression
- Naive Bayes classifier
- Perceptron
- Support vector machines
- Quadratic classifiers
- k-nearest neighbors
- Decision trees
- Neural networks
- Learning vector quantization
3. Clustering
Clustering is grouping (similar to classification), but with no pre-defined labels (no classes). Clearly, it is part of the unsupervised type of ML algorithms.
To group objects into different groups without pre-defined classes, one way is to group based on distances or features. In practice, different clustering algorithms use different approaches.
In the graph above, the objects (inputs) could be grouped into 3 clusters (based on distance or other information within the feature set).
There are three types of clustering based on similarity:
I. Exclusive clustering: one object belongs to one group only.
II. Overlapping clustering: each object can belong to one or more groups (clusters).
III. Hierarchical clustering: more of a tree-type relation, where two or more clusters may have a parent cluster.
Types of Clustering Algorithms
Grouping objects can be based on different similarity measures over the given inputs. Here are the most common similarity models:
Connectivity models (21)
AKA hierarchical clustering. The approach is simple: an object is more related to nearby objects than to those further away in distance.
Centroid models (22)
AKA the K-Means clustering algorithm, a popular algorithm in which the number of clusters required at the end has to be specified beforehand, which makes it important to have prior knowledge of the data set. These models run iteratively to find a local optimum. Objects are grouped (clustered) based on their distance from the cluster centroid.
Distribution models
These clustering models are based on the notion of how probable it is that all data points in a cluster belong to the same distribution (for example, Normal or Gaussian). A popular example of these models is the Expectation-Maximization algorithm, which uses multivariate normal distributions.
Density models
These models search the data space for areas of varying density of data points. They isolate the different density regions and assign the data points within a region to the same cluster. Popular examples of density models are DBSCAN and OPTICS.
Here is a summary of ML types.
Common ML Algorithms
1- Linear Regression
Linear regression: a straight line (or a plane, or a hyperplane) is the best-fit mapping for the data set.
The data set is a sample of different input data and associated labels (actual outputs). In such a case, the input is called the independent variable, while the output is called the dependent variable.
In cases where we have only one independent variable (input), this is known as simple linear regression; with more independent variables, it is called multiple linear regression.
Linear Regression Parameters
To understand linear regression, we have to refer to the mathematical representation. For a 2D mapping (a relationship between two variables) and simple linear regression, let's denote the dependent variable as Y and the independent variable as X. For Y and X to have a linear mapping, we need to figure out the equation of the straight line that fulfils this linearity. In other words, the mapping between Y and X shall be in a form similar to:
Y = a + m * X
Where:
a is the intercept point (the value of Y at X = 0), known as the bias coefficient
m is the line slope (the ratio of the change in Y to the change in X), known as the coefficient
In multiple linear regression, where Y depends on multiple inputs, say X1, X2, ..., Xn, the equation shall be:
Y = a + m1 * X1 + m2 * X2 + . . . + mn * Xn
So, the parameters are a and m (m1, m2, ..., mn). Training the linear algorithm refers to finding the values of a and m that provide the best-fit equation.
Different training algorithms are available to resolve the parameters of linear regression; the most common are described below.
2- Simple Linear Regression
Y = a + m * X
Assume we start with any random parameter values. Then, at a given sample i in the data set, where we have Yi corresponding to Xi, we can define the error at that sample as Ei, where:
Ei = Yi - (estimated output)
Ei = Yi - (a + m * Xi)
Squaring this error gives (Ei)^2 = (Yi - (a + m * Xi))^2
Summing the squared errors over all samples gives the total squared error SE. To find the values of a and m that minimize SE, take the derivative with respect to a and m, set it to 0, and solve for a and m. The mathematical derivation is beyond the scope of this article for simplicity and time considerations. Finally, the least-squares estimates for a and m are:
m = Sum((Xi - Xavg) * (Yi - Yavg)) / Sum((Xi - Xavg)^2)
a = Yavg - m * Xavg
Further, the sample variance (11) and sample covariance (12) are defined as:
Var(X) = Sum((Xi - Xavg)^2) / (n - 1)
Cov(X, Y) = Sum((Xi - Xavg) * (Yi - Yavg)) / (n - 1)
So, the m equation can be simplified to m = Cov(X, Y) / Var(X), and the procedure becomes:
1- Calculate Average of inputs and Labels (Xavg & Yavg)
2- Calculate sample Covariance
3- Calculate sample Variance
4- Calculate m = Cov/Var
5- Calculate a= Yavg – m * Xavg
Logic
A SimpleRegression class was created:
Public Class SimpleRegression
''' <summary>
''' 1- Calculate Average of inputs and Labels (Xavg and Yavg)
''' Average = sum/number
''' 2- Calculate sample Covariance
''' 3- Calculate sample Variance
''' 4- Calculate m = Cov/Var
''' 5- Calculate a= Yavg – m * Xavg
''' </summary>
''' <param name="X_Inputs"></param>
''' <param name="Y_Labels"></param>
''' <returns>Line with X as intercept and Y as slope</returns>
Public Shared Function LineBestFit(X_Inputs As Matrix1D, Y_Labels As Matrix1D) As Vector
Dim Line As New Vector
Dim X_avg As Double = 0
Dim Y_Avg As Double = 0
Dim Cov_XY As Double = 0
Dim Var_X As Double = 0
Dim X_Copy As New Matrix1D(X_Inputs)
Dim Y_Copy As New Matrix1D(Y_Labels)
' 1- Calculate Average of inputs and Labels (Xavg & Yavg)
X_avg = X_Inputs.Sum / X_Inputs.Size
Y_Avg = Y_Labels.Sum / Y_Labels.Size
' 2- Calculate sample Covariance
X_Copy = X_Inputs.Sub(X_avg)
Y_Copy = Y_Labels.Sub(Y_Avg)
Cov_XY = X_Copy.Product(X_Copy, Y_Copy).Sum
' 3- Calculate sample Variance
Var_X = X_Copy.SquaredSum
' 4- Calculate m = Cov/Var
Line.y = CSng(Cov_XY / Var_X)
' 5- Calculate a= Yavg – m * Xavg
Line.x = CSng(Y_Avg - Line.y * X_avg)
Return Line
End Function
''' <summary>
''' Return MSE (Mean Square Error) from given Input Data set, Labels and estimated equation
''' </summary>
''' <param name="X_Inputs"></param>
''' <param name="Y_Labels"></param>
''' <param name="Line">X is intercept and Y is slope</param>
''' <returns></returns>
Public Shared Function Validate(X_Inputs As Matrix1D, Y_Labels As Matrix1D, Line As Vector) As Single
Dim Err As New Matrix1D(Y_Labels)
Dim Y_Estimate As New Matrix1D(Y_Labels)
Y_Estimate = X_Inputs.Product(Line.y)
Y_Estimate = Y_Estimate.Add(Line.x)
Err = Y_Labels.Sub(Y_Estimate)
Return Err.SquaredSum / (2 * X_Inputs.Size)
End Function
End Class
The class has two functions, LineBestFit and Validate.
LineBestFit provides the best-fit values of a and m.
Validate calculates the MSE (Mean Squared Error) as a measure of the quality of the fit.
To test the algorithm, a simple form was created:
Simple Regression Test Form App
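For readers who want to try the math without the library's Matrix1D and Vector types, here is a minimal, self-contained VB.NET sketch of the same five steps using plain arrays; the data values are made up for illustration:
Imports System.Linq

Module SimpleRegressionDemo
    Sub Main()
        ' Hypothetical sample data: Y is roughly 2 + 3 * X plus noise
        Dim X() As Double = {1, 2, 3, 4, 5}
        Dim Y() As Double = {5.1, 7.9, 11.2, 13.8, 17.1}
        Dim n As Integer = X.Length

        ' 1- Averages of inputs and labels
        Dim xAvg As Double = X.Average()
        Dim yAvg As Double = Y.Average()

        ' 2 & 3- Sample covariance and variance (the 1/(n-1) factors cancel in m)
        Dim covXY As Double = 0, varX As Double = 0
        For i As Integer = 0 To n - 1
            covXY += (X(i) - xAvg) * (Y(i) - yAvg)
            varX += (X(i) - xAvg) ^ 2
        Next

        ' 4 & 5- Slope and intercept
        Dim m As Double = covXY / varX
        Dim a As Double = yAvg - m * xAvg
        Console.WriteLine($"Y = {a:F3} + {m:F3} * X")

        ' MSE as Validate() computes it: squared sum of residuals / (2 * size)
        Dim mse As Double = 0
        For i As Integer = 0 To n - 1
            mse += (Y(i) - (a + m * X(i))) ^ 2
        Next
        mse /= 2 * n
        Console.WriteLine($"MSE = {mse:F4}")
    End Sub
End Module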
Algorithm Evaluation
To evaluate the accuracy of the algorithm, the MSE (Mean Squared Error) is calculated. However, in some references another parameter may be used: the regression standard error, or residual standard error, denoted as s:
s = Sqrt(Sum((Yi - Yi_estimated)^2) / (n - 2))
where n is the number of samples and Yi_estimated is the output estimated by the fitted line.
3- Ordinary Least Squares (OLS)
The concept is fairly simple and logical: denote the error as the distance between the actual label (correct answer) and the estimated output (based on the initial line equation); this method then squares the errors of all samples in the data set and sums them to provide the sum of squared errors. It is the same as simple regression, but for multiple variables.
Linear algebra is used to resolve this model (13).
4- Gradient Descent in Linear Regression
Gradient descent is used to find a local minimum of a function, and its pseudo code is simple:
''' <summary>
''' Linear Regression Pseudo Algorithm to resolve h(x) = a + b * x
''' 1- Start with random values for a and b
''' 2- Iterate through the given training set; for each sample:
'''    - Calculate h(x)
'''    - Calculate error = h(x) - y, where y is the correct answer or label
''' 3- Sum all squared errors
''' 4- Calculate MSE (Mean Squared Error) = sum of all squared errors / (2 * training set size)
''' 5- Get the slope of the cost curve at the current point (for each of a and b)
'''    - If the slope is +ve, move left, i.e., decrease the value of a or b
'''    - If the slope is -ve, move right, i.e., increase the value of a or b
''' 6- Repeat steps 2 to 5 until the slope changes sign (the minimum has been passed)
''' 7- The last values of a and b are the optimal values as per MSE minimization
''' </summary>
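As a rough illustration (not part of the library), here is a minimal runnable VB.NET sketch of this pseudo code, using a fixed learning rate and iteration count in place of the slope-sign stopping test; the data and all names are made up:
Module GradientDescentDemo
    Sub Main()
        ' Hypothetical training set, generated from y = 2 + 3x
        Dim X() As Double = {1, 2, 3, 4, 5}
        Dim Y() As Double = {5, 8, 11, 14, 17}
        Dim a As Double = 0, b As Double = 0      ' step 1: arbitrary start values
        Dim learningRate As Double = 0.01

        For epoch As Integer = 1 To 5000
            ' Steps 2-3: accumulate the gradient of MSE = Sum((h(x) - y)^2) / (2m)
            Dim gradA As Double = 0, gradB As Double = 0
            For i As Integer = 0 To X.Length - 1
                Dim err As Double = (a + b * X(i)) - Y(i)   ' h(x) - y
                gradA += err            ' d(MSE)/da contribution
                gradB += err * X(i)     ' d(MSE)/db contribution
            Next
            ' Steps 4-5: move a and b against the slope
            a -= learningRate * gradA / X.Length
            b -= learningRate * gradB / X.Length
        Next

        Console.WriteLine($"h(x) = {a:F3} + {b:F3} * x")    ' approaches 2 + 3x
    End Sub
End Module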
Gradient descent was already explained in article 2 (14).
Gradient Descent
GD minimizes the loss (or cost) function using the above pseudo code. However, it is not that simple, especially for non-linear functions. Here are some GD considerations:
A function normally has one global minimum but may have multiple local minima, and GD can be trapped in one of the local minima.
There are many types of gradient descent algorithms. On the basis of data ingestion:
- Full Batch GD: uses the whole data set at once to compute the gradient.
- Stochastic GD: takes a sample from the data set to compute the gradient.
On the basis of differentiation technique:
- First-order Differentiation GD: uses the first derivative of the cost function.
- Second-order Differentiation GD: uses the second derivative of the cost function.
Variants of gradient descent algorithms (here is a good reference for GD methods):
- Vanilla Gradient Descent
- Momentum
- Nesterov accelerated gradient
- Adagrad
- Adadelta
- RMSprop
- Adam
- AdaMax
- Nadam
Here are some of these methods:
I. Vanilla Gradient Descent
This is the simplest form of the gradient descent technique. Vanilla means the pure form, without any adulteration. Its main feature is that we take small steps in the direction of the minimum by taking the gradient of the cost function.
Parameter_Update = learning_rate * gradient_of_parameters
Parameter_New_Value = Parameter_Old_Value - Parameter_Update
The learning rate is a very important parameter and should be treated with care when choosing its value.
II. GD with Momentum
Here, we tweak the above algorithm in such a way that we pay heed to the previous step before taking the next one:
Parameter_Update = learning_rate * gradient_of_parameters
Velocity = Previous_update * momentum
Parameter_New_Value = Parameter_Old_Value + Velocity - Parameter_Update
Here, the update is the same as in vanilla gradient descent, but a new term called velocity is introduced; it considers the previous update, scaled by a constant called momentum.
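A minimal single-parameter sketch of this update rule in VB.NET (the function name and signature are made up; velocity starts at 0 and is carried between calls, accumulating the past updates):
Module MomentumDemo
    ' One momentum step for a single parameter
    Function MomentumStep(oldValue As Double, gradient As Double,
                          learningRate As Double, momentum As Double,
                          ByRef velocity As Double) As Double
        Dim update As Double = learningRate * gradient
        velocity = (velocity * momentum) - update   ' previous update scaled by momentum
        Return oldValue + velocity
    End Function
End Module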
III. Adaptive Subgradient (AdaGrad)
ADAGRAD uses an adaptive technique to change the learning rate: it allows different step sizes for different features. It is normally used with stochastic GD:
Grad_component = Previous_grad_component + (gradient * gradient)
Rate_change = Square_root(Grad_component) + epsilon
Adapted_learning_rate = learning_rate / Rate_change
Parameter_Update = Adapted_learning_rate * gradient
Parameter_New_Value = Parameter_Old_Value - Parameter_Update
In the above code, epsilon is a constant used to keep the rate of change of the learning rate in check (and to avoid division by zero).
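And a corresponding single-parameter AdaGrad step in VB.NET (names are illustrative; gradSquareSum starts at 0 and accumulates across calls):
Module AdaGradDemo
    Function AdaGradStep(oldValue As Double, gradient As Double,
                         learningRate As Double, epsilon As Double,
                         ByRef gradSquareSum As Double) As Double
        gradSquareSum += gradient * gradient
        ' Frequently updated parameters get an ever-smaller step size
        Dim adaptedRate As Double = learningRate / (Math.Sqrt(gradSquareSum) + epsilon)
        Return oldValue - adaptedRate * gradient
    End Function
End Module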
IV. ADAM
ADAM is one more adaptive technique which builds on AdaGrad and further reduces its downsides. In other words, you can consider it as momentum + ADAGRAD.
Here's a pseudo code (an exponential moving average of the gradient and of the squared gradient, without bias correction):
Adapted_gradient = (beta1 * Previous_adapted_gradient) + ((1 - beta1) * gradient)
Squared_gradient = (beta2 * Previous_squared_gradient) + ((1 - beta2) * gradient * gradient)
Adapted_learning_rate = learning_rate / (Square_root(Squared_gradient) + epsilon)
Parameter_Update = Adapted_learning_rate * Adapted_gradient
Parameter_New_Value = Parameter_Old_Value - Parameter_Update
Here, beta1 and beta2 are constants that keep the changes in the gradient and the learning rate in check.
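A matching single-parameter sketch in VB.NET (bias correction is again omitted for brevity, and all names are illustrative; m and v start at 0):
Module AdamDemo
    Function AdamStep(oldValue As Double, gradient As Double,
                      learningRate As Double, beta1 As Double, beta2 As Double,
                      epsilon As Double, ByRef m As Double, ByRef v As Double) As Double
        m = beta1 * m + (1 - beta1) * gradient              ' momentum-like moving average
        v = beta2 * v + (1 - beta2) * gradient * gradient   ' AdaGrad-like squared-gradient average
        Dim adaptedRate As Double = learningRate / (Math.Sqrt(v) + epsilon)
        Return oldValue - adaptedRate * m
    End Function
End Module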
5- Bayesian Linear Regression
Bayesian linear regression is an approach to linear regression in which the statistical analysis is undertaken within the context of Bayesian inference. When the regression model has errors that follow a normal distribution, and a particular form of prior distribution is assumed, explicit results are available for the posterior probability distributions of the model's parameters.
6- Polynomial Regression
Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial in x.
An nth-degree polynomial can be expressed as:
y = a0 + a1 * x + a2 * x^2 + . . . + an * x^n
The polynomial regression model can be resolved with a linear algebra approach (matrix conversion), treating the powers of x as the columns of a matrix X. Using Ordinary Least Squares estimation, the coefficient vector B can be resolved as:
B = (Xᵀ X)⁻¹ Xᵀ Y
While using a higher-degree polynomial can give a lower error, it can also result in overfitting if not carefully validated.
7- Stepwise Regression
This form of regression is used when we deal with multiple independent variables X. In this technique, the selection of the independent variables is done with the help of an automatic process which involves no human intervention.
The aim of this modelling technique is to maximize prediction power with a minimum number of predictor variables. It is one of the methods for handling the higher dimensionality of a data set.
For further information about this algorithm, see (17) and the attached PDF document.
8- Regularization
Regularization is a technique that minimizes the sum of squared errors and also reduces the complexity of the model, to counter model overfitting. To explain regularization, let us explore the terms overfitting and underfitting.
Overfitting
This is when the model describes random error or noise instead of the underlying relationship. Normally, this happens when the model is very complex, such as having too many parameters relative to the number of observations.
A model that has been overfitted has poor predictive performance, as it overreacts to minor fluctuations in the training data.
The green line represents an overfitted model and the black line a regularized model. While the green line best follows the training data, it is too dependent on it and is likely to have a higher error rate on new, unseen data compared to the black line (16).
Underfitting
This occurs when the model or machine learning algorithm cannot capture the underlying trend of the data. Underfitting would occur, for example, when fitting a linear model to non-linear data. Such a model has poor predictive performance.
Two popular examples of regularization procedures for linear regression are:
- Lasso Regression: where Ordinary Least Squares is modified to also minimize the absolute sum of the coefficients (called L1 regularization).
- Ridge Regression: where Ordinary Least Squares is modified to also minimize the squared sum of the coefficients (called L2 regularization).
These methods are effective to use when there is collinearity in the input values and ordinary least squares would overfit the training data.
9- Ridge Regression
Ridge regression is a technique used when the data suffers from multicollinearity (the independent variables are highly correlated). With multicollinearity, even though the least-squares estimates (OLS) are unbiased, their variances are large, which deviates the observed value far from the true value. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.
The sum of squared errors SSE is:
SSE = Sum((Yi - F(Xi))^2)
where Yi is the label (correct answer) and F(x) is the predicted or guessed value of Y.
Ridge regression performs 'L2 regularization', i.e., it adds a factor of the sum of squares of the coefficients to the optimization objective. Hence, an additional term needs to be introduced into the equation:
Objective = Residual_Sum_of_Squares + lambda * (sum_of_square_of_coefficients)
Here, lambda is a shrinkage factor that is determined outside of the model (it is a tuning parameter); it assigns a weight to how much we wish to penalize the sum of squared coefficients, i.e., it balances the emphasis given to minimizing the RSS versus minimizing the sum of squared coefficients. Lambda can take various values:
lambda = 0:
The objective becomes the same as in simple linear regression.
lambda = ∞:
The coefficients will be zero. Why? Because of the infinite weight on the squared coefficients, anything other than zero makes the objective infinite.
0 < lambda < ∞:
The magnitude of lambda decides the weight given to the different parts of the objective; the coefficients will be somewhere between 0 and those of simple linear regression, i.e., any non-zero lambda gives coefficient values smaller than those of simple linear regression.
As the value of lambda increases, the model complexity reduces.
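For a single centered input (averages already subtracted from X and Y, so the intercept drops out), the ridge objective has a simple closed form; here is a hedged VB.NET sketch of it, with all names illustrative:
Module RidgeDemo
    ' Ridge slope for one centered input: m = Sum(x*y) / (Sum(x^2) + lambda)
    ' lambda = 0 gives the ordinary least-squares slope;
    ' larger lambda shrinks the slope toward zero.
    Function RidgeSlope(x() As Double, y() As Double, lambda As Double) As Double
        Dim sumXY As Double = 0, sumXX As Double = 0
        For i As Integer = 0 To x.Length - 1
            sumXY += x(i) * y(i)
            sumXX += x(i) * x(i)
        Next
        Return sumXY / (sumXX + lambda)
    End Function
End Module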
10- Lasso Regression
LASSO stands for Least Absolute Shrinkage and Selection Operator.
Lasso regression performs L1 regularization, i.e., it adds a factor of the sum of the absolute values of the coefficients to the optimization objective. Thus, lasso regression optimizes the following:
Objective = SSE + lambda * (sum_of_absolute_value_of_coefficients)
Here, lambda works similarly to ridge and provides a trade-off between balancing the SSE and the magnitude of the coefficients. As with ridge, lambda can take various values:
lambda = 0: same coefficients as simple linear regression
lambda = ∞: all coefficients are zero (same logic as before)
0 < lambda < ∞: coefficients between 0 and those of simple linear regression
11- Logistic Regression
This machine learning technique is commonly used for binary classification problems, meaning those in which there are two possible outcomes (yes or no) that are influenced by one or more explanatory variables. The algorithm estimates the probability of an outcome given a set of observed variables.
Logistic Regression is part of a larger class of algorithms known as Generalized Linear Models (GLM).
Logistic Regression (2D)
Logistic Regression (3D)
Logistic regression can be understood simply as finding the Beta parameters that best fit the data.
The hypothesis function used is in logistic form:
h(x) = 1 / (1 + e^(-t))
In such a case, t is the linear combination of the inputs:
t = Beta0 + Beta1 * X1 + . . . + Betan * Xn
or, in matrix formation, t = Betaᵀ X. Hence, h(x) can be written as:
h(x) = 1 / (1 + e^(-(Betaᵀ X)))
GD can be used to resolve the values of Beta (the weights); the pseudo code can be simplified as:
For Single Classifier (one discrete output):
Initialize Random Weights
Initialize Learning_rate
Sum = 0
Iterate from i = 1 to m (m is the number of samples in the data set)
    Calculate the error: Err(i) = h(x(i)) - Label(i)
    Scale the error to the input: Err(i) = Err(i) * X(i)
    Sum = Sum + Err(i)
End loop
Weight_New_Value = Weight_Old_Value - Sum * Learning_rate
The code above is for a single classifier; in the case of multi-class classifiers with n classes:
For Multi-Class Classifiers:
Initialize Random Weights
Initialize Learning_rate
Iterate for all classes j = 1 to n (n is the number of classes)
    Sum = 0
    Iterate from i = 1 to m (m is the number of samples in the data set)
        Calculate the error: Err(i) = h(x(i)) - Label(i)
        Scale the error to the input: Err(i) = Err(i) * X(i)(j)
        Sum = Sum + Err(i)
    End Loop
    Weight_New_Value(j) = Weight_Old_Value(j) - Sum * Learning_rate
End loop
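Below is a rough, self-contained VB.NET sketch of the single-classifier pseudo code above (batch gradient descent on the logistic hypothesis); the training data and all names are made up:
Module LogisticRegressionDemo
    Function Sigmoid(t As Double) As Double
        Return 1.0 / (1.0 + Math.Exp(-t))
    End Function

    Sub Main()
        ' Hypothetical 1-feature training set: the label is 1 when x > 3
        Dim X() As Double = {1, 2, 3, 4, 5, 6}
        Dim Label() As Double = {0, 0, 0, 1, 1, 1}
        Dim w0 As Double = 0, w1 As Double = 0    ' bias weight and feature weight
        Dim learningRate As Double = 0.1

        For epoch As Integer = 1 To 10000
            Dim sum0 As Double = 0, sum1 As Double = 0
            For i As Integer = 0 To X.Length - 1
                Dim err As Double = Sigmoid(w0 + w1 * X(i)) - Label(i)  ' h(x(i)) - Label(i)
                sum0 += err             ' error scaled by the constant input 1
                sum1 += err * X(i)      ' error scaled by the feature
            Next
            w0 -= learningRate * sum0
            w1 -= learningRate * sum1
        Next

        Console.WriteLine($"P(class = 1 | x = 5) = {Sigmoid(w0 + w1 * 5):F3}")
    End Sub
End Module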
12- K-Means Clustering
This algorithm is intended to resolve clustering kinds of problems, where you have a data set of inputs (features) but no labels (correct answers). Therefore, it is an unsupervised learning algorithm.
K-Means clustering receives n features and classifies them into k classes. But how do we determine whether an input (feature) belongs to class A or B? The algorithm uses the Euclidean distance (18) as a classifier: simply, if a point is nearer to the mean of class A, then it belongs to A, and otherwise to the other class.
Here is K-Means clustering with 2D (2 features) and 3D (3 features):
K-Means Clustering with 2D (2 features)
K-Means Clustering with 3D (3 features)
K-Means Clustering Pseudo Code
Get a data set of n features and k classes
Initialize k random means (known as centroids) m1, m2, ..., mk
Loop until there are no more moves
    Based on the current means, calculate the Euclidean distances and classify all points
    For i = 1 to k
        Replace mi with the mean of all of the samples assigned to cluster i
    End For
End Loop
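A minimal runnable VB.NET sketch of this pseudo code for one-dimensional points with k = 2 (the data, the naive initialization and all names are illustrative):
Module KMeansDemo
    Sub Main()
        ' Hypothetical 1D points forming two obvious groups
        Dim points() As Double = {1.0, 1.2, 0.8, 5.0, 5.3, 4.7}
        Dim means() As Double = {points(0), points(1)}   ' naive initial centroids
        Dim assignment(points.Length - 1) As Integer
        Dim moved As Boolean = True

        While moved
            moved = False
            ' Classify every point to its nearest mean (1D Euclidean distance)
            For i As Integer = 0 To points.Length - 1
                Dim nearest As Integer = If(Math.Abs(points(i) - means(0)) <=
                                            Math.Abs(points(i) - means(1)), 0, 1)
                If assignment(i) <> nearest Then
                    assignment(i) = nearest
                    moved = True
                End If
            Next
            ' Replace each mean with the mean of the samples assigned to it
            For k As Integer = 0 To 1
                Dim sum As Double = 0, count As Integer = 0
                For i As Integer = 0 To points.Length - 1
                    If assignment(i) = k Then sum += points(i) : count += 1
                Next
                If count > 0 Then means(k) = sum / count
            Next
        End While

        Console.WriteLine($"Centroids: {means(0):F2} and {means(1):F2}")  ' about 1.0 and 5.0
    End Sub
End Module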
13- K Nearest Neighbors - Classification
K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (a distance function). K is the number of neighbors to consider.
Algorithm
A point is classified by a majority vote of its k neighbors, with the point being assigned to the class most common amongst its K nearest neighbors as measured by a distance function. If K = 1, then the point is simply assigned to the class of its nearest neighbor.
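For illustration, here is a small self-contained VB.NET sketch of KNN for one-dimensional samples, using Euclidean distance and majority voting; the data and all names are made up:
Imports System.Linq

Module KnnDemo
    ' Classify a query point by majority vote among its k nearest 1D neighbors.
    Function Classify(samples() As Double, labels() As Integer,
                      query As Double, k As Integer) As Integer
        ' Order the sample indices by distance to the query and take the k nearest
        Dim nearest = Enumerable.Range(0, samples.Length).
                      OrderBy(Function(i) Math.Abs(samples(i) - query)).
                      Take(k)
        ' Majority vote among the labels of the k nearest samples
        Return nearest.GroupBy(Function(i) labels(i)).
               OrderByDescending(Function(g) g.Count()).
               First().Key
    End Function

    Sub Main()
        Dim samples() As Double = {1.0, 1.5, 2.0, 8.0, 8.5, 9.0}
        Dim labels() As Integer = {0, 0, 0, 1, 1, 1}
        Console.WriteLine(Classify(samples, labels, 7.2, 3))   ' prints 1
    End Sub
End Module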
Points of Interest
This article is a summary of common ML algorithms. It is neither a full nor an extensively detailed summary, as there are tons of algorithms; it focuses only on the common types.
There could be a second part of this article.
Attached at the top of this article is a PDF version of it.
References
(20) https://www.codeproject.com/Articles/1205296/Build-Simple-AI-NET-Library-Part-Machine-Learnin
(21) https://en.wikipedia.org/wiki/Hierarchical_clustering
(22) https://en.wikipedia.org/wiki/K-means_clustering
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).