[Translated] Bezier Curve Machine Learning Demonstration
By robot-v1.0
Permalink: https://www.kyfws.com/ai/bezier-curve-machine-learning-demonstration-zh/
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!
Original article: https://www.codeproject.com/Articles/1256883/Bezier-Curve-Machine-Learning-Demonstration
Original author: asiwel
Translated by this site's robot-v1.0
Preface
Bezier Curve Classification Training and Validation Models Using ALGLIB
Visit Part 1: Data Visualizations And Bezier Curves
Download BezierCurveMachineLearning.zip
Introduction
I enjoy working with longitudinal data. This is the second article in a series about using Bezier curves to smooth large data-point fluctuations and improve the visibility of unfolding patterns. This article focuses on using machine learning algorithms to train network models to recognize patterns and trends reflected in Bezier curve trajectories of student academic performance.
Background
This demonstration features ALGLIB, one of the better numerical analysis libraries available to C# programmers, which offers several easy-to-use machine learning methods. (Later in this series, we will examine MS CNTK for C# routines.) ALGLIB for C# is licensed both as a free, single-threaded edition for individual experimentation and use and as a commercial, multi-threaded edition for purchase. For this demonstration, I will explain at some length how to download and build the free edition as a class library, which must be included as a reference for the demonstration program to work. For a description and history, I refer the reader to the ALGLIB Wikipedia page; for download information and an excellent online User's Guide, to the ALGLIB website; and for a detailed User's Manual in **.html** format, to the download itself.
Working with ALGLIB
The first thing to do is visit the ALGLIB website, examine the news and the User's Guide (particularly the chapter about "Data analysis: classification, regression, other tasks"), and download the free edition of ALGLIB 3.14.0 for C# (released 6/16/2018) as a ZIP file. Unzip it and open the application folder, which should be named **csharp**. Among the contents of this folder is a file named **manual.csharp.html**. It contains detailed narrative, method descriptions, and example code snippets useful to anyone who wishes to use this library. A second item of interest in the **csharp** folder is a subfolder named **net-core**, which contains an **alglibnet2.dll** library and another subfolder named **src** containing the source code. Individual parts of ALGLIB can be extracted for direct inclusion in programs (to save space, etc.). I prefer building my own complete library from the sources. To do that, follow this walk-through:
- Open **Visual Studio** and create a new **Class Library (.NET Framework) C#** project.
- Name the project ALGLIB314; select a location for saving the project folder; set the framework to **.NET Framework 4.7**; check the **Create directory for solution** checkbox; and click the OK button.
- When the project opens, in the Solution Explorer, right-click on the **project name** and select the **Add: Existing Item** option.
- In the **Add Existing Item** window, locate the downloaded **csharp** folder; open it; then open the **net-core** subfolder and its **src** subfolder. There, select all of the **.cs** files listed and click **Add** to bring copies of them into your new ALGLIB314 project.
- Next, in Visual Studio, select **Release** and **Any CPU**, and then use the menu bar to choose the **Build: Build Solution** option to build a complete ALGLIB314.dll library.
- NOTE: the previous step will fail, yielding two errors in the **Error List** pane that refer to lines 11 and 12 of the project's **AssemblyInfo.cs** file. To correct this, click on one of the error lines to open that file, comment out those two lines, and save the file. (This is because the very same information is already provided in the **alglib_info.cs** file.)
- Now, click the **Build: Rebuild Solution** menu option to create a release version of the ALGLIB314.dll file. This time the process will succeed.
- Finally, change **Release** to **Debug** and click **Build Solution** again.
- Now you have both Release and Debug versions of the complete **ALGLIB314.dll** library (and its associated **.pdb** file for debugging) in your project's **bin** folder. This library can later be added to other projects as a reference.
- Save your class library project and close the solution.
- This would also be a good time to copy the **manual.csharp.html** file from the **csharp** folder and paste the copy into the top level of your new ALGLIB314 project folder for future reference.
Data Characteristics
The previous article in this data visualization series, entitled Data Visualizations And Bezier Curves, was about modeling data with Bezier curves. We looked at curve fitting, time-domain point evaluation and plotting, and differentiation. You may wish to review that article. Again, it should be emphasized that here we are discussing longitudinal data that moves from a starting point (along a time or X axis) to an ending point, without loops, cusps, or backtracks. Once again, as examples, we will use student academic performance over time from grades 6 through 12. In this article, we will discuss building machine learning classification models and using them to recognize various patterns, such as trajectories that identify students who appear to be doing well or are likely to be at risk.
Here, we will use another small sample (N=500) of student performance histories based on marking-period grade point averages (MPgpa) in core coursework. MPgpas are bounded from 0.00 to 4.00 on the grading scale. This is a simulated sample drawn from a Monte Carlo version of a large, thoroughly de-identified research database covering multiple states and school districts. (See the previous article for a fuller description of the data source.) In that research, a much larger set of student histories (from the point of entry into middle school through the end of 8th grade) were individually modeled by Bezier curves on a curriculum timeline, examined, and classified to form a large data set for machine learning.
Each curve was classified as either (1) a relatively successful academic history up to the status estimation point (9.0, the end of middle school) on the curriculum timeline; (2) possibly at risk due to a falling MPgpa pattern below 2.0 on the grading scale; (3) still at risk but exhibiting a rising MPgpa pattern toward or slightly above 2.0; or (4) seriously at risk of academic failure, with a pattern trending below 1.0. Our small demonstration data set was built by selecting pre-classified student histories at random while ensuring a balanced set of demo data representing each status group. (Actual data is not "balanced" this way. Fortunately, and typically in public schools, there are several times more students who appear "academically successful" than students who appear "at risk.")
Using the code
In this "BezierCurveMachineLearningDemo" project, written in C# using Visual Studio 2017 and .NET 4.7, we first read a data file and construct student histories, defined as lists of DataPoint tuples (time, MPgpa). The collection of data records is then shuffled randomly.
From each student history, a sub-history (from the time point of entry into middle school through the status estimation point of 9.0, marking the end of 8th grade) is extracted. Associated with each sub-history is a pre-classified status value.
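A minimal sketch of the sub-history extraction step, assuming a history is a list of `(Time, MPgpa)` tuples as described above. The class `SubHistory` and its `Extract` method are hypothetical illustrations, not code from the demo project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the demo): trims a student history to the
// portion used for classification.
public static class SubHistory
{
    // Status estimation point on the curriculum timeline (end of 8th grade).
    public const double StatusPoint = 9.0;

    // Keep every (time, MPgpa) point up to and including the cutoff,
    // ordered by time so the result can be handed to a curve fitter.
    public static List<(double Time, double MPgpa)> Extract(
        IEnumerable<(double Time, double MPgpa)> history,
        double cutoff = StatusPoint)
    {
        return history
            .Where(p => p.Time <= cutoff)
            .OrderBy(p => p.Time)
            .ToList();
    }
}
```

Points recorded after 9.0 (high school marking periods) are simply dropped, so every sub-history ends at the same status estimation point.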
Next, a Bezier curve is fitted to model each student sub-history. That smooth Bezier curve is itself represented by a list of 24 MPgpa values extracted at equal time intervals from the starting point to the ending point, inclusive. Twenty-four points appear to be sufficient to represent these Bezier curves of academic performance, which seldom exhibit more than two or three frequency cycles over the curriculum timeframe.
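The sampling step can be sketched as follows. This is an illustration only. It assumes the fitted curve is available as a list of (time, MPgpa) control points (the fitting itself is covered in Part 1). It also assumes time increases monotonically along the curve, which is consistent with the no-loops, no-cusps, no-backtracks property stated earlier. Points are evaluated with De Casteljau's algorithm, and bisection on the curve parameter t finds the point that lands on each equally spaced time value. `BezierSampler` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: reduce a fitted Bezier curve to n equally spaced samples.
public static class BezierSampler
{
    // Evaluate a Bezier curve of any degree at parameter t in [0, 1]
    // with De Casteljau's algorithm (repeated linear interpolation).
    public static (double X, double Y) Evaluate(IReadOnlyList<(double X, double Y)> ctrl, double t)
    {
        var pts = new (double X, double Y)[ctrl.Count];
        for (int i = 0; i < ctrl.Count; i++) pts[i] = ctrl[i];
        for (int level = ctrl.Count - 1; level > 0; level--)
            for (int i = 0; i < level; i++)
                pts[i] = ((1 - t) * pts[i].X + t * pts[i + 1].X,
                          (1 - t) * pts[i].Y + t * pts[i + 1].Y);
        return pts[0];
    }

    // Return n Y values (MPgpa) at equally spaced X values (time) from the first
    // to the last control point, inclusive. Assumes X(t) increases with t.
    public static double[] SampleAtEqualTimes(IReadOnlyList<(double X, double Y)> ctrl, int n = 24)
    {
        if (n < 2) throw new ArgumentException("need at least two samples", nameof(n));
        double x0 = ctrl[0].X, x1 = ctrl[ctrl.Count - 1].X;
        var samples = new double[n];
        for (int k = 0; k < n; k++)
        {
            double target = x0 + (x1 - x0) * k / (n - 1);
            double lo = 0.0, hi = 1.0;
            for (int iter = 0; iter < 60; iter++) // bisection on t
            {
                double mid = 0.5 * (lo + hi);
                if (Evaluate(ctrl, mid).X < target) lo = mid; else hi = mid;
            }
            samples[k] = Evaluate(ctrl, 0.5 * (lo + hi)).Y;
        }
        return samples;
    }
}
```

Bisection is used because a Bezier curve is parameterized by t, not by time. Equal steps in t would generally not give equal time intervals unless the control points happen to be evenly spaced in time.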
ALGLIB machine learning methods require data in the form of a double[,] array in which each row holds the features followed by a label. The 24 points describing a Bezier curve pattern form the features in this demonstration. The pre-classified status value (0, 1, 2, or 3) becomes the label (because in ALGLIB, labeling starts at 0). The data is then divided into training and validation data sets (based on a user-defined allocation percentage) for ready access.
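Here is a sketch of how such a `double[,]` array might be built, shuffled, and split. It assumes each curve has already been reduced to a 24-element feature vector with a 0-based label. `DatasetBuilder` is a hypothetical name, and the demo's own reader class (`Form1.rdr`) is not reproduced here. Shuffling rows with a seeded Fisher-Yates pass is one reasonable choice, not necessarily the author's.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the [features..., label] layout and a train/validation split.
public static class DatasetBuilder
{
    // Each output row is [f0, f1, ..., f(n-1), label].
    public static double[,] ToAlglibMatrix(IReadOnlyList<double[]> features, IReadOnlyList<int> labels)
    {
        int rows = features.Count, nf = features[0].Length;
        var xy = new double[rows, nf + 1];
        for (int i = 0; i < rows; i++)
        {
            for (int j = 0; j < nf; j++) xy[i, j] = features[i][j];
            xy[i, nf] = labels[i];
        }
        return xy;
    }

    // Shuffle row order (Fisher-Yates), then give the first trainPercent% of the
    // shuffled rows to training and the remainder to validation.
    public static (double[,] Train, double[,] Validation) ShuffleAndSplit(
        double[,] xy, double trainPercent, int seed)
    {
        int rows = xy.GetLength(0), cols = xy.GetLength(1);
        var order = new int[rows];
        for (int i = 0; i < rows; i++) order[i] = i;
        var rng = new Random(seed);
        for (int i = rows - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (order[i], order[j]) = (order[j], order[i]);
        }
        int nTrain = (int)Math.Round(rows * trainPercent / 100.0);
        var train = new double[nTrain, cols];
        var valid = new double[rows - nTrain, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
            {
                if (r < nTrain) train[r, c] = xy[order[r], c];
                else valid[r - nTrain, c] = xy[order[r], c];
            }
        return (train, valid);
    }
}
```

Because whole rows are permuted, each feature vector always stays attached to its own label. With N=500 and a 60% allocation, this sketch would produce 300 training rows and 200 validation rows.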
Download the project and open the solution file in Visual Studio. In the Solution Explorer, right-click on the project's References folder and delete any existing reference to the ALGLIB314.dll library. Then use the **Add Reference** option to locate the release version of the ALGLIB314.dll library (which you created above) and add it to the project. Then click **Start** to build and run the application. The solution requires three packages (MSTest.TestAdapter.1.2.1, MSTest.TestFramework.1.2.1, and System.ValueTuple.4.3.1), which should download and restore automatically.
A WinForm should open with a listbox, a chart, a DataGridView, and a few buttons. The listbox tabulates the machine learning process as it unfolds, and the chart visualizes that same process. When network training completes, a brief summary is provided in the listbox, and a more detailed cross-tabulation of training results is presented in the DataGridView. For this demonstration, menu options are provided to select the type of network model to be trained (a neural network or a decision forest), the percentage of the data used for training, and whether the labels will be binary (0, 1) or four-class (0, 1, 2, 3). Other controls on the form are used for starting the training process, running the validation test, saving the form image, saving the trained model, and closing the application.
Points of Interest
The application provides a **Demonstration** menu with two options: one selects a neural network to be trained as the classification model; the other trains a decision forest model for the same purpose. Select a model and click the **Start** button to run a new analysis.
Much, if not most, of the code is unremarkable: it builds the form and the demo user interface and creates and manipulates the data set. The most important methods are those that call the actual machine learning processes implemented by ALGLIB, and the ALGLIB API is very similar for both types of model. For instance, this is the NeuralNet code block in the demo that defines and implements a (24:12:6:4) neural network, which has been shown to be adequate for classifying Bezier curves representing academic performance.
```csharp
class NeuralNet
{
    private static alglib.multilayerperceptron network;
    private static alglib.mlptrainer trainer;
    …
    private static double relclserror;
    private static double accuracy;

    public static void RunExample1()
    {
        int Nfeatures = Form1.rdr.NofFeatures;
        int Nlabels = Form1.LabelCategories == LABEL.Binary ? 2 : 4;
        …
        // First, create a trainer object for a classification task
        alglib.mlpcreatetrainercls(Nfeatures, Nlabels, out trainer);
        alglib.mlpsetdataset(trainer, Form1.rdr.GetTrainingData, Form1.rdr.NofTrainingCases);
        // Create a neural network with two hidden layers (12 and 6 nodes)
        alglib.mlpcreatec2(Nfeatures, 12, 6, Nlabels, out network);
        // Set the weight-decay regularization
        alglib.mlpsetdecay(trainer, 1.0E-3);
        // Run the trainer
        //RunAutoTrainer();
        RunStepTrainer();
        TrainingSummary();
        PredictStatus(Form1.rdr.GetTrainingData);
    }
    …
}
```
As you can see, a trainer is defined by the number of features and the number of label values (each case record carries one of those labels). The set of training records forms the data set presented to the trainer. A network is then defined, in this case with a 24-node input layer for the 24 points describing a Bezier curve; two hidden layers, the first with 12 nodes and the second with 6 nodes; and an output layer with 4 or 2 nodes for four-class or binary classification. ALGLIB then offers several run modes. One of those simply "autoruns" to completion before producing output.
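To give a rough sense of scale, the number of trainable parameters in a 24:12:6:4 architecture can be worked out layer by layer. The sketch below assumes a plain fully connected network with one weight per connection and one bias per hidden or output neuron. ALGLIB's internal parameterization (for example, of its softmax output layer) may give a somewhat different count. `LayerMath.ParameterCount` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: parameter count of a plain fully connected network.
public static class LayerMath
{
    // For each pair of adjacent layers: (inputs x outputs) weights + one bias per output neuron.
    public static int ParameterCount(params int[] layerSizes)
    {
        int total = 0;
        for (int i = 1; i < layerSizes.Length; i++)
            total += layerSizes[i - 1] * layerSizes[i] + layerSizes[i];
        return total;
    }
}
```

Under these assumptions, `ParameterCount(24, 12, 6, 4)` gives 300 + 78 + 28 = 406 parameters, and the binary variant `ParameterCount(24, 12, 6, 2)` gives 392.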
```csharp
/// <summary>
/// This neural network trainer runs to completion without producing intermediate results
/// </summary>
private static void RunAutoTrainer()
{
    …
    // The third argument (1) is the number of training restarts
    alglib.mlptrainnetwork(trainer, network, 1, out alglib.mlpreport rep);
    accuracy = 1.0 - rep.relclserror; // set global accuracy variable
    ShowReport(rep);
}
```
There is also a "step-wise" run mode that does produce intermediate output. This is the run mode used by this demonstration to update the tabular listbox information and the Training Progress chart visualization.
```csharp
/// <summary>
/// This neural network trainer produces intermediate step-wise results
/// </summary>
private static void RunStepTrainer()
{
    int epoch = 0;
    // true = randomize the network weights before training starts
    alglib.mlpstarttraining(trainer, network, true);
    while (alglib.mlpcontinuetraining(trainer, network))
    {
        avgce = alglib.mlpavgce(network, Form1.rdr.GetTrainingData, Form1.rdr.NofTrainingCases);
        ssqerror = alglib.mlperror(network, Form1.rdr.GetTrainingData, Form1.rdr.NofTrainingCases);
        rmserror = alglib.mlprmserror(network, Form1.rdr.GetTrainingData, Form1.rdr.NofTrainingCases);
        relclserror = alglib.mlprelclserror(network, Form1.rdr.GetTrainingData, Form1.rdr.NofTrainingCases);
        accuracy = 1.0 - relclserror; // set global accuracy variable
        Charts.ChartAddaPoint(epoch++, (float)accuracy); // update chart
        …
    }
}
```
Once the model is trained, it is a simple matter to save it as a disk file and to reload that model into a different program for classifying new data records. ALGLIB provides network serialization and deserialization functions for this purpose, e.g.:
```csharp
// static, so that the static save/load methods can access it
public static alglib.multilayerperceptron network;
…
/// <summary>
/// Write the trained, serialized neural network to a disk data file
/// </summary>
/// <param name="pathname">Path of the file to write</param>
public static void SaveTrainedNeuralNetwork(string pathname)
{
    alglib.mlpserialize(network, out string s_out);
    System.IO.File.WriteAllText(pathname, s_out);
}
```
and
```csharp
/// <summary>
/// Read and deserialize a saved, trained neural network
/// </summary>
/// <param name="networkPathName">Path of the serialized network file</param>
public static void LoadTrainedNeuralNetwork(string networkPathName)
{
    if (!System.IO.File.Exists(networkPathName))
        throw new System.IO.FileNotFoundException("Neural Network Classifier file not found.", networkPathName);
    string text = System.IO.File.ReadAllText(networkPathName);
    alglib.mlpunserialize(text, out network);
}
```
In the demonstration, however, we already have the trained network in memory and can apply it first to the training set and later to the validation set, like so:
```csharp
// Requires: using System; using System.Collections.Generic; using System.Linq; (for results.Max())
/// <summary>
/// Use an in-memory training or validation data array and predict status with the trained neural network
/// </summary>
/// <param name="rundata">A training or validation data array</param>
/// <param name="validationFlag">True: add validation summary info to the listbox</param>
/// <remarks>This routine concludes by producing the cross-tabulation DataGridView</remarks>
public static void PredictStatus(double[,] rundata, bool validationFlag = false)
{
    double[] results = new double[4];
    List<int> status = new List<int>();
    List<int> predict = new List<int>();
    int NofMatches = 0;
    for (int i = 0; i < rundata.GetLength(0); i++)
    {
        alglib.mlpprocess(network,
            Support.VectorRow(ref rundata, i, 1), ref results);       // get class probabilities
        int pred = Array.IndexOf(results, results.Max()) + 1;        // max-probability index, reported 1-based
        int actual = Convert.ToInt32(
            Math.Round(rundata[i, rundata.GetLength(1) - 1], 0)) + 1; // actual status label, reported 1-based
        status.Add(actual);
        predict.Add(pred);
        if (actual == pred) NofMatches++;
    }
    if (validationFlag == true)
    {
        accuracy = (double)NofMatches / rundata.GetLength(0);
        ValidationSummary();
    }
    Support.RunCrosstabs(predict, status);
}
```
The **crosstabs.cs** module contains methods to display actual and predicted classification results in a "confusion" matrix format in the WinForm's DataGridView control, along with various precision, recall, and accuracy statistics. Readers interested in using that routine for their own purposes are referred to another CodeProject article, Crosstabs/Confusion Matrix for AI Classification Projects, which I wrote some time ago.
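For readers who want the core arithmetic without the DataGridView code, the sketch below computes a confusion matrix plus accuracy, precision, and recall. It is not the code in crosstabs.cs. It assumes 0-based labels, so you would subtract 1 from the 1-based values that `PredictStatus` passes along. The names `Crosstab`, `Confusion`, `Accuracy`, `Precision`, and `Recall` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of confusion-matrix statistics (rows = actual, columns = predicted).
public static class Crosstab
{
    public static int[,] Confusion(IReadOnlyList<int> actual, IReadOnlyList<int> predicted, int nClasses)
    {
        var m = new int[nClasses, nClasses];
        for (int i = 0; i < actual.Count; i++) m[actual[i], predicted[i]]++;
        return m;
    }

    // Fraction of all cases on the diagonal (correctly classified).
    public static double Accuracy(int[,] m)
    {
        int n = m.GetLength(0), correct = 0, total = 0;
        for (int a = 0; a < n; a++)
            for (int p = 0; p < n; p++)
            {
                total += m[a, p];
                if (a == p) correct += m[a, p];
            }
        return total == 0 ? 0.0 : (double)correct / total;
    }

    // Of the cases predicted as class k, the fraction that really are class k.
    public static double Precision(int[,] m, int k)
    {
        int col = 0;
        for (int a = 0; a < m.GetLength(0); a++) col += m[a, k];
        return col == 0 ? 0.0 : (double)m[k, k] / col;
    }

    // Of the cases that really are class k, the fraction predicted as class k.
    public static double Recall(int[,] m, int k)
    {
        int row = 0;
        for (int p = 0; p < m.GetLength(1); p++) row += m[k, p];
        return row == 0 ? 0.0 : (double)m[k, k] / row;
    }
}
```

For example, with actual labels {0, 0, 0, 1, 1, 2} and predictions {0, 0, 1, 1, 1, 2}, accuracy is 5/6, precision for class 1 is 2/3, and recall for class 1 is 1.0.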
Training and using a decision forest with ALGLIB is a process very similar to the coding above, and the demonstration includes a DecisionForest class module that implements that model. (With this small training set, the DecisionForest model uses 24 features, 2 or 4 labels, 10 trees, and an r-value of 60%, which in ALGLIB is the fraction of the training set used to build each tree.) The demo also has menu options to change the percentage of data allocated for training and validating the models, from 50% up to 100%. Another menu option recodes the pre-classified status label variable for binary, rather than four-class, classification (i.e., either (0) apparent success or (1) possibly at risk). Manipulating these options has the expected effects: more training cases and fewer label values lead to better results. In previous research, using a much larger data set (N > 14,000) and a 60%/40% training/validation split, stable validation accuracies above 0.98 have typically been achieved for Bezier curves using these classification models.
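The binary recoding can be sketched as a pass over the label column. This assumes label 0 is the "apparent success" group (status group (1) above, shifted to ALGLIB's 0-based labels) and labels 1 to 3 are the at-risk groups. `LabelRecode.ToBinary` is a hypothetical name, not the demo's menu handler.

```csharp
using System;

// Hypothetical sketch: collapse four status labels into a binary label, in place.
public static class LabelRecode
{
    // xy uses the ALGLIB row layout [features..., label]; only the last column changes.
    // 0 (apparent success) stays 0; 1, 2, 3 (at-risk groups) become 1.
    public static void ToBinary(double[,] xy)
    {
        int labelCol = xy.GetLength(1) - 1;
        for (int i = 0; i < xy.GetLength(0); i++)
        {
            int label = (int)Math.Round(xy[i, labelCol]);
            xy[i, labelCol] = label == 0 ? 0.0 : 1.0;
        }
    }
}
```

After recoding, the classifier would be created with 2 rather than 4 classes, matching the `Nlabels` switch in `RunExample1`.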
Conclusion
The principal conclusion drawn from the first demonstration in this series was that Bezier curves can be very useful models for noisy longitudinal data collected at varying time or X-axis points. A second inference was that if models are good, then inferences about such models may be good as well. That demonstration illustrated many different student performance trajectories through their respective public school curricula. It raised the question of whether these trajectories are indicators of relative success, either at their end points or at "current" or intermediate points of interest, such as the transition point at 9.0 from 8th-grade middle school into 9th grade and high school.
By looking at the actual or the Bezier-smoothed data, educators can decide whether particular students may be academically "at risk" (and thus may benefit from additional support services). This is how the pre-classified data set was constructed for this demonstration. But what if you need to make similar decisions about hundreds or thousands of students in real time (e.g., at each marking period) as each individual progresses through his or her curriculum? School counselors do that sort of thing every day, mostly without access to, or use of, such data. A machine learning model that could classify performance trajectories in various ways would surely be a beneficial tool for data-driven decision-making.
This article is an example of how to use machine learning algorithms to train classification network models to identify Bezier curve trajectory patterns reflecting student academic performance trends. The focus was on numerical methods provided by ALGLIB, but the same thing can be done using MS CNTK for C# or using TensorFlow or Keras for Python. I think that kind of cross-model verification/validation is always an important and useful step, and the next article in this series will look at it.
Schools and school districts constantly gather data using commercial student information systems (SIS), but this "raw" information is seldom accessible. Once a trained model is in hand, dashboards and other visualization methods can access that data, analyze it, and present educators with decision-making tools for individual or aggregated cohort analyses in real time. These are perhaps topics for future articles in this series.
Beyond that, this project presents a variety of useful techniques and methods, adaptable to other similar projects, for working with Bezier curves and for training machine learning models to recognize the patterns and trends such curves reflect.
History
August 22, 2018: Version 1.0
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).