[Translation] Introduction to pyDAAL
By robot-v1.0
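Article link: https://www.kyfws.com/ai/introduction-to-pydaal-zh/
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting.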
Original article: https://www.codeproject.com/Articles/1190111/Introduction-to-pyDAAL
Original author: Intel Corporation
Translated by this site's robot-v1.0
Preface
*This paper shows how pyDAAL, the Python API of the Intel® Data Analytics Acceleration Library (Intel® DAAL), works. First, we explain how to manipulate data using the pyDAAL programming interface, and then we show how to integrate it with Python data manipulation and math APIs. Finally, we demonstrate how to use pyDAAL to implement a simple linear regression solution for a prediction problem.*
Data science is a recent field that brings together concepts from many other areas, such as data mining, data analysis, data modeling, data prediction, and data visualization. Performing these tasks as quickly as possible has become a central concern of today's data solutions. With that in mind, Intel DAAL is a highly optimized library whose goal is to provide a complete data analytics solution targeting today's highly parallel systems, such as Intel® Xeon Phi™ processors.
Intel DAAL provides solutions for many steps of a data analytics pipeline, such as pre-processing, data transformation, dimensionality reduction, data modeling, and prediction, along with several drivers for reading and writing the most common data formats. Figure 1 summarizes the features available in the library.
Figure 1. Main algorithms delivered by Intel® Data Analytics Acceleration Library
As Figure 1 shows, the library's APIs are available in C++, Java, and Python (Python is the most recent addition, available since the 2017 beta). Many of the algorithms implemented in the library can run in three main processing modes:
- Batch: processing happens serially; for example, the training algorithm runs sequentially on a single node;
- Distributed: as the name suggests, the dataset is split and distributed among the computing nodes. The algorithm computes partial results on each node and, in a final step, merges them into one solution; and
- Online: the data is treated as a continuous stream. Processing proceeds by building the model incrementally and, at the end, building the full model from the partial results.
More on the processing modes, together with additional details on data management and on using pyDAAL to implement a simple linear regression solution for a prediction problem, is covered in this whitepaper.
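The three modes differ mainly in where partial results are computed and how they are combined. The plain-Python sketch below illustrates that idea for the simple linear regression the paper refers to. It is not the pyDAAL API: the names PartialResult, batch_fit, distributed_fit, online_fit and predict are hypothetical. It assumes a single input feature and a least-squares fit through the normal equations, where a partial result is just a set of running sums (count, Σx, Σy, Σx², Σxy), so merging partial results from different nodes or blocks amounts to adding them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

Sample = Tuple[float, float]  # one (x, y) observation


@dataclass
class PartialResult:
    """Hypothetical partial result: sufficient statistics for y = b0 + b1 * x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, block: Iterable[Sample]) -> None:
        # Fold a block of observations into the running sums.
        for x, y in block:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def merge(self, other: "PartialResult") -> "PartialResult":
        # For least squares, partial sums from different nodes simply add up.
        return PartialResult(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                             self.sxx + other.sxx, self.sxy + other.sxy)

    def finalize(self) -> Tuple[float, float]:
        # Solve the 2x2 normal equations for the intercept b0 and slope b1.
        denom = self.n * self.sxx - self.sx * self.sx
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        b1 = (self.n * self.sxy - self.sx * self.sy) / denom
        b0 = (self.sy - b1 * self.sx) / self.n
        return b0, b1


def batch_fit(data: Sequence[Sample]) -> Tuple[float, float]:
    # Batch: the whole dataset is processed at once on a single node.
    part = PartialResult()
    part.update(data)
    return part.finalize()


def distributed_fit(chunks: List[Sequence[Sample]]) -> Tuple[float, float]:
    # Distributed: each "node" computes a partial result on its own chunk
    # (simulated here with a loop), then a final step merges them.
    partials = []
    for chunk in chunks:
        p = PartialResult()
        p.update(chunk)
        partials.append(p)
    total = PartialResult()
    for p in partials:
        total = total.merge(p)
    return total.finalize()


def online_fit(stream: Iterable[Sequence[Sample]]) -> Tuple[float, float]:
    # Online: blocks arrive one after another; a single state is updated
    # incrementally and the full model is built when the stream ends.
    state = PartialResult()
    for block in stream:
        state.update(block)
    return state.finalize()


def predict(model: Tuple[float, float], x: float) -> float:
    # Apply the fitted line to a new input.
    b0, b1 = model
    return b0 + b1 * x


if __name__ == "__main__":
    data = [(x, 2.0 + 3.0 * x) for x in range(6)]       # points on y = 2 + 3x
    print(batch_fit(data))                               # (2.0, 3.0)
    print(distributed_fit([data[:3], data[3:]]))         # (2.0, 3.0)
    print(online_fit([data[:2], data[2:4], data[4:]]))   # (2.0, 3.0)
    print(predict(batch_fit(data), 10.0))                # 32.0
```

All three entry points return the same intercept and slope for the same data; only the way the work is split changes. Algorithms whose partial results are not simple sums would need a more elaborate merge step than the addition used here.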
Source available on GitHub.
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).