H2O: A Concise Tutorial

H2O - Introduction

Have you ever been asked to develop a Machine Learning model on a huge database? Typically, the customer provides the database and asks you to make certain predictions, such as who the potential buyers will be or whether fraudulent cases can be detected early. To answer these questions, your task is to develop a Machine Learning model that answers the customer's query. Developing a Machine Learning algorithm from scratch is not an easy task, and there is little reason to do so when several ready-to-use Machine Learning libraries are available.

These days, you would rather use one of these libraries, apply a well-tested algorithm from it, and look at its performance. If the performance is not within acceptable limits, you would either fine-tune the current algorithm or try an altogether different one.

Likewise, you may try multiple algorithms on the same dataset and then pick the one that best meets the customer's requirements. This is where H2O comes to your rescue. It is an open source Machine Learning framework with fully tested implementations of the most widely used statistical and ML algorithms. You just have to pick an algorithm from its large repository and apply it to your dataset.

To mention a few, it includes gradient boosted machines (GBM), generalized linear models (GLM), and deep learning. It also supports AutoML functionality that ranks the performance of different algorithms on your dataset, reducing the effort of finding the best-performing model. H2O is used worldwide by more than 18,000 organizations and interfaces well with R and Python for ease of development. It is an in-memory platform, which helps it deliver high performance.
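The idea of ranking several algorithms on one dataset can be illustrated without H2O at all. The following is a minimal conceptual sketch in plain Python, not H2O's API or its AutoML implementation: the synthetic data, the two toy "algorithms", and every function name (`make_data`, `fit_majority`, `fit_threshold`, `leaderboard`) are assumptions made up for this illustration. It fits each candidate on a training split, scores it on a held-out split, and sorts the results best first.

```python
import random
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, int]            # (feature value, class label)
Model = Callable[[float], int]     # maps a feature value to a predicted label


def make_data(n: int, seed: int = 0) -> List[Row]:
    """Synthetic data (an assumption): the label is 1 when the feature exceeds 0.6."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append((x, 1 if x > 0.6 else 0))
    return rows


def fit_majority(train: List[Row]) -> Model:
    """Toy baseline: always predict the most common training label."""
    ones = sum(label for _, label in train)
    majority = 1 if ones * 2 >= len(train) else 0
    return lambda x: majority


def fit_threshold(train: List[Row]) -> Model:
    """Toy learner: choose the cut point with the best training accuracy."""
    best_cut, best_hits = 0.0, -1
    for cut, _ in train:
        hits = sum((1 if x > cut else 0) == y for x, y in train)
        if hits > best_hits:
            best_cut, best_hits = cut, hits
    return lambda x: 1 if x > best_cut else 0


def accuracy(model: Model, rows: List[Row]) -> float:
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def leaderboard(candidates: Dict[str, Callable[[List[Row]], Model]],
                train: List[Row], test: List[Row]) -> List[Tuple[str, float]]:
    """Fit every candidate on train, score it on test, and sort best first."""
    scores = [(name, accuracy(fit(train), test)) for name, fit in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


data = make_data(300)
train, test = data[:200], data[200:]
board = leaderboard(
    {"majority_baseline": fit_majority, "threshold_rule": fit_threshold},
    train,
    test,
)
for name, score in board:
    print(f"{name}: {score:.2f}")
```

Scoring on a held-out split rather than on the training rows keeps the ranking from simply rewarding whichever candidate memorizes the training data best.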

In this tutorial, you will first learn to install H2O on your machine with both the Python and R options. We will work with it at the command line so that you can follow how it works line by line. If you are a Python lover, you may use Jupyter or any other IDE of your choice for developing H2O applications. If you prefer R, you may use RStudio for development.

We will work through an example to see how to use H2O. We will also learn how to change the algorithm in your program code and compare its performance with that of the earlier one. H2O also provides a web-based tool, called Flow, for testing different algorithms on your dataset.

The tutorial will introduce you to the use of Flow. Alongside, we will discuss the use of AutoML, which identifies the best-performing algorithm on your dataset. Excited to learn H2O? Keep reading!