Big Data Analytics: A Concise Tutorial

Machine Learning for Data Analysis

Machine learning is a subfield of computer science that deals with tasks such as pattern recognition, computer vision, speech recognition and text analytics, and it has strong links with statistics and mathematical optimization. Applications include search engine development, spam filtering and Optical Character Recognition (OCR), among others. The boundaries between data mining, pattern recognition and statistical learning are blurry, and all three largely address similar problems.

Machine learning tasks can be divided into two types:

  1. Supervised Learning

  2. Unsupervised Learning

Supervised Learning

Supervised learning refers to problems in which the input data is given as a matrix X and we want to predict a response y. Here X = {x1, x2, …, xn} has n predictors, and in the two-class case the response takes one of two values, y ∈ {c1, c2}.
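To make the X and y notation concrete, here is a minimal Python sketch, assuming a toy dataset with n = 2 predictors and two classes. It uses a nearest-centroid classifier, chosen only for illustration and not a method this tutorial prescribes; the function names `fit_centroids` and `predict` and the data are hypothetical.

```python
# Minimal sketch of the supervised setup: each row of X is one observation,
# each column is a predictor (x1, ..., xn), and y holds one of two classes.
# Nearest-centroid is an illustrative choice; names and data are hypothetical.
from math import dist


def fit_centroids(X, y):
    """Return the mean predictor vector (centroid) for each class label."""
    centroids = {}
    for label in set(y):
        rows = [row for row, target in zip(X, y) if target == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, row):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Hypothetical toy data: 4 observations, n = 2 predictors, y in {"c1", "c2"}.
X = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.2], [4.4, 3.9]]
y = ["c1", "c1", "c2", "c2"]

model = fit_centroids(X, y)
print(predict(model, [1.1, 0.9]))
print(predict(model, [3.8, 4.0]))
```

The learned "mapping" here is just one centroid per class; a new row is labeled c1 or c2 depending on which centroid it lies nearer to.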

An example application is predicting the probability that a web user clicks on an ad, using demographic features as predictors. This is often called predicting the click-through rate (CTR). Here y = {click, doesn't click}, and the predictors could include the user's IP address, the day they visited the site, and the user's city and country, among other available features.
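A common way to estimate CTR is logistic regression over one-hot encoded categorical predictors. The sketch below assumes that model and trains it with plain batch gradient descent in standard-library Python, on a hypothetical impression log with only two predictors (day and country). The helper names, learning rate and epoch count are illustrative choices, not part of the original text.

```python
# Minimal CTR sketch: logistic regression on one-hot categorical predictors,
# trained with batch gradient descent. All data and names are hypothetical.
import math


def one_hot(record, vocab):
    """Encode a record as 0/1 indicators over (feature, value) pairs in vocab."""
    return [1.0 if record.get(feat) == val else 0.0 for feat, val in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_ctr(w, b, x):
    """Estimated probability of a click for encoded predictors x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(rows, labels, lr=0.5, epochs=500):
    """Fit weights w and bias b by minimizing the average log loss."""
    w, b, m = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(rows, labels):
            err = predict_ctr(w, b, x) - t  # log-loss gradient w.r.t. the logit
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Hypothetical impression log: (day, country) -> (clicks, impressions).
counts = {("Sat", "US"): (3, 4), ("Sat", "UK"): (2, 4),
          ("Mon", "US"): (2, 4), ("Mon", "UK"): (1, 4)}
records, clicks = [], []  # y = 1 for "click", 0 for "doesn't click"
for (day, country), (n_click, n_shown) in counts.items():
    for i in range(n_shown):
        records.append({"day": day, "country": country})
        clicks.append(1 if i < n_click else 0)

vocab = [("day", "Sat"), ("day", "Mon"), ("country", "US"), ("country", "UK")]
w, b = train([one_hot(r, vocab) for r in records], clicks)
for day, country in counts:
    x = one_hot({"day": day, "country": country}, vocab)
    print(day, country, round(predict_ctr(w, b, x), 2))
```

On this toy log the printed estimates settle close to each cell's observed click rate (3/4, 2/4, 2/4 and 1/4), because the day and country effects combine additively on the logit scale. Real CTR systems use far more predictors and usually a library solver rather than hand-written gradient descent.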

Unsupervised Learning

Unsupervised learning deals with finding groups of instances that are similar to one another, without class labels to learn from. There are several approaches to learning a mapping from the predictors to groups, such that instances within the same group are similar and instances in different groups are dissimilar.

An example application of unsupervised learning is customer segmentation. For example, a common task in the telecommunications industry is to segment users according to how they use their phones. This allows the marketing department to target each group with a different product.
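One of the several approaches mentioned above is k-means clustering. The following is a minimal sketch of Lloyd's k-means algorithm applied to customer segmentation, assuming two hypothetical usage features per customer (monthly call minutes and monthly data in GB) and k = 2 segments. Features are standardized first because they are on very different scales. Seeding with the first k points is a simplification for reproducibility; random or k-means++ initialization is more common in practice.

```python
# Minimal k-means sketch for customer segmentation. The usage features
# (monthly call minutes, monthly data in GB) and values are hypothetical.
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each column to zero mean and unit variance so no feature dominates."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm, seeded with the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([mean(col) for col in zip(*members)] if members else centroids[c])
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return labels


# Hypothetical customers: [call minutes, data GB] per month.
usage = [
    [620, 0.5], [40, 12.0], [580, 0.8],
    [60, 15.5], [650, 0.4], [30, 10.2],
]
segments = kmeans(standardize(usage), k=2)
print(segments)
```

Each customer receives a segment index; here the heavy-voice, light-data users fall into one segment and the light-voice, heavy-data users into the other, which marketing could then target with different plans.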