Mahout Concise Tutorial

Mahout - Recommendation

This chapter covers recommendation, a popular machine learning technique: how it works and how to write an application that uses Mahout's recommender.

Recommendation

Have you ever wondered how Amazon comes up with a list of recommended items to draw your attention to a product you might be interested in?

Suppose you want to purchase the book “Mahout in Action” from Amazon:

[Image: “Mahout in Action” on Amazon]

Along with the selected product, Amazon also displays a list of related recommended items, as shown below.

[Image: recommended items]

Such recommendation lists are produced by recommender engines. Mahout provides several types of recommender engines, such as:

  1. user-based recommenders,

  2. item-based recommenders, and

  3. several other algorithms.

Mahout Recommender Engine

Mahout has a non-distributed, non-Hadoop-based recommender engine. You pass it a text file containing users' preferences for items, and it outputs the estimated preferences of a particular user for other items.

Example

Consider a website that sells consumer goods such as mobile phones, gadgets, and their accessories. To use Mahout on such a site, we can build a recommender engine that analyzes users' past purchase data and recommends new products based on it.

Mahout provides the following components for building a recommender engine:

  1. DataModel

  2. UserSimilarity

  3. ItemSimilarity

  4. UserNeighborhood

  5. Recommender

The data model is prepared from the data store and passed as input to the recommender engine, which generates recommendations for a particular user. The architecture of the recommender engine is given below.

Architecture of Recommender Engine

[Image: recommender engine architecture]

Building a Recommender using Mahout

Here are the steps to develop a simple recommender:

Step1: Create DataModel Object

The constructor of the PearsonCorrelationSimilarity class requires a DataModel object. The data model is loaded from a file that lists user, item, and preference details, one record per line in the form userID,itemID,preference. Here is a sample data model file:

1,00,1.0
1,01,2.0
1,02,5.0
1,03,5.0
1,04,5.0

2,00,1.0
2,01,2.0
2,05,5.0
2,06,4.5
2,02,5.0

3,01,2.5
3,02,5.0
3,03,4.0
3,04,3.0

4,00,5.0
4,01,5.0
4,02,5.0
4,03,0.0

FileDataModel is created from a File object that points to the input file. Create the DataModel object as shown below.

DataModel datamodel = new FileDataModel(new File("input file"));
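To make the file format concrete, the following plain-Java sketch parses lines of the form userID,itemID,preference into a nested map. It is only an illustration of the data layout, not Mahout's FileDataModel implementation; the class PreferenceFileSketch and its parse method are hypothetical names introduced here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PreferenceFileSketch {

    // Parses "userID,itemID,preference" lines into userID -> (itemID -> preference).
    // Hypothetical helper for illustration; not part of Mahout.
    static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> prefs = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // this sketch simply skips blank separator lines
            }
            String[] fields = trimmed.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim()); // "03" parses as 3
            double value = Double.parseDouble(fields[2].trim());
            prefs.computeIfAbsent(userId, k -> new LinkedHashMap<>()).put(itemId, value);
        }
        return prefs;
    }

    public static void main(String[] args) {
        // A few lines taken from the sample file above.
        List<String> lines = Arrays.asList("1,00,1.0", "1,01,2.0", "", "2,00,1.0", "2,05,5.0");
        System.out.println(parse(lines));
    }
}
```

Each user ends up with a map from item ID to preference value, which is the information the later steps operate on.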

Step2: Create UserSimilarity Object

Create the UserSimilarity object using the PearsonCorrelationSimilarity class as shown below:

UserSimilarity similarity = new PearsonCorrelationSimilarity(datamodel);
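The idea behind this similarity measure can be sketched without Mahout. The example below computes the Pearson correlation between two users over the items both have rated, using the sample data for users 1 and 2. It is a simplified illustration, not Mahout's PearsonCorrelationSimilarity implementation; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class PearsonSketch {

    // Pearson correlation over the items rated by both users.
    // Returns NaN when there are fewer than two common items, or when one
    // user's common ratings have zero variance (correlation is undefined).
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                n++;
                sumA += e.getValue();
                sumB += other;
            }
        }
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = sumA / n, meanB = sumB / n;
        double cov = 0, varA = 0, varB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double da = e.getValue() - meanA;
                double db = other - meanB;
                cov += da * db;
                varA += da * da;
                varB += db * db;
            }
        }
        if (varA == 0 || varB == 0) {
            return Double.NaN;
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // Ratings of users 1 and 2 from the sample data file.
        Map<Long, Double> user1 = new HashMap<>();
        user1.put(0L, 1.0); user1.put(1L, 2.0); user1.put(2L, 5.0);
        user1.put(3L, 5.0); user1.put(4L, 5.0);
        Map<Long, Double> user2 = new HashMap<>();
        user2.put(0L, 1.0); user2.put(1L, 2.0); user2.put(5L, 5.0);
        user2.put(6L, 4.5); user2.put(2L, 5.0);
        // The common items 0, 1, 2 have identical ratings, so the result is
        // (up to floating-point rounding) the maximum correlation of 1.
        System.out.println(pearson(user1, user2));
    }
}
```

The result always lies between -1 and 1, which matters when choosing a similarity threshold for a neighborhood.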

Step3: Create UserNeighborhood object

This object computes a "neighborhood" of users similar to a given user. There are two types of neighborhoods:

  1. NearestNUserNeighborhood - This class computes a neighborhood consisting of the nearest n users to a given user. "Nearest" is defined by the given UserSimilarity.

  2. ThresholdUserNeighborhood - This class computes a neighborhood consisting of all the users whose similarity to the given user meets or exceeds a certain threshold. Similarity is defined by the given UserSimilarity.

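Here we use ThresholdUserNeighborhood with a similarity threshold of 0.1. The threshold is compared against the similarity value, not against a preference; because Pearson correlation ranges from -1 to 1, a threshold above 1.0 would leave the neighborhood empty.

UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, datamodel);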
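The two neighborhood strategies differ only in how they select users from a set of similarity scores. Here is a minimal plain-Java sketch of both, using made-up similarity values; the class and method names are hypothetical and this is not Mahout's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NeighborhoodSketch {

    // All users whose similarity meets or exceeds the threshold.
    static List<Long> byThreshold(Map<Long, Double> similarities, double threshold) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            if (e.getValue() >= threshold) { // NaN >= x is false, so NaN never qualifies
                result.add(e.getKey());
            }
        }
        return result;
    }

    // The n users with the highest defined (non-NaN) similarity.
    static List<Long> nearestN(Map<Long, Double> similarities, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            if (!Double.isNaN(e.getValue())) {
                entries.add(e);
            }
        }
        // Sort by similarity, highest first.
        entries.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical similarities of other users to one target user.
        Map<Long, Double> sims = new LinkedHashMap<>();
        sims.put(1L, 0.9);
        sims.put(3L, 0.4);
        sims.put(4L, Double.NaN);
        System.out.println(byThreshold(sims, 0.5)); // prints [1]
        System.out.println(nearestN(sims, 2));      // prints [1, 3]
    }
}
```

A threshold larger than any achievable similarity (for Pearson correlation, anything above 1.0) selects no users, and a recommender with an empty neighborhood has nothing to base its recommendations on.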

Step4: Create Recommender Object

Create a UserBasedRecommender object. Pass all the objects created above to the GenericUserBasedRecommender constructor, as shown below.

UserBasedRecommender recommender = new GenericUserBasedRecommender(datamodel, neighborhood, similarity);

Step5: Recommend Items to a User

Recommend products to a user with the recommend() method of the Recommender interface. This method takes two parameters: the ID of the user to generate recommendations for, and the number of recommendations to return. Here is how recommend() is used:

List<RecommendedItem> recommendations = recommender.recommend(2, 3);

for (RecommendedItem recommendation : recommendations) {
   System.out.println(recommendation);
}
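One common way for a user-based recommender to estimate a preference is a similarity-weighted average of the neighbors' ratings for an item. The sketch below illustrates that idea in plain Java; the names are hypothetical and the code is a simplification, not Mahout's implementation.

```java
public class EstimateSketch {

    // Similarity-weighted average of the neighbors' ratings for one item.
    // ratings[i] is neighbor i's rating; similarities[i] is that neighbor's
    // similarity to the target user. Returns NaN if the weights sum to zero.
    static double estimate(double[] ratings, double[] similarities) {
        double weighted = 0, total = 0;
        for (int i = 0; i < ratings.length; i++) {
            weighted += similarities[i] * ratings[i];
            total += similarities[i];
        }
        return total == 0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        // Two equally similar neighbors rated one item 5.0 and 4.0,
        // and another item 5.0 and 3.0.
        System.out.println(estimate(new double[] {5.0, 4.0}, new double[] {1.0, 1.0})); // prints 4.5
        System.out.println(estimate(new double[] {5.0, 3.0}, new double[] {1.0, 1.0})); // prints 4.0
    }
}
```

In the sample data, users 1 and 3 rated item 3 as 5.0 and 4.0, and item 4 as 5.0 and 3.0. If both are neighbors of user 2 with equal weight, the averages are 4.5 and 4.0.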

Example Program


Given below is a complete example program that generates recommendations for the user with ID 2.

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;

import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;

import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;

import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class Recommender {
   public static void main(String[] args) {
      try {
         // Load the data model from the preference file named "data"
         DataModel datamodel = new FileDataModel(new File("data"));

         // Create the UserSimilarity object
         UserSimilarity usersimilarity = new PearsonCorrelationSimilarity(datamodel);

         // Create the UserNeighborhood object. Pearson similarity lies in
         // [-1, 1], so the threshold must lie in that range as well.
         UserNeighborhood userneighborhood = new ThresholdUserNeighborhood(0.1, usersimilarity, datamodel);

         // Create the user-based recommender
         UserBasedRecommender recommender = new GenericUserBasedRecommender(datamodel, userneighborhood, usersimilarity);

         // Request up to 3 recommendations for user 2
         List<RecommendedItem> recommendations = recommender.recommend(2, 3);

         for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
         }
      } catch (Exception e) {
         // Report the failure instead of silently ignoring it
         e.printStackTrace();
      }
   }
}

Compile and run the program using the following commands, with the Mahout libraries and their dependencies on the classpath:

javac Recommender.java
java Recommender

It should produce the following output:

RecommendedItem [item:3, value:4.5]
RecommendedItem [item:4, value:4.0]