MapReduce Tutorial

MapReduce - Hadoop Administration


This chapter explains Hadoop administration which includes both HDFS and MapReduce administration.

  1. HDFS administration includes monitoring the HDFS file structure, file locations, and updated files.

  2. MapReduce administration includes monitoring the list of applications, the configuration of nodes, application status, etc.

HDFS Monitoring


HDFS (Hadoop Distributed File System) contains the user directories, input files, and output files. Use the HDFS shell commands put and get for storing and retrieving files.
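
As a minimal sketch (the file and directory names are illustrative), assuming the Hadoop binaries are on the PATH and HDFS is running, put and get are issued through the hdfs dfs shell:

```shell
# Create a user directory in HDFS (illustrative path)
hdfs dfs -mkdir -p /user/hadoop/input

# put − copy a local file into HDFS
hdfs dfs -put ~/sample.txt /user/hadoop/input/

# get − copy a file from HDFS back to the local filesystem
hdfs dfs -get /user/hadoop/input/sample.txt ~/sample_copy.txt
```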


After starting the Hadoop framework (daemons) by running the command “start-all.sh” from “$HADOOP_HOME/sbin”, open the URL “http://localhost:50070” in your browser. You should see the following screen.


The following screenshot shows how to browse HDFS.

[Screenshot: HDFS monitoring]


The following screenshot shows the file structure of HDFS. It shows the files in the “/user/hadoop” directory.
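
The same listing can also be obtained from the command line; a quick sketch, assuming HDFS is up:

```shell
# List the files in the /user/hadoop directory
hdfs dfs -ls /user/hadoop

# Show the space used by each entry, in human-readable units
hdfs dfs -du -h /user/hadoop
```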

[Screenshot: files in HDFS]


The following screenshot shows the datanode information in a cluster. Here you can find one node with its configuration and capacity.
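
The datanode details shown on this screen can also be pulled with dfsadmin; a sketch, assuming you run it as the HDFS superuser:

```shell
# Print a cluster summary followed by per-datanode details
# (configured capacity, DFS used, DFS remaining, last contact)
hdfs dfsadmin -report
```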

[Screenshot: datanode info]

MapReduce Job Monitoring


A MapReduce application is a collection of jobs (Map job, Combiner, Partitioner, and Reduce job). It is mandatory to monitor and maintain the following −

  1. The configuration of the datanodes on which the application runs.

  2. The number of datanodes and resources used per application.
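
Both items can also be checked from the command line; a sketch using the standard yarn CLI:

```shell
# List applications known to the ResourceManager, with their states
yarn application -list -appStates ALL

# List all nodes in the cluster together with their resource usage
yarn node -list -all
```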


To monitor all these things, it is imperative that we have a user interface. After starting the Hadoop framework by running the command “start-all.sh” from “$HADOOP_HOME/sbin”, open the URL “http://localhost:8088” (the default ResourceManager web UI port) in your browser. You should see the following screen.
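
Besides the browser UI, the ResourceManager exposes the same information over a REST API on its web port (8088 by default); a sketch using curl:

```shell
# Cluster-wide metrics (apps submitted, active nodes, memory, etc.)
curl http://localhost:8088/ws/v1/cluster/metrics

# The list of applications, as JSON
curl http://localhost:8088/ws/v1/cluster/apps
```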

[Screenshot: job monitoring]


In the above screenshot, the hand pointer is on the application ID. Just click it to open the following screen in your browser. It describes the following −

  1. The user under which the current application is running

  2. The application name

  3. The type of that application

  4. The current status and final status

  5. The application start time and elapsed time (completion time), if it was complete at the time of monitoring

  6. The history of this application, i.e., log information

  7. And finally, the node information, i.e., the nodes that participated in running the application
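
The same per-application details can be printed with the yarn CLI; the application ID below is illustrative:

```shell
# Show user, name, type, state, final state, start time, and tracking URL
yarn application -status application_1234567890123_0001
```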


The following screenshot shows the details of a particular application −

[Screenshot: application ID]


The following screenshot describes the information of the currently running nodes. Here, the screenshot contains only one node. A hand pointer shows the localhost address of the running node.

[Screenshot: all nodes]