MapReduce Concise Tutorial

MapReduce - Hadoop Administration

This chapter explains Hadoop administration, which covers both HDFS administration and MapReduce administration.

  1. HDFS administration includes monitoring the HDFS file structure, locations, and the updated files.

  2. MapReduce administration includes monitoring the list of applications, configuration of nodes, application status, etc.

HDFS Monitoring

HDFS (Hadoop Distributed File System) holds the user directories, input files, and output files. Use the HDFS shell commands put and get (for example, hdfs dfs -put and hdfs dfs -get) to store files in HDFS and retrieve them.
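As a minimal sketch, assuming the hdfs client from $HADOOP_HOME/bin is on PATH, the put/get round trip can be scripted from Python. The helper names and example paths below are hypothetical; only hdfs dfs -put and hdfs dfs -get are real Hadoop commands.

```python
import shutil
import subprocess
from typing import List


def hdfs_put_cmd(local_path: str, hdfs_path: str) -> List[str]:
    """Build the argument list that copies a local file into HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]


def hdfs_get_cmd(hdfs_path: str, local_path: str) -> List[str]:
    """Build the argument list that copies an HDFS file to the local disk."""
    return ["hdfs", "dfs", "-get", hdfs_path, local_path]


def run_hdfs(cmd: List[str]) -> None:
    """Run an hdfs command, raising if the client is missing or the command fails."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(cmd, check=True)
```

For example, run_hdfs(hdfs_put_cmd("input.txt", "/user/hadoop/input.txt")) uploads a file, and run_hdfs(hdfs_get_cmd("/user/hadoop/output", "output")) copies a result directory back (both paths are illustrative). Building the argument list separately keeps the command easy to inspect before it runs.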

After starting the Hadoop framework (daemons) by running the command “start-all.sh” from “$HADOOP_HOME/sbin”, open the URL “http://localhost:50070” in a browser to reach the NameNode web UI (50070 is the Hadoop 2.x default; Hadoop 3.x uses 9870). You should see the following screen in your browser.

The following screenshot shows how to browse HDFS.

[Screenshot: HDFS monitoring]
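A quick way to confirm the daemons came up is to probe the web UI before opening it in a browser. The sketch below uses only the Python standard library; the helper names are hypothetical, and it assumes the NameNode UI runs on localhost at the port named in the text.

```python
import http.client
import urllib.request


def namenode_ui_url(host: str = "localhost", port: int = 50070) -> str:
    """NameNode web UI address; 50070 is the Hadoop 2.x default (Hadoop 3.x uses 9870)."""
    return f"http://{host}:{port}/"


def is_web_ui_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on url succeeds with a 2xx status, else False."""
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses.
        return False
```

For example, is_web_ui_up(namenode_ui_url()) should return True once the NameNode is serving its UI, and False while the daemons are still starting.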

The following screenshot shows the file structure of HDFS, listing the files in the “/user/hadoop” directory.

[Screenshot: files in HDFS]
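The same directory listing can be fetched without a browser through WebHDFS, the NameNode's REST interface, using op=LISTSTATUS. This is a sketch assuming WebHDFS is enabled (dfs.webhdfs.enabled) and served on the NameNode HTTP port used above; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Tuple


def liststatus_url(path: str, host: str = "localhost", port: int = 50070) -> str:
    """Build a WebHDFS LISTSTATUS URL for an absolute HDFS path."""
    # quote() leaves '/' unescaped by default, so the path structure is kept.
    return f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(path)}?op=LISTSTATUS"


def parse_liststatus(payload: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """Turn a LISTSTATUS response into (name, type, length-in-bytes) tuples."""
    statuses = payload["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]


def list_hdfs_dir(path: str, host: str = "localhost",
                  port: int = 50070) -> List[Tuple[str, str, int]]:
    """Fetch and parse a directory listing from a running NameNode."""
    with urllib.request.urlopen(liststatus_url(path, host, port), timeout=10) as resp:
        return parse_liststatus(json.load(resp))
```

Calling list_hdfs_dir("/user/hadoop") would return one tuple per entry, with type set to FILE or DIRECTORY.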

The following screenshot shows the DataNode information for a cluster. Here you can see a single node along with its configuration and capacity.

[Screenshot: DataNode information]
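The DataNode figures on this page are also exposed through the NameNode's JMX servlet (/jmx), in the NameNodeInfo bean's LiveNodes attribute, which is itself a JSON-encoded string keyed by DataNode. The sketch below assumes that layout; the per-node key names it reads (capacity, usedSpace, remaining) are assumptions to check against your Hadoop version, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

# JMX bean on the NameNode that carries the LiveNodes attribute.
NAMENODE_INFO_BEAN = "Hadoop:service=NameNode,name=NameNodeInfo"


def jmx_url(host: str = "localhost", port: int = 50070) -> str:
    """URL of the NameNode JMX servlet, filtered to the NameNodeInfo bean."""
    return f"http://{host}:{port}/jmx?qry={NAMENODE_INFO_BEAN}"


def parse_live_nodes(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one dict per live DataNode with its capacity figures (bytes)."""
    bean = payload["beans"][0]
    # LiveNodes is a JSON string inside the JSON response, so decode it again.
    live = json.loads(bean["LiveNodes"])
    return [
        {
            "node": address,
            "capacity": info.get("capacity"),
            "used": info.get("usedSpace"),
            "remaining": info.get("remaining"),
        }
        for address, info in sorted(live.items())
    ]


def fetch_live_nodes(host: str = "localhost", port: int = 50070) -> List[Dict[str, Any]]:
    """Query a running NameNode and parse its live DataNode list."""
    with urllib.request.urlopen(jmx_url(host, port), timeout=10) as resp:
        return parse_live_nodes(json.load(resp))
```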

MapReduce Job Monitoring

A MapReduce application consists of one or more jobs, and each job runs through phases such as Map, Combiner, Partitioner, and Reduce. The following must be monitored and maintained −

  1. The configuration of the DataNodes on which the application runs.

  2. The number of DataNodes and the resources used by each application.
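Per-application resource usage can be read from the YARN ResourceManager REST API (/ws/v1/cluster/apps), whose application records include runningContainers, allocatedMB and allocatedVCores. Note that YARN reports containers and resources here, not DataNodes. A minimal sketch, assuming a response already fetched and parsed as JSON; the function names are hypothetical.

```python
from typing import Any, Dict, List


def resources_per_app(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Summarise containers, memory and vcores for each app in a /ws/v1/cluster/apps response."""
    # An empty cluster may answer {"apps": null}, so treat a missing or null value as no apps.
    apps = (payload.get("apps") or {}).get("app", [])
    return [
        {
            "id": app["id"],
            "containers": app.get("runningContainers", 0),
            "memory_mb": app.get("allocatedMB", 0),
            "vcores": app.get("allocatedVCores", 0),
        }
        for app in apps
    ]


def total_memory_mb(summary: List[Dict[str, Any]]) -> int:
    """Sum the memory allocated across all summarised applications."""
    return sum(row["memory_mb"] for row in summary)
```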

To monitor all of this, a user interface is needed. After starting the Hadoop framework by running the command “start-all.sh” from “$HADOOP_HOME/sbin”, open the YARN ResourceManager web UI at “http://localhost:8088” (its default port) in a browser. You should see the following screen in your browser.

[Screenshot: job monitoring]
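The application list on this page can also be pulled from the ResourceManager REST API and filtered with its states query parameter. A sketch assuming the ResourceManager's default web port 8088 on localhost; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Optional, Tuple


def apps_url(host: str = "localhost", port: int = 8088,
             states: Optional[List[str]] = None) -> str:
    """Build the /ws/v1/cluster/apps URL, optionally filtered by application state."""
    url = f"http://{host}:{port}/ws/v1/cluster/apps"
    if states:
        url += "?" + urllib.parse.urlencode({"states": ",".join(states)})
    return url


def app_rows(payload: Dict[str, Any]) -> List[Tuple[str, str, str, str]]:
    """Extract (id, user, name, state) for each application in the response."""
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a["id"], a["user"], a["name"], a["state"]) for a in apps]


def list_apps(states: Optional[List[str]] = None, host: str = "localhost",
              port: int = 8088) -> List[Tuple[str, str, str, str]]:
    """Fetch the application list from a running ResourceManager as JSON."""
    req = urllib.request.Request(apps_url(host, port, states),
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return app_rows(json.load(resp))
```

For example, list_apps(["RUNNING"]) would return only the applications that are currently running.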

In the screenshot above, the hand pointer is on the application ID. Click it to open the application's details page in your browser. That page shows the following −

  1. The user under which the application is running

  2. The application name

  3. The application type

  4. The current status and the final status

  5. The application's start time and elapsed time, plus its completion time if it has finished by the time you check

  6. The application's history, i.e., its log information

  7. Finally, the node information, i.e., the nodes that took part in running the application

The following screenshot shows the details of a particular application −

[Screenshot: application ID details]
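The details on this page correspond to fields of the ResourceManager's single-application endpoint, /ws/v1/cluster/apps/{appid}: user, name, applicationType, state, finalStatus, startedTime, finishedTime and elapsedTime (all times in milliseconds), amContainerLogs, and amHostHttpAddress. A sketch under that assumption; the function names are hypothetical, and a finishedTime of 0 is assumed to mean the application has not finished yet.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def app_url(app_id: str, host: str = "localhost", port: int = 8088) -> str:
    """URL of one application's record in the ResourceManager REST API."""
    return f"http://{host}:{port}/ws/v1/cluster/apps/{app_id}"


def ms_to_iso(ms: int) -> Optional[str]:
    """Convert epoch milliseconds to an ISO-8601 UTC string; 0 means 'not set'."""
    if not ms:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


def summarize_app(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the details listed for the application page."""
    app = payload["app"]
    return {
        "user": app["user"],
        "name": app["name"],
        "type": app.get("applicationType"),
        "state": app["state"],
        "final_status": app["finalStatus"],
        "started": ms_to_iso(app["startedTime"]),
        "finished": ms_to_iso(app.get("finishedTime", 0)),
        "elapsed_s": app["elapsedTime"] / 1000,
        "logs": app.get("amContainerLogs"),
        "am_node": app.get("amHostHttpAddress"),
    }


def fetch_app(app_id: str, host: str = "localhost", port: int = 8088) -> Dict[str, Any]:
    """Fetch and summarise one application from a running ResourceManager."""
    req = urllib.request.Request(app_url(app_id, host, port),
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return summarize_app(json.load(resp))
```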

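The following screenshot shows information about the nodes that are currently running. Here the screenshot contains only one node, and the hand pointer indicates the localhost address of the running node.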

[Screenshot: all nodes]
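The node list is likewise available from the ResourceManager's /ws/v1/cluster/nodes endpoint, where each entry (one per NodeManager) carries nodeHostName, nodeHTTPAddress, state, and numContainers. A sketch assuming that response shape and the default port 8088; the function names are hypothetical, and filtering for running nodes is done on the client side.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

NodeRow = Tuple[str, str, str, int]


def nodes_url(host: str = "localhost", port: int = 8088) -> str:
    """URL of the ResourceManager's cluster nodes endpoint."""
    return f"http://{host}:{port}/ws/v1/cluster/nodes"


def node_rows(payload: Dict[str, Any]) -> List[NodeRow]:
    """Extract (host name, HTTP address, state, containers) for each node."""
    nodes = (payload.get("nodes") or {}).get("node", [])
    return [(n["nodeHostName"], n["nodeHTTPAddress"], n["state"],
             n.get("numContainers", 0)) for n in nodes]


def running_nodes(rows: List[NodeRow]) -> List[NodeRow]:
    """Keep only the nodes whose state is RUNNING."""
    return [row for row in rows if row[2] == "RUNNING"]


def fetch_nodes(host: str = "localhost", port: int = 8088) -> List[NodeRow]:
    """Fetch the node list from a running ResourceManager as JSON."""
    req = urllib.request.Request(nodes_url(host, port),
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return node_rows(json.load(resp))
```

On a single-node setup like the one in the screenshot, running_nodes(fetch_nodes()) would be expected to return one row whose HTTP address is on localhost.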