Hadoop - Multi-Node Cluster
This chapter explains the setup of a Hadoop Multi-Node cluster in a distributed environment.
As the whole cluster cannot be demonstrated here, we explain the Hadoop cluster environment using three systems (one master and two slaves); their IP addresses are given below.
- Hadoop Master: 192.168.1.15 (hadoop-master)
- Hadoop Slave: 192.168.1.16 (hadoop-slave-1)
- Hadoop Slave: 192.168.1.17 (hadoop-slave-2)
Follow the steps given below to set up the Hadoop Multi-Node cluster.
Installing Java
Java is the main prerequisite for Hadoop. First of all, you should verify that Java exists on your system using "java -version". The syntax of the java version command is given below.
$ java -version
If everything works fine, it will give you the following output.
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
If Java is not installed on your system, then follow the steps given below to install it.
Step 1
Download Java (JDK <latest version> - X64.tar.gz) by visiting the following link: www.oracle.com
Then jdk-7u71-linux-x64.tar.gz will be downloaded to your system.
Step 2
Generally, you will find the downloaded Java file in the Downloads folder. Verify it and extract the jdk-7u71-linux-x64.tar.gz file using the following commands.
$ cd Downloads/
$ ls
jdk-7u71-linux-x64.tar.gz
$ tar zxf jdk-7u71-linux-x64.tar.gz
$ ls
jdk1.7.0_71 jdk-7u71-linux-x64.tar.gz
Step 3
To make Java available to all users, you have to move it to the location "/usr/local/". Switch to the root user and type the following commands.
$ su
password:
# mv jdk1.7.0_71 /usr/local/
# exit
Step 4
For setting up the PATH and JAVA_HOME variables, add the following commands to the ~/.bashrc file.
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
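To apply these changes in the current shell without logging out and back in, source the file.
$ source ~/.bashrc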
Now verify the java -version command from the terminal as explained above. Follow the above process and install Java on all your cluster nodes.
Creating User Account
Create a system user account on both the master and slave systems to use for the Hadoop installation.
# useradd hadoop
# passwd hadoop
Mapping the Nodes
You have to edit the hosts file in the /etc/ folder on all nodes and specify the IP address of each system followed by its host name.
# vi /etc/hosts
Enter the following lines in the /etc/hosts file.
192.168.1.15 hadoop-master
192.168.1.16 hadoop-slave-1
192.168.1.17 hadoop-slave-2
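To confirm that the mapping works, you can ping each host by name from any node; the names should resolve to the addresses listed above.
$ ping -c 1 hadoop-master
$ ping -c 1 hadoop-slave-1
$ ping -c 1 hadoop-slave-2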
Configuring Key Based Login
Set up ssh on every node so that the nodes can communicate with one another without any prompt for a password.
# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
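You can verify the key-based login by running a command on each slave from the master as the hadoop user; no password prompt should appear.
$ ssh hadoop@hadoop-slave-1 hostname
$ ssh hadoop@hadoop-slave-2 hostname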
Installing Hadoop
On the master server, download and install Hadoop using the following commands.
# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
# tar -xzf hadoop-1.2.1.tar.gz
# mv hadoop-1.2.1 hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/hadoop/
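At this point you can optionally confirm that the archive was unpacked correctly by printing the Hadoop version:
# bin/hadoop version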
Configuring Hadoop
You have to configure the Hadoop server by making the changes given below.
core-site.xml
Open the core-site.xml file and edit it as shown below.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:9000/</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
hdfs-site.xml
Open the hdfs-site.xml file and edit it as shown below.
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop/hadoop/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop/hadoop/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
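The paths used for dfs.name.dir and dfs.data.dir live under the installation directory that was chowned to the hadoop user earlier. Hadoop will normally create them on first use, but if you prefer you can create them up front as the hadoop user (a purely precautionary step):
$ mkdir -p /opt/hadoop/hadoop/dfs/name/data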
mapred-site.xml
Open the mapred-site.xml file and edit it as shown below.
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop-master:9001</value>
</property>
</configuration>
hadoop-env.sh
Open the hadoop-env.sh file and edit JAVA_HOME, HADOOP_CONF_DIR, and HADOOP_OPTS as shown below.
Note − Set JAVA_HOME as per your system configuration.
export JAVA_HOME=/usr/local/jdk1.7.0_71
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/conf
Installing Hadoop on Slave Servers
Install Hadoop on all the slave servers by following the given commands.
# su hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave-1:/opt/hadoop
$ scp -r hadoop hadoop-slave-2:/opt/hadoop
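Note that scp can only copy into /opt/hadoop if that directory already exists on each slave and is writable by the hadoop user. If it does not, a preparatory step (not shown in the original commands) is to create it as root on every slave first:
# mkdir -p /opt/hadoop
# chown -R hadoop /opt/hadoop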
Configuring Hadoop on Master Server
Open the master server and configure it by following the given commands.
# su hadoop
$ cd /opt/hadoop/hadoop
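The start/stop scripts on the master read the conf/masters and conf/slaves files to decide where to launch the daemons, and the conf/slaves file is also referred to later when adding a new DataNode. A minimal sketch of these two files, assuming the host names used in this tutorial:
$ vi conf/masters
hadoop-master
$ vi conf/slaves
hadoop-slave-1
hadoop-slave-2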
Format Name Node on Hadoop Master
# su hadoop
$ cd /opt/hadoop/hadoop
$ bin/hadoop namenode -format
11/10/14 10:58:07 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop-master/192.168.1.109
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473;
compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG: java = 1.7.0_71
************************************************************/
11/10/14 10:58:08 INFO util.GSet: Computing capacity for map BlocksMap
editlog=/opt/hadoop/hadoop/dfs/name/current/edits
………………………………………………….
………………………………………………….
………………………………………………….
11/10/14 10:58:08 INFO common.Storage: Storage directory
/opt/hadoop/hadoop/dfs/name has been successfully formatted.
11/10/14 10:58:08 INFO namenode.NameNode:
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/192.168.1.15
************************************************************/
Starting Hadoop Services
The following command is used to start all the Hadoop services on the Hadoop-Master.
$ cd $HADOOP_HOME/bin
$ ./start-all.sh
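Once the services are up, you can check which daemons are running on each machine with the JDK's jps command; on the master you would typically expect NameNode, SecondaryNameNode, and JobTracker, while each slave should list DataNode and TaskTracker.
$ jps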
Adding a New DataNode in the Hadoop Cluster
Given below are the steps to be followed for adding new nodes to a Hadoop cluster.
Adding User and SSH Access
Add a User
On a new node, add the "hadoop" user and set the Hadoop user's password to "hadoop123" or anything you want by using the following commands.
useradd hadoop
passwd hadoop
Set up passwordless connectivity from the master to the new slave.
Execute the following on the master
mkdir -p $HOME/.ssh
chmod 700 $HOME/.ssh
ssh-keygen -t rsa -P '' -f $HOME/.ssh/id_rsa
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 644 $HOME/.ssh/authorized_keys
Copy the public key to the new slave node in the hadoop user's $HOME directory
scp $HOME/.ssh/id_rsa.pub hadoop@192.168.1.103:/home/hadoop/
Execute the following on the slaves
Log in to the hadoop user. If you are not already that user, switch to it and then connect to the new node.
su hadoop
ssh -X hadoop@192.168.1.103
Copy the content of the public key into the file "$HOME/.ssh/authorized_keys" and then change its permissions by executing the following commands.
cd $HOME
mkdir -p $HOME/.ssh
chmod 700 $HOME/.ssh
cat id_rsa.pub >>$HOME/.ssh/authorized_keys
chmod 644 $HOME/.ssh/authorized_keys
Now check the ssh login from the master machine: you should be able to ssh to the new node from the master without a password.
ssh hadoop@192.168.1.103
or
ssh hadoop@slave3
Set Hostname of New Node
You can set the hostname in the file /etc/sysconfig/network.
On the new slave3 machine
NETWORKING=yes
HOSTNAME=slave3.in
To make the changes effective, either restart the machine or run the hostname command on the new machine with the respective hostname (a restart is a good option).
On slave3 node machine −
hostname slave3.in
Update /etc/hosts on all machines of the cluster with the following lines −
192.168.1.103 slave3.in slave3
Now try to ping the machines by hostname to check whether the names resolve to IP addresses.
On the new node machine −
ping master.in
Start the DataNode on New Node
Start the DataNode daemon manually using the $HADOOP_HOME/bin/hadoop-daemon.sh script. It will automatically contact the master (NameNode) and join the cluster. We should also add the new node to the conf/slaves file on the master server, so that the script-based commands will recognize the new node.
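For example, assuming the new node's host name is slave3.in as set above, the two actions could look like this (the echo line is just one way to append the entry; you can also edit conf/slaves in an editor):
On the new node (as the hadoop user):
$ $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
On the master:
$ echo "slave3.in" >> $HADOOP_HOME/conf/slaves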
Removing a DataNode from the Hadoop Cluster
We can remove a node from a cluster on the fly, while it is running, without any data loss. HDFS provides a decommissioning feature, which ensures that removing a node is performed safely. To use it, follow the steps given below −
Step 1 − Login to master
Log in to the master machine user where Hadoop is installed.
$ su hadoop
Step 2 − Change cluster configuration
An exclude file must be configured before starting the cluster. Add a key named dfs.hosts.exclude to the $HADOOP_HOME/conf/hdfs-site.xml file. The value associated with this key is the full path to a file on the NameNode's local file system which contains a list of machines that are not permitted to connect to HDFS.
For example, add these lines to the conf/hdfs-site.xml file.
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/hadoop-1.2.1/hdfs_exclude.txt</value>
<description>DFS exclude</description>
</property>
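The NameNode expects to be able to read this exclude file, so make sure it exists at the configured path (an empty file is fine to begin with); this is a precautionary step, assuming the path used above:
$ touch /home/hadoop/hadoop-1.2.1/hdfs_exclude.txt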
Step 3 − Determine hosts to decommission
Each machine to be decommissioned should be added to the file identified by hdfs_exclude.txt, one domain name per line. This will prevent them from connecting to the NameNode. The content of the "/home/hadoop/hadoop-1.2.1/hdfs_exclude.txt" file, if you want to remove DataNode2, is shown below.
slave2.in
Step 4 − Force configuration reload
Run the command "$HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes" without the quotes.
$ $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes
This will force the NameNode to re-read its configuration, including the newly updated 'excludes' file. It will decommission the nodes over a period of time, allowing time for each node's blocks to be replicated onto machines which are scheduled to remain active.
On slave2.in, check the jps command output. After some time, you will see that the DataNode process has shut down automatically.
Step 5 − Shutdown nodes
After the decommission process has been completed, the decommissioned hardware can be safely shut down for maintenance. Run the dfsadmin -report command to check the status of the decommission. The following command will describe the status of the decommissioned node and the nodes connected to the cluster.
$ $HADOOP_HOME/bin/hadoop dfsadmin -report
Step 6 − Edit excludes file again
Once the machines have been decommissioned, they can be removed from the 'excludes' file. Running "$HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes" again will read the excludes file back into the NameNode, allowing the DataNodes to rejoin the cluster after the maintenance has been completed, or when additional capacity is needed in the cluster again, and so on.
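For example, assuming the exclude file configured earlier, you could empty it (or delete the slave2.in line from it) and then refresh the node list again:
$ > /home/hadoop/hadoop-1.2.1/hdfs_exclude.txt
$ $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes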
Special Note − If the above process is followed and the TaskTracker process is still running on the node, it needs to be shut down. One way is to disconnect the machine, as we did in the above steps. The master will recognize the process automatically and declare it dead. There is no need to follow the same process for removing the TaskTracker, because it is not as crucial as the DataNode. The DataNode contains the data that you want to remove safely without any loss of data.
The TaskTracker can be started/stopped on the fly using the following commands at any point of time.
$ $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
$ $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker