Apache Spark: A Concise Tutorial

Apache Spark - Installation

Spark is commonly deployed alongside Hadoop, so it is best installed on a Linux-based system. The following steps show how to install Apache Spark.

Step 1: Verifying Java Installation

Java is a prerequisite for installing Spark. Run the following command to check the installed Java version.

$ java -version

If Java is already installed on your system, you will see a response similar to the following:

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

If Java is not installed on your system, install it before proceeding to the next step.

Step 2: Verifying Scala Installation

This tutorial uses the Scala language to work with Spark, so verify the Scala installation with the following command.

$ scala -version

If Scala is already installed on your system, you will see the following response:

Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL

If Scala is not installed on your system, proceed to the next step to install it.
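The two checks above can be combined into one pass. The following is a minimal sketch, assuming a POSIX shell; check_tools is a hypothetical helper name, not part of Java, Scala, or Spark.

```shell
# Hypothetical helper: report which of the named commands are on PATH.
# Returns the number of commands that were not found (0 means all present).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found at $(command -v "$tool")"
        else
            echo "$tool: not found"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Check the two prerequisites named in Steps 1 and 2.
check_tools java scala || echo "Install the missing tools before continuing."
```

A non-zero return value gives the number of missing commands, so the helper can also gate later steps in a script.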

Step 3: Downloading Scala

Download the latest version of Scala from the Download Scala page. This tutorial uses scala-2.11.6. After the download completes, the Scala tar file will be in your download folder.

Step 4: Installing Scala

Follow the steps below to install Scala.

Extract the Scala tar file

Run the following command to extract the Scala tar file.

$ tar xvf scala-2.11.6.tgz

Move Scala software files

Use the following commands to move the Scala software files to their directory (/usr/local/scala).

$ su -
Password:
# cd /home/Hadoop/Downloads/
# mv scala-2.11.6 /usr/local/scala
# exit
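The extract and move steps can also be done in a single command. The sketch below assumes a gzip-compressed archive and a tar that supports --strip-components (GNU tar and bsdtar do); install_tarball is a hypothetical helper name.

```shell
# Hypothetical helper: unpack ARCHIVE so that the contents of its
# top-level directory land directly in DEST.
install_tarball() {
    archive=$1
    dest=$2
    mkdir -p "$dest"
    # --strip-components=1 drops the leading directory (for example
    # "scala-2.11.6/") from every path inside the archive.
    tar -xzf "$archive" -C "$dest" --strip-components=1
}

# Example (needs write access to /usr/local, for instance as root):
# install_tarball scala-2.11.6.tgz /usr/local/scala
```

Extracting straight into the target directory avoids leaving an unpacked copy in the download folder.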

Set PATH for Scala

Use the following command to add Scala to PATH for the current shell session.

$ export PATH=$PATH:/usr/local/scala/bin
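Running the export more than once adds the same directory to PATH repeatedly. The following sketch, assuming a POSIX shell, appends a directory only when it is not already listed; path_append is a hypothetical helper name.

```shell
# Hypothetical helper: append DIR to PATH only if it is not already there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$PATH:$1" ;;      # not present, append it
    esac
    export PATH
}

path_append /usr/local/scala/bin
echo "$PATH"
```

Wrapping PATH in colons before matching keeps a directory such as /usr/local/scala/bin2 from being mistaken for /usr/local/scala/bin.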

Verifying Scala Installation

After installation, verify that it works. Use the following command to verify the Scala installation.

$ scala -version

If Scala was installed correctly, you will see the following response:

Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
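To compare the installed version against the one used in this tutorial from a script, the version number can be extracted from the banner line. This is a sketch that assumes the banner has the format shown above; scala_version_from_banner is a hypothetical helper name.

```shell
# Hypothetical helper: pull the version number out of a
# "Scala code runner version X.Y.Z ..." banner line.
scala_version_from_banner() {
    printf '%s\n' "$1" |
        sed -n 's/^Scala code runner version \([0-9][0-9.]*\).*/\1/p'
}

# Banner copied from the response shown above.
banner='Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL'
scala_version_from_banner "$banner"   # prints 2.11.6
```

On a live system the banner could be captured with "$(scala -version 2>&1)"; the 2>&1 redirection collects it whichever stream it is written to.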

Step 5: Downloading Apache Spark

Download the latest version of Spark from the Download Spark page. This tutorial uses spark-1.3.1-bin-hadoop2.6. After the download completes, the Spark tar file will be in your download folder.

Step 6: Installing Spark

Follow the steps below to install Spark.

Extracting Spark tar

The following command extracts the Spark tar file.

$ tar xvf spark-1.3.1-bin-hadoop2.6.tgz

Moving Spark software files

The following commands move the Spark software files to their directory (/usr/local/spark).

$ su -
Password:
# cd /home/Hadoop/Downloads/
# mv spark-1.3.1-bin-hadoop2.6 /usr/local/spark
# exit

Setting up the environment for Spark

Add the following line to the ~/.bashrc file. It adds the directory containing the Spark executables to the PATH variable.

export PATH=$PATH:/usr/local/spark/bin

Use the following command to source the ~/.bashrc file.

$ source ~/.bashrc
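The ~/.bashrc edit can also be scripted so that repeating it does not duplicate the line. The sketch below assumes a POSIX shell with grep; add_line_once is a hypothetical helper name.

```shell
# Hypothetical helper: append LINE to FILE unless an identical line
# is already present.
add_line_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches whole lines only, -F treats LINE as a literal string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH unexpanded in the file):
# add_line_once 'export PATH=$PATH:/usr/local/spark/bin' "$HOME/.bashrc"
```

Taking the file as a parameter makes it possible to try the helper on a scratch file before touching the real ~/.bashrc.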

Step 7: Verifying the Spark Installation

Run the following command to open the Spark shell.

$ spark-shell

If Spark is installed successfully, you will see output similar to the following.

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/04 15:25:22 INFO SecurityManager: Changing view acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: Changing modify acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: SecurityManager: authentication disabled;
   ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/06/04 15:25:22 INFO HttpServer: Starting HTTP Server
15/06/04 15:25:23 INFO Utils: Successfully started service 'HTTP class server' on port 43292.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Spark context available as sc
scala>