Apache Pig: A Concise Tutorial
Apache Pig - Grunt Shell
After invoking the Grunt shell, you can run your Pig scripts in it. The Grunt shell also provides a number of useful shell and utility commands, which this chapter explains.
Note − Some parts of this chapter use commands such as LOAD and STORE. Refer to the respective chapters for detailed information on them.
Shell Commands
The Grunt shell of Apache Pig is mainly used to write Pig Latin scripts. Besides writing scripts, you can also invoke shell commands from it using sh and fs.
sh Command
Using the sh command, we can invoke any shell command from the Grunt shell. However, commands that are part of the shell environment itself (for example, cd) cannot be executed with sh.
Syntax
Given below is the syntax of the sh command.
grunt> sh shell command parameters
Example
We can invoke the ls command of the Linux shell from the Grunt shell using the sh command, as shown below. In this example, it lists the files in the /pig/bin/ directory.
grunt> sh ls
pig
pig_1444799121955.log
pig.cmd
pig.py
fs Command
Using the fs command, we can invoke any FsShell command from the Grunt shell.
Syntax
Given below is the syntax of the fs command.
grunt> fs File System command parameters
Example
We can invoke the ls command of HDFS from the Grunt shell using the fs command. In the following example, it lists the files in the HDFS root directory.
grunt> fs -ls /
Found 3 items
drwxrwxrwx - Hadoop supergroup 0 2015-09-08 14:13 Hbase
drwxr-xr-x - Hadoop supergroup 0 2015-09-09 14:52 seqgen_data
drwxr-xr-x - Hadoop supergroup 0 2015-09-08 11:30 twitter_data
In the same way, we can invoke all the other file system shell commands from the Grunt shell using the fs command.
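For example, the student.txt file used in the exec and run examples later in this chapter could be copied into HDFS from the Grunt shell with standard FsShell subcommands. This is a minimal sketch. It assumes a local copy of student.txt in the directory Pig was started from, that the local file has the contents shown later in this chapter, and that /pig_data does not exist in HDFS yet.

```pig
grunt> fs -mkdir /pig_data
grunt> fs -put student.txt /pig_data/
grunt> fs -cat /pig_data/student.txt
001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi
```

Each fs line maps one-to-one onto the corresponding hadoop fs subcommand, so any other FsShell option can be used in the same way.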
Utility Commands
The Grunt shell provides a set of utility commands. These include utility commands such as clear, help, history, quit, and set, as well as commands such as exec, kill, and run that control Pig from the Grunt shell. The utility commands provided by the Grunt shell are described below.
clear Command
The clear command is used to clear the screen of the Grunt shell.
Syntax
You can clear the screen of the Grunt shell using the clear command, as shown below.
grunt> clear
help Command
The help command gives you a list of Pig commands or Pig properties.
Usage
You can get a list of Pig commands using the help command, as shown below.
grunt> help
Commands:
<pig latin statement>; - See the PigLatin manual for details: http://hadoop.apache.org/pig
File system commands:
    fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html
Diagnostic Commands:
    describe <alias>[::<alias] - Show the schema for the alias. Inner aliases can be described as A::B.
    explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml] [-param <param_name>=<param_value>]
        [-param_file <file_name>] [<alias>] - Show the execution plan to compute the alias or for entire script.
        -script - Explain the entire script.
        -out - Store the output into directory rather than print to stdout.
        -brief - Don't expand nested plans (presenting a smaller graph for overview).
        -dot - Generate the output in .dot format. Default is text format.
        -xml - Generate the output in .xml format. Default is text format.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        alias - Alias to explain.
    dump <alias> - Compute the alias and writes the results to stdout.
Utility Commands:
    exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
        Execute the script with access to grunt environment including aliases.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        script - Script to be executed.
    run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
        Execute the script with access to grunt environment.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        script - Script to be executed.
    sh <shell command> - Invoke a shell command.
    kill <job_id> - Kill the hadoop job specified by the hadoop job id.
    set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
        The following keys are supported:
        default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.
        debug - Set debug on or off. Default is off.
        job.name - Single-quoted name for jobs. Default is PigLatin:<script name>
        job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal
        stream.skippath - String that contains the path. This is used by streaming.
        any hadoop property.
    help - Display this message.
    history [-n] - Display the list statements in cache.
        -n Hide line numbers.
    quit - Quit the grunt shell.
history Command
This command displays the list of statements executed or used since the Grunt shell was invoked.
Usage
Assume we have executed three statements since opening the Grunt shell.
grunt> customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');
grunt> orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
Then, the history command produces the following output.
grunt> history
1   customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');
2   orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
3   student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
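The help output lists a -n switch for history that hides the line numbers. The following is a sketch that assumes the same three statements as above. With -n, the statements should be listed on their own, which is convenient for copying them into a script file.

```pig
grunt> history -n
customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');
orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
```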
set Command
The set command is used to show or assign the values of keys used in Pig.
Usage
Using this command, you can set values for the following keys.
Key | Description and values
default_parallel | You can set the number of reducers used by the jobs of a script by passing any whole number as a value to this key.
debug | You can turn the debugging feature in Pig on or off by passing on/off to this key.
job.name | You can set the name of a job by passing a string value to this key.
job.priority | You can set the priority of a job by passing one of the following values to this key − very_low, low, normal, high, very_high.
stream.skippath | For streaming, you can set the path from which data is not to be transferred by passing the desired path as a string to this key.
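For example, the following session turns debugging on, names the job, raises its priority, and sets the reduce parallelism. The job name 'student report' and the numeric value are arbitrary choices for this sketch. String values are single-quoted, as the help text above describes for job.name.

```pig
grunt> set debug 'on'
grunt> set job.name 'student report'
grunt> set job.priority 'high'
grunt> set default_parallel 10
```

These settings apply to the jobs launched afterwards in the same Grunt session.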
quit Command
You can quit the Grunt shell using this command.
Usage
Quit the Grunt shell as shown below.
grunt> quit
Let us now take a look at the commands with which you can control Apache Pig from the Grunt shell.
exec Command
Using the exec command, we can execute Pig scripts from the Grunt shell.
Syntax
Given below is the syntax of the exec utility command.
grunt> exec [-param param_name=param_value] [-param_file file_name] [script]
Example
Let us assume there is a file named student.txt in the /pig_data/ directory of HDFS with the following content.
Student.txt
001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi
Also assume we have a script file named sample_script.pig in the /pig_data/ directory of HDFS with the following content.
Sample_script.pig
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',')
as (id:int,name:chararray,city:chararray);
Dump student;
Now, let us execute the above script from the Grunt shell using the exec command, as shown below.
grunt> exec /sample_script.pig
Output
The exec command executes the script in sample_script.pig. As directed in the script, it loads the student.txt file into Pig, and the Dump operator displays the following content. Because the id field is declared as int in the schema, values such as 001 are displayed as 1.
(1,Rajiv,Hyderabad)
(2,siddarth,Kolkata)
(3,Rajesh,Delhi)
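The -param option in the exec syntax performs parameter substitution. Each $name placeholder in the script is replaced by the value supplied on the command line before the script runs. The following is a minimal sketch that assumes a hypothetical script /param_script.pig, which is not part of this chapter's examples. That script takes its input path from a parameter called input. If it reads the same student.txt file, it should print the same three tuples as above.

```pig
-- Contents of the hypothetical script /param_script.pig:
student = LOAD '$input' USING PigStorage(',')
   as (id:int,name:chararray,city:chararray);
Dump student;

-- Invoking it from the Grunt shell, binding $input on the command line:
grunt> exec -param input=hdfs://localhost:9000/pig_data/student.txt /param_script.pig
```

Alternatively, the same binding can be written as a line of the form input=... in a parameter file and passed with -param_file instead of -param.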
kill Command
You can kill a job from the Grunt shell using this command.
Syntax
Given below is the syntax of the kill command.
grunt> kill JobId
Example
Suppose there is a running Pig job with the ID Id_0055. You can kill it from the Grunt shell using the kill command, as shown below.
grunt> kill Id_0055
run Command
You can run a Pig script from the Grunt shell using the run command.
Syntax
Given below is the syntax of the run command.
grunt> run [-param param_name=param_value] [-param_file file_name] script
Example
Let us assume there is a file named student.txt in the /pig_data/ directory of HDFS with the following content.
Student.txt
001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi
Also assume we have a script file named sample_script.pig in the local file system with the following content.
Sample_script.pig
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
PigStorage(',') as (id:int,name:chararray,city:chararray);
Now, let us run the above script from the Grunt shell using the run command, as shown below.
grunt> run /sample_script.pig
You can see the output of the script using the Dump operator on the alias student, as shown below.
grunt> Dump student;
(1,Rajiv,Hyderabad)
(2,siddarth,Kolkata)
(3,Rajesh,Delhi)
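Note − The difference between the exec and run commands is that run executes the script inside the Grunt environment. The statements from the script become available in the command history, and aliases defined in the script (such as student above) remain accessible in the shell afterwards. exec runs the script in batch mode, so its aliases are not available to the shell.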
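The following session sketch reuses the local /sample_script.pig from the run example, with outputs omitted. Because run shares the Grunt environment, the alias student defined in the script can be inspected afterwards, and the script's LOAD statement should show up in the history list.

```pig
grunt> run /sample_script.pig
-- student was defined inside the script but is visible to the shell:
grunt> describe student;
-- the LOAD statement from the script is listed in the command history:
grunt> history
```

According to the Pig documentation, aliases defined in a script launched with exec are not exported to the shell. In a fresh session that used exec instead of run, the same describe student; would therefore fail.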