Apache Presto 简明教程
Apache Presto - Overview
数据分析是对原始数据进行分析以收集相关信息、做出更好决策的过程。它主要用于许多组织中以做出业务决策。大数据分析包含海量数据,且此过程非常复杂,因此公司采用了不同策略。
Data analytics is the process of analyzing raw data to gather relevant information for better decision making. It is primarily used in many organizations to make business decisions. Well, big data analytics involves a large amount of data and this process is quite complex, hence companies use different strategies.
例如,Facebook 是全球领先的数据驱动和最大的数据仓库公司之一。Facebook 仓库数据存储在 Hadoop 中用于大规模计算。后来,当仓库数据增长到 PB 级别时,他们决定开发一个具有低延迟的新系统。在 2012 年,Facebook 团队成员设计了 “Presto” ,针对交互式查询分析而设计,即使具有 PB 级数据,也能快速运行。
For example, Facebook is one of the leading data driven and largest data warehouse company in the world. Facebook warehouse data is stored in Hadoop for large scale computation. Later, when warehouse data grew to petabytes, they decided to develop a new system with low latency. In the year of 2012, Facebook team members designed “Presto” for interactive query analytics that would operate quickly even with petabytes of data.
What is Apache Presto?
Apache Presto 是一个分布式并行查询执行引擎,针对低延迟和交互式查询分析而优化。Presto 可以轻松运行查询,并在从 GB 到 PB 的范围内进行扩展,且不会出现宕机。
Apache Presto is a distributed parallel query execution engine, optimized for low latency and interactive query analysis. Presto runs queries easily and scales without down time even from gigabytes to petabytes.
一个 Presto 查询可以处理来自 HDFS、MySQL、Cassandra、Hive 等多个来源的数据,还可以处理许多其他数据源。Presto 是用 Java 编写的,且易于与其他数据基础设施组件集成。Presto 功能强大,Airbnb、DropBox、Groupon、Netflix 等领先公司都在采用它。
A single Presto query can process data from multiple sources like HDFS, MySQL, Cassandra, Hive and many more data sources. Presto is built in Java and easy to integrate with other data infrastructure components. Presto is powerful, and leading companies like Airbnb, DropBox, Groupon, Netflix are adopting it.
Presto − Features
Presto 包含以下功能:
Presto contains the following features −
-
Simple and extensible architecture.
-
Pluggable connectors - Presto supports pluggable connector to provide metadata and data for queries.
-
Pipelined executions - Avoids unnecessary I/O latency overhead.
-
User-defined functions - Analysts can create custom user-defined functions to migrate easily.
-
Vectorized columnar processing.
Presto − Benefits
以下是 Apache Presto 提供的好处:
Here is a list of benefits that Apache Presto offers −
-
Specialized SQL operations
-
Easy to install and debug
-
Simple storage abstraction
-
Quickly scales petabytes data with low latency
Presto − Applications
Presto 支持当今大多数最佳行业应用。让我们来看一下一些著名应用。
Presto supports most of today’s best industrial applications. Let’s take a look at some of the notable applications.
-
Facebook − Facebook built Presto for data analytics needs. Presto easily scales large velocity of data.
-
Teradata − Teradata provides end-to-end solutions in Big Data analytics and data warehousing. Teradata contribution to Presto makes it easier for more companies to enable all analytical needs.
-
Airbnb − Presto is an integral part of the Airbnb data infrastructure. Well, hundreds of employees are running queries each day with the technology.
Why Presto?
Presto 支持标准 ANSI SQL,这让数据分析师和开发人员的工作变得非常轻松。尽管它以 Java 编写,但它避免了 Java 代码与内存分配和垃圾回收相关的典型问题。Presto 具有与 Hadoop 友好的连接器架构。它允许轻松插入文件系统。
Presto supports standard ANSI SQL which has made it very easy for data analysts and developers. Though it is built in Java, it avoids typical issues of Java code related to memory allocation and garbage collection. Presto has a connector architecture that is Hadoop friendly. It allows to easily plug in file systems.
Presto 在多个 Hadoop 分布上运行。此外,Presto 可以通过一个 Hadoop 平台查询 Cassandra、关系型数据库或其他数据存储。这种跨平台分析功能允许 Presto 用户从千兆字节到拍字节的数据中提取最大业务价值。
Presto runs on multiple Hadoop distributions. In addition, Presto can reach out from a Hadoop platform to query Cassandra, relational databases, or other data stores. This cross-platform analytic capability allows Presto users to extract maximum business value from gigabytes to petabytes of data.
Apache Presto - Architecture
Presto 的架构几乎类似于经典 MPP(海量并行处理)DBMS 架构。下图说明了 Presto 的架构。
The architecture of Presto is almost similar to classic MPP (massively parallel processing) DBMS architecture. The following diagram illustrates the architecture of Presto.
data:image/s3,"s3://crabby-images/cb33c/cb33c11aa5d2c38cce0e53b0b2b7b0191ddb725e" alt="presto architecture"
上述图表包含不同的组件。下表详细描述了每个组件。
The above diagram consists of different components. Following table describes each of the component in detail.
S.No |
Component & Description |
1. |
Client Client (Presto CLI) submits SQL statements to a coordinator to get the result. |
2. |
Coordinator Coordinator is a master daemon. The coordinator initially parses the SQL queries then analyzes and plans for the query execution. Scheduler performs pipeline execution, assigns work to the closest node and monitors progress. |
3. |
Connector Storage plugins are called as connectors. Hive, HBase, MySQL, Cassandra and many more act as a connector; otherwise you can also implement a custom one. The connector provides metadata and data for queries. The coordinator uses the connector to get metadata for building a query plan. |
4. |
Worker The coordinator assigns task to worker nodes. The workers get actual data from the connector. Finally, the worker node delivers result to the client. |
Presto − Workflow
Presto 是一个分布式系统,在集群的节点上运行。Presto 的分布式查询引擎经过优化,适用于交互分析,并支持标准 ANSI SQL,包括复杂查询、聚合、连接和窗口函数。Presto 架构简单而可扩展。Presto 客户端(CLI)向一个主守护进程协调器提交 SQL 语句。
Presto is a distributed system that runs on a cluster of nodes. Presto’s distributed query engine is optimized for interactive analysis and supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. Presto architecture is simple and extensible. Presto client (CLI) submits SQL statements to a master daemon coordinator.
调度器通过执行管道连接。调度器会将工作分配给最靠近数据且可以监控进度的节点。协调器将任务分配给多个工作节点,最终工作节点将结果返回给客户端。客户端从输出进程获取数据。可扩展性是一项主要的设计要点。像 Hive、HBase、MySQL 等可插入连接器可为查询提供元数据和数据。Presto 采用“简单存储抽象”进行设计,使针对不同类型数据源提供 SQL 查询功能变得容易。
The scheduler connects through execution pipeline. The scheduler assigns work to nodes which is closest to the data and monitors progress. The coordinator assigns task to multiple worker nodes and finally the worker node delivers the result back to the client. The client pulls data from the output process. Extensibility is the key design. Pluggable connectors like Hive, HBase, MySQL, etc., provides metadata and data for queries. Presto was designed with a “simple storage abstraction” that makes it easy to provide SQL query capability against these different kind of data sources.
Execution Model
Presto 利用专为支持 SQL 语义而设计的操作员,支持自定义查询和执行引擎。除了改进调度外,所有处理工作都在内存中执行,并在不同阶段通过网络进行管道传输。这避免了不必要的 I/O 延迟开销。
Presto supports custom query and execution engine with operators designed to support SQL semantics. In addition to improved scheduling, all processing is in memory and pipelined across the network between different stages. This avoids unnecessary I/O latency overhead.
Apache Presto - Installation
本章将解释如何在您的机器上安装 Presto。让我们了解 Presto 的基本要求,
This chapter will explain how to install Presto on your machine. Let’s go through the basic requirements of Presto,
-
Linux or Mac OS
-
Java version 8
现在,让我们继续执行以下步骤在您的机器上安装 Presto。
Now, let’s continue the following steps to install Presto on your machine.
Verifying Java installation
希望您现在已在您的机器上安装了 Java 8,您只需使用以下命令对其进行验证。
Hopefully, you have already installed Java version 8 on your machine right now, so you just verify it using the following command.
$ java -version
如果 Java 已成功安装在您的机器上,您将看到已安装的 Java 版本。如果未安装 Java,请按照后续步骤在您的机器上安装 Java 8。
If Java is successfully installed on your machine, you could see the version of installed Java. If Java is not installed, follow the subsequent steps to install Java 8 on your machine.
下载 JDK。访问以下链接下载最新版本的 JDK。
Download JDK. Download the latest version of JDK by visiting the following link.
最新版本是 JDK 8u 92,文件为 “jdk-8u92-linux-x64.tar.gz”。请在您的机器上下载该文件。
The latest version is JDK 8u 92 and the file is “jdk-8u92-linux-x64.tar.gz”. Please download the file on your machine.
然后,解压文件并移动到指定目录。
After that, extract the files and move to the specific directory.
然后设置 Java 替代项。最终,Java 将安装在您的机器上。
Then set Java alternatives. Finally Java will be installed on your machine.
Apache Presto Installation
访问以下链接下载最新版本的 Presto,
Download the latest version of Presto by visiting the following link,
现在,“presto-server-0.149.tar.gz” 的最新版本将下载到您的机器上。
Now the latest version of “presto-server-0.149.tar.gz” will be downloaded on your machine.
Configuration Settings
Create “data” directory
在安装目录之外创建一个数据目录,该目录将用于存储日志、元数据等,以便在升级 Presto 时可以轻松保存。可使用以下代码定义该目录 −
Create a data directory outside the installation directory, which will be used for storing logs, metadata, etc., so that it is to be easily preserved when upgrading Presto. It is defined using the following code −
$ cd
$ mkdir data
要查看它的位置路径,请使用命令“pwd”。此位置将在次の节点配置文件中分配。
To view the path where it is located, use the command “pwd”. This location will be assigned in the next node.properties file.
Create “etc” directory
使用以下代码在 Presto 安装目录内创建一个等目录 −
Create an etc directory inside Presto installation directory using the following code −
$ cd presto-server-0.149
$ mkdir etc
该目录将保存配置文件。让我们逐个创建每个文件。
This directory will hold configuration files. Let’s create each file one by one.
Node Properties
Presto 节点配置文件包含特定于每个节点的环境配置。使用以下代码在等目录 (etc/node.properties) 中创建该文件 −
Presto node properties file contains environmental configuration specific to each node. It is created inside etc directory (etc/node.properties) using the following code −
$ cd etc
$ vi node.properties
node.environment = production
node.id = ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir = /Users/../workspace/Presto
完成所有更改后,保存文件并退出终端。其中 node.data 是上述创建的 data 目录的位置路径。 node.id 表示每个节点的唯一标识符。
After making all the changes, save the file, and quit the terminal. Here node.data is the location path of the above created data directory. node.id represents the unique identifier for each node.
JVM Config
在 etc 目录中创建一个文件 “jvm.config”(etc/jvm.config)。此文件包含用于启动 Java 虚拟机的命令行选项列表。
Create a file “jvm.config” inside etc directory (etc/jvm.config). This file contains a list of command line options used for launching the Java Virtual Machine.
$ cd etc
$ vi jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize = 32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError = kill -9 %p
完成所有更改后,保存文件并退出终端。
After making all the changes, save the file, and quit the terminal.
Config Properties
在 etc 目录中创建一个文件 “config.properties”(etc/config.properties)。此文件包含 Presto 服务器的配置。如果设置一台机器用于测试,Presto 服务器只能充当使用以下代码定义的协调进程 −
Create a file “config.properties” inside etc directory(etc/config.properties). This file contains the configuration of Presto server. If you are setting up a single machine for testing, Presto server can function only as the coordination process as defined using the following code −
$ cd etc
$ vi config.properties
coordinator = true
node-scheduler.include-coordinator = true
http-server.http.port = 8080
query.max-memory = 5GB
query.max-memory-per-node = 1GB
discovery-server.enabled = true
discovery.uri = http://localhost:8080
在此,
Here,
-
coordinator − master node.
-
node-scheduler.include-coordinator − Allows scheduling work on the coordinator.
-
http-server.http.port − Specifies the port for the HTTP server.
-
query.max-memory=5GB − The maximum amount of distributed memory.
-
query.max-memory-per-node=1GB − The maximum amount of memory per node.
-
discovery-server.enabled − Presto uses the Discovery service to find all the nodes in the cluster.
-
discovery.uri − he URI to the Discovery server.
如果设置多台机器 Presto 服务器,Presto 将充当协调进程和工作进程。使用此配置设置在多台机器上测试 Presto 服务器。
If you are setting up multiple machine Presto server, Presto will function as both coordination and worker process. Use this configuration setting to test Presto server on multiple machines.
Configuration for Coordinator
$ cd etc
$ vi config.properties
coordinator = true
node-scheduler.include-coordinator = false
http-server.http.port = 8080
query.max-memory = 50GB
query.max-memory-per-node = 1GB
discovery-server.enabled = true
discovery.uri = http://localhost:8080
Configuration for Worker
$ cd etc
$ vi config.properties
coordinator = false
http-server.http.port = 8080
query.max-memory = 50GB
query.max-memory-per-node = 1GB
discovery.uri = http://localhost:8080
Log Properties
在 etc 目录中创建一个文件 “log.properties”(etc/log.properties)。此文件包含命名日志记录程序层次结构的最低日志级别。它使用以下代码定义 −
Create a file “log.properties” inside etc directory(etc/log.properties). This file contains minimum log level for named logger hierarchies. It is defined using the following code −
$ cd etc
$ vi log.properties
com.facebook.presto = INFO
保存文件并退出终端。在此,使用了 DEBUG、INFO、WARN 和 ERROR 四个日志级别。默认日志级别为 INFO。
Save the file and quit the terminal. Here, four log levels are used such as DEBUG, INFO, WARN and ERROR. Default log level is INFO.
Catalog Properties
在 etc 目录中创建一个目录 “catalog”(etc/catalog)。这将用于装载数据。例如,使用以下内容创建 etc/catalog/jmx.properties ,以将 jmx connector 装载为 jmx 目录 −
Create a directory “catalog” inside etc directory(etc/catalog). This will be used for mounting data. For example, create etc/catalog/jmx.properties with the following contents to mount the jmx connector as the jmx catalog −
$ cd etc
$ mkdir catalog
$ cd catalog
$ vi jmx.properties
connector.name = jmx
Start Presto
可以使用以下命令启动 Presto,
Presto can be started using the following command,
$ bin/launcher start
然后,您将看到类似于此的响应,
Then you will see the response similar to this,
Started as 840
Run Presto
若要启动 Presto 服务器,请使用以下命令 −
To launch Presto server, use the following command −
$ bin/launcher run
在成功启动 Presto 服务器后,您可在 “var/log” 目录中找到日志文件。
After successfully launching Presto server, you can find log files in “var/log” directory.
-
launcher.log − This log is created by the launcher and is connected to the stdout and stderr streams of the server.
-
server.log − This is the main log file used by Presto.
-
http-request.log − HTTP request received by the server.
到目前为止,您已在自己的计算机上成功安装了 Presto 配置设置。让我们继续执行步骤来安装 Presto CLI。
As of now, you have successfully installed Presto configuration settings on your machine. Let’s continue the steps to install Presto CLI.
Install Presto CLI
Presto CLI 提供了基于终端的交互式 shell,用于运行查询。
The Presto CLI provides a terminal-based interactive shell for running queries.
通过访问以下链接下载 Presto CLI,
Download the Presto CLI by visiting the following link,
现在,您的计算机上将安装“presto-cli-0.149-executable.jar”。
Now “presto-cli-0.149-executable.jar” will be installed on your machine.
Run CLI
在下载 presto-cli 后,将其复制到要从中运行它的位置。此位置可以是具有到协调器网络访问权限的任何节点。首先将 Jar 文件的名称更改为 Presto。然后使用以下代码通过 chmod + x 命令使其可执行 −
After downloading the presto-cli, copy it to the location which you want to run it from. This location may be any node that has network access to the coordinator. First change the name of the Jar file to Presto. Then make it executable with chmod + x command using the following code −
$ mv presto-cli-0.149-executable.jar presto
$ chmod +x presto
现在使用以下命令执行 CLI,
Now execute CLI using the following command,
./presto --server localhost:8080 --catalog jmx --schema default
Here jmx(Java Management Extension) refers to catalog and default referes to schema.
您将看到以下响应,
You will see the following response,
presto:default>
现在,在您的终端上键入“jps”命令,您将看到正在运行的守护程序。
Now type “jps” command on your terminal and you will see the running daemons.
Apache Presto - Configuration Settings
本章将讨论 Presto 的配置设置。
This chapter will discuss the configuration settings for Presto.
Presto Verifier
Presto Verifier 可用于将 Presto 与另一个数据库(如 MySQL)进行测试,或将两个 Presto 集群彼此进行测试。
The Presto Verifier can be used to test Presto against another database (such as MySQL), or to test two Presto clusters against each other.
Create Database in MySQL
打开 MySQL 服务器并使用以下命令创建一个数据库。
Open MySQL server and create a database using the following command.
create database test
现在你在服务器中创建了“test”数据库。创建表并使用以下查询加载它。
Now you have created “test” database in the server. Create the table and load it with the following query.
CREATE TABLE verifier_queries(
id INT NOT NULL AUTO_INCREMENT,
suite VARCHAR(256) NOT NULL,
name VARCHAR(256),
test_catalog VARCHAR(256) NOT NULL,
test_schema VARCHAR(256) NOT NULL,
test_prequeries TEXT,
test_query TEXT NOT NULL,
test_postqueries TEXT,
test_username VARCHAR(256) NOT NULL default 'verifier-test',
test_password VARCHAR(256),
control_catalog VARCHAR(256) NOT NULL,
control_schema VARCHAR(256) NOT NULL,
control_prequeries TEXT,
control_query TEXT NOT NULL,
control_postqueries TEXT,
control_username VARCHAR(256) NOT NULL default 'verifier-test',
control_password VARCHAR(256),
session_properties_json TEXT,
PRIMARY KEY (id)
);
Add Config Settings
创建一个属性文件以配置验证器-
Create a properties file to configure the verifier −
$ vi config.properties
suite = mysuite
query-database = jdbc:mysql://localhost:3306/tutorials?user=root&password=pwd
control.gateway = jdbc:presto://localhost:8080
test.gateway = jdbc:presto://localhost:8080
thread-count = 1
这里,在 query-database 字段中,输入以下内容:mysql 数据库名称、用户名和密码。
Here, in the query-database field, enter the following details − mysql database name, user name, and password.
Download JAR File
通过访问以下链接下载 Presto 验证器 jar 文件,
Download Presto-verifier jar file by visiting the following link,
现在已在你的机器上下载版本 “presto-verifier-0.149-executable.jar” 。
Now the version “presto-verifier-0.149-executable.jar” is downloaded on your machine.
Execute JAR
使用以下命令执行 JAR 文件,
Execute the JAR file using the following command,
$ mv presto-verifier-0.149-executable.jar verifier
$ chmod+x verifier
Run Verifier
使用以下命令运行验证器,
Run the verifier using the following command,
$ ./verifier config.properties
Create Table
我们使用以下查询在 “test” 数据库中创建一个简单表。
Let’s create a simple table in “test” database using the following query.
create table product(id int not null, name varchar(50))
Insert Table
创建表后,使用以下查询插入两条记录,
After creating a table, insert two records using the following query,
insert into product values(1,’Phone')
insert into product values(2,’Television’)
Run Verifier Query
在验证器终端(./verifier config.propeties)中执行以下示例查询,以检查验证器结果。
Execute the following sample query in the verifier terminal (./verifier config.propeties) to check the verifier result.
Sample Query
insert into verifier_queries (suite, test_catalog, test_schema, test_query,
control_catalog, control_schema, control_query) values
('mysuite', 'mysql', 'default', 'select * from mysql.test.product',
'mysql', 'default', 'select * from mysql.test.product');
这里, select * from mysql.test.product 查询引用 mysql 目录, test 是数据库名称, product 是表名称。通过这种方式,你可以使用 Presto 服务器访问 mysql 连接器。
Here, select * from mysql.test.product query refers to mysql catalog, test is database name and product is table name. In this way, you can access mysql connector using Presto server.
这里,两个相同的 select 查询彼此进行测试,以查看性能。类似地,你可以运行其他查询来测试性能结果。你还可以连接两个 Presto 集群来检查性能结果。
Here, two same select queries are tested against each other to see the performance. Similarly, you can run other queries to test the performance results. You can also connect two Presto clusters to check the performance results.
Apache Presto - Administration Tools
在本章中,我们将讨论 Presto 中使用的管理工具。让我们从 Presto 的 Web 界面开始。
In this chapter, we will discuss the administration tools used in Presto. Let’s start with the Web Interface of Presto.
Web Interface
Presto 提供了一个用于监控和管理查询的 Web 界面。可以从协调器 Config 属性中指定的端口号访问它。
Presto provides a web interface for monitoring and managing queries. It can be accessed from the port number specified in the coordinator Config Properties.
启动 Presto 服务器和 Presto CLI。然后,您可以从以下 url 访问 Web 界面 − http://localhost:8080/
Start Presto server and Presto CLI. Then you can access the web interface from the following url − http://localhost:8080/
data:image/s3,"s3://crabby-images/acbd1/acbd153cb49e618abab6548afbddefc955832f49" alt="web interface"
输出将与上述屏幕相似。
The output will be similar to the above screen.
在此,主页具有查询列表以及以下信息:唯一查询 ID、查询文本、查询状态、完成百分比、用户名和此查询的来源。最新的查询首先运行,然后在底部显示已完成或未完成的查询。
Here, the main page has a list of queries along with information like unique query ID, query text, query state, percentage completed, username and source from which this query is originated. Latest queries are running first, then completed or not completed queries are displayed at the bottom.
Tuning the Performance on Presto
如果 Presto 集群有任何与性能相关的问题,请将您的默认配置设置更改为以下设置。
If Presto cluster is having any performance-related issues, change your default configuration settings to the following settings.
Config Properties
-
task. info -refresh-max-wait − Reduces coordinator work load.
-
task.max-worker-threads − Splits the process and assigns to each worker nodes.
-
distributed-joins-enabled − Hash-based distributed joins.
-
node-scheduler.network-topology − Sets network topology to scheduler.
JVM Settings
将默认 JVM 设置更改为以下设置。这有助于诊断垃圾回收问题。
Change your default JVM settings to the following settings. This will be helpful for diagnosing garbage collection issues.
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCCause
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintReferenceGC
-XX:+PrintClassHistogramAfterFullGC
-XX:+PrintClassHistogramBeforeFullGC
-XX:PrintFLSStatistics = 2
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount = 1
Apache Presto - Basic SQL Operations
在本章中,我们将讨论如何在 Presto 上创建并执行查询。我们来了解一下 Presto 支持的基本数据类型。
In this chapter, we will discuss how to create and execute queries on Presto. Let us go through Presto supported basic data types.
Basic Data Types
下表描述了 Presto 的基本数据类型。
The following table describes the basic data types of Presto.
S.No |
Data type & Description |
1. |
VARCHAR Variable length character data |
2. |
BIGINT A 64-bit signed integer |
3. |
DOUBLE A 64-bit floating point double precision value |
4. |
DECIMAL A fixed precision decimal number. For example DECIMAL(10,3) - 10 is precision, i.e. total number of digits and 3 is scale value represented as fractional point. Scale is optional and default value is 0 |
5. |
BOOLEAN Boolean values true and false |
6. |
VARBINARY Variable length binary data |
7. |
JSON JSON data |
8. |
DATE Date data type represented as year-month-day |
9. |
TIME, TIMESTAMP, TIMESTAMP with TIME ZONE TIME - Time of the day (hour-min-sec-millisecond) TIMESTAMP - Date and time of the day TIMESTAMP with TIME ZONE - Date and time of the day with time zone from the value |
10. |
INTERVAL Stretch or extend date and time data types |
11. |
ARRAY Array of the given component type. For example, ARRAY[5,7] |
12. |
MAP Map between the given component types. For example, MAP(ARRAY[‘one’,’two’],ARRAY[5,7]) |
13. |
ROW Row structure made up of named fields |
Presto − Operators
Presto 运算符列在以下表中。
Presto operators are listed in the following table.
S.No |
Operator & Description |
1. |
Arithmetic operatorPresto supports arithmetic operators such as +, -, *, /, % |
2. |
Relational operator<,>,⇐,>=,=,<> |
3. |
Logical operatorAND, OR, NOT |
4. |
Range operatorRange operator is used to test the value in a specific range. Presto supports BETWEEN, IS NULL, IS NOT NULL, GREATEST and LEAST |
5. |
Decimal operatorBinary arithmetic decimal operator performs binary arithmetic operation for decimal type Unary decimal operator − The - operator performs negation |
6. |
String operatorThe *‘ |
’ operator* performs string concatenation |
|
7. |
Date and time operatorPerforms arithmetic addition and subtraction operations on date and time data types |
8. |
Array operatorSubscript operator[] - access an element of an array Concatenation operator |
- concatenate an array with an array or an element of the same type |
|
9. |
Map operatorMap subscript operator [] - retrieve the value corresponding to a given key from a map |
Apache Presto - SQL Functions
到目前为止,我们在讨论对 Presto 运行一些基本查询。本章将讨论重要的 SQL 函数。
As of now we were discussing running some simple basic queries on Presto. This chapter will discuss the important SQL functions.
Math Functions
数学函数处理数学公式。下表详细描述了函数列表。
Math functions operate on mathematical formulas. Following table describes the list of functions in detail.
S.No. |
Function & Description |
1. |
abs(x)Returns the absolute value of x |
2. |
cbrt(x)Returns the cube root of x |
3. |
ceiling(x)Returns the x value rounded up to the nearest integer |
4. |
ceil(x) Alias for ceiling(x) |
5. |
degrees(x)Returns the degree value for x |
6. |
e(x)Returns the double value for Euler’s number |
7. |
exp(x) Returns the exponent value for Euler’s number |
8. |
floor(x)Returns x rounded down to the nearest integer |
9. |
from_base(string,radix) Returns the value of string interpreted as a base-radix number |
10. |
ln(x) Returns the natural logarithm of x |
11. |
log2(x)Returns the base 2 logarithm of x |
12. |
log10(x) Returns the base 10 logarithm of x |
13. |
log(x,y) Returns the base y logarithm of x |
14. |
mod(n,m)Returns the modulus (remainder) of n divided by m |
15. |
pi() Returns pi value. The result will be returned as a double value |
16. |
power(x,p)Returns power of value ‘p’ to the x value |
17. |
pow(x,p) Alias for power(x,p) |
18. |
radians(x)converts the angle x in degree radians |
19. |
rand() Alias for radians() |
20. |
random()Returns the pseudo-random value |
21. |
rand(n) Alias for random() |
22. |
round(x)Returns the rounded value for x |
23. |
round(x,d) x value rounded for the ‘d’ decimal places |
24. |
sign(x) Returns the signum function of x, i.e., 0 if the argument is 0 1 if the argument is greater than 0 -1 if the argument is less than 0 For double arguments, the function additionally returns − NaN if the argument is NaN 1 if the argument is +Infinity -1 if the argument is -Infinity |
25. |
sqrt(x)Returns the square root of x |
26. |
to_base(x,radix)Return type is archer. The result is returned as the base radix for x |
27. |
truncate(x)Truncates the value for x |
28. |
width_bucket(x, bound1, bound2, n)Returns the bin number of x specified bound1 and bound2 bounds and n number of buckets |
29. |
width_bucket(x, bins)Returns the bin number of x according to the bins specified by the array bins |
Trigonometric Functions
三角函数参数表示为弧度()。下表列出了这些函数。
Trigonometric functions arguments are represented as radians(). Following table lists out the functions.
S.No |
Functions & Description |
1. |
acos(x)Returns the inverse cosine value(x) |
2. |
asin(x) Returns the inverse sine value(x) |
3. |
atan(x) Returns the inverse tangent value(x) |
4. |
atan2(y,x)Returns the inverse tangent value(y/x) |
5. |
cos(x) Returns the cosine value(x) |
6. |
cosh(x)Returns the hyperbolic cosine value(x) |
7. |
sin(x)Returns the sine value(x) |
8. |
tan(x) Returns the tangent value(x) |
9. |
tanh(x) Returns the hyperbolic tangent value(x) |
Bitwise Functions
下表列出了按位函数。
The following table lists out the Bitwise functions.
S.No |
Functions & Description |
1. |
bit_count(x, bits)Count the number of bits |
2. |
bitwise_and(x,y)Perform bitwise AND operation for two bits, x and y |
3. |
bitwise_or(x,y)Bitwise OR operation between two bits x, y |
4. |
bitwise_not(x)Bitwise Not operation for bit x |
5. |
bitwise_xor(x,y)XOR operation for bits x, y |
String Functions
下表列出了字符串函数。
Following table lists out the String functions.
S.No |
Functions & Description |
1. |
concat(string1, …, stringN)Concatenate the given strings |
2. |
length(string)Returns the length of the given string |
3. |
lower(string)Returns the lowercase format for the string |
4. |
upper(string)Returns the uppercase format for the given string |
5. |
lpad(string, size, padstring)Left padding for the given string |
6. |
ltrim(string)Removes the leading whitespace from the string |
7. |
replace(string, search, replace)Replaces the string value |
8. |
reverse(string)Reverses the operation performed for the string |
9. |
rpad(string, size, padstring)Right padding for the given string |
10. |
rtrim(string)Removes the trailing whitespace from the string |
11. |
split(string, delimiter)Splits the string on delimiter and returns an array of size at the most limit |
12. |
split_part(string, delimiter, index)Splits the string on delimiter and returns the field index |
13. |
strpos(string, substring)Returns the starting position of the substring in the string |
14. |
substr(string, start)Returns the substring for the given string |
15. |
substr(string, start, length)Returns the substring for the given string with the specific length |
16. |
trim(string)Removes the leading and trailing whitespace from the string |
Date and Time Functions
下表列出了日期和时间函数。
Following table lists out the Date and Time functions.
S.No |
Functions & Description |
1. |
current_dateReturns the current date |
2. |
current_timeReturns the current time |
3. |
current_timestampReturns the current timestamp |
4. |
current_timezone()Returns the current timezone |
5. |
now()Returns the current date,timestamp with the timezone |
6. |
localtimeReturns the local time |
7. |
localtimestampReturns the local timestamp |
Regular Expression Functions
下表列出了正则表达式函数。
The following table lists out the Regular Expression functions.
S.No |
Functions & Description |
1. |
regexp_extract_all(string, pattern)Returns the string matched by the regular expression for the pattern |
2. |
regexp_extract_all(string, pattern, group)Returns the string matched by the regular expression for the pattern and the group |
3. |
regexp_extract(string, pattern)Returns the first substring matched by the regular expression for the pattern |
4. |
regexp_extract(string, pattern, group)Returns the first substring matched by the regular expression for the pattern and the group |
5. |
regexp_like(string, pattern)Returns the string matches for the pattern. If the string is returned, the value will be true otherwise false |
6. |
regexp_replace(string, pattern)Replaces the instance of the string matched for the expression with the pattern |
7. |
regexp_replace(string, pattern, replacement)Replace the instance of the string matched for the expression with the pattern and replacement |
8. |
regexp_split(string, pattern)Splits the regular expression for the given pattern |
JSON Functions
下表列出了 JSON 函数。
The following table lists out JSON functions.
S.No |
Functions & Description |
1. |
json_array_contains(json, value)Check the value exists in a json array. If the value exists it will return true, otherwise false |
2. |
json_array_get(json_array, index)Get the element for index in json array |
3. |
json_array_length(json)Returns the length in json array |
4. |
json_format(json)Returns the json structure format |
5. |
json_parse(string)Parses the string as a json |
6. |
json_size(json, json_path)Returns the size of the value |
URL Functions
下表列出了 URL 函数。
The following table lists out the URL functions.
S.No |
Functions & Description |
1. |
url_extract_host(url)Returns the URL’s host |
2. |
url_extract_path(url)Returns the URL’s path |
3. |
url_extract_port(url)Returns the URL’s port |
4. |
url_extract_protocol(url)Returns the URL’s protocol |
5. |
url_extract_query(url)Returns the URL’s query string |
Aggregate Functions
下表列出了汇总函数。
The following table lists out the Aggregate functions.
S.No |
Functions & Description |
1. |
avg(x) Returns average for the given value |
2. |
min(x,n)Returns the minimum value from two values |
3. |
max(x,n)Returns the maximum value from two values |
4. |
sum(x)Returns the sum of value |
5. |
count(*)Returns the number of input rows |
6. |
count(x)Returns the count of input values |
7. |
checksum(x)Returns the checksum for x |
8. |
arbitrary(x)Returns the arbitrary value for x |
Color Functions
下表列出了颜色函数。
Following table lists out the Color functions.
S.No |
Functions & Description |
1. |
bar(x, width)Renders a single bar using rgb low_color and high_color |
2. |
bar(x, width, low_color, high_color)Renders a single bar for the specified width |
3. |
color(string)Returns the color value for the entered string |
4. |
render(x, color)Renders value x using the specific color using ANSI color codes |
5. |
render(b)Accepts boolean value b and renders a green true or a red false using ANSI color codes |
6. |
rgb(red, green, blue) Returns a color value capturing the RGB value of three component color values supplied as int parameters ranging from 0 to 255 |
Array Functions
下表列出了数组函数。
The following table lists out the Array functions.
S.No |
Functions & Description |
1. |
array_max(x)Finds the max element in an array |
2. |
array_min(x)Finds the min element in an array |
3. |
array_sort(x)Sorts the elements in an array |
4. |
array_remove(x,element)Removes the specific element from an array |
5. |
concat(x,y)Concatenates two arrays |
6. |
contains(x,element)Finds the given elements in an array. True will be returned if it is present, otherwise false |
7. |
array_position(x,element)Find the position of the given element in an array |
8. |
array_intersect(x,y)Performs an intersection between two arrays |
9. |
element_at(array,index)Returns the array element position |
10. |
slice(x,start,length)Slices the array elements with the specific length |
Teradata Functions
下表列出了 Teradata 函数。
The following table lists out Teradata functions.
S.No |
Functions & Description |
1. |
index(string,substring)Returns the index of the string with the given substring |
2. |
substring(string,start)Returns the substring of the given string. You can specify the start index here |
3. |
substring(string,start,length)Returns the substring of the given string for the specific start index and length of the string |
Apache Presto - MySQL Connector
MySQL 连接器用于查询外部 MySQL 数据库。
The MySQL connector is used to query an external MySQL database.
Configuration Settings
希望您已经在您的机器上安装了 MySQL 服务器。要使 Presto 服务器启用 MySQL 属性,您必须在 “etc/catalog” 目录中创建一个文件 “mysql.properties” 。发出以下命令以创建一个 mysql.properties 文件。
Hopefully you have installed mysql server on your machine. To enable mysql properties on Presto server, you must create a file “mysql.properties” in “etc/catalog” directory. Issue the following command to create a mysql.properties file.
$ cd etc
$ cd catalog
$ vi mysql.properties
connector.name = mysql
connection-url = jdbc:mysql://localhost:3306
connection-user = root
connection-password = pwd
保存文件并退出终端。在上述文件中,您必须在 connection-password 字段中输入 MySQL 密码。
Save the file and quit the terminal. In the above file, you must enter your mysql password in connection-password field.
Create Database in MySQL Server
打开 MySQL 服务器并使用以下命令创建一个数据库。
Open MySQL server and create a database using the following command.
create database tutorials
现在您已经在服务器中创建了“tutorials”数据库。要启用数据库类型,请在查询窗口中使用命令“use tutorials”。
Now you have created “tutorials” database in the server. To enable database type, use the command “use tutorials” in the query window.
Create Table
我们来在“tutorials”数据库中创建一个简单的表。
Let’s create a simple table on “tutorials” database.
create table author(auth_id int not null, auth_name varchar(50),topic varchar(100))
Insert Table
创建表后,使用以下查询插入三条记录。
After creating a table, insert three records using the following query.
insert into author values(1,'Doug Cutting','Hadoop')
insert into author values(2,’James Gosling','java')
insert into author values(3,'Dennis Ritchie’,'C')
Connect Presto CLI
键入以下命令以在 Presto CLI 上连接 MySql 插件。
Type the following command to connect MySql plugin on Presto CLI.
./presto --server localhost:8080 --catalog mysql --schema tutorials
您将收到以下应答。
You will receive the following response.
presto:tutorials>
此处 “tutorials” 指的是 mysql 服务器中的架构。
Here “tutorials” refers to schema in mysql server.
List Schemas
要列出 mysql 中的所有架构,请在 Presto 服务器中键入以下查询。
To list out all the schemas in mysql, type the following query in Presto server.
List Tables from Schema
以下查询列出教程架构中的所有表。
Following query lists out all the tables in tutorials schema.
Access Table Records
要从 mysql 表获取所有记录,请发出以下查询。
To fetch all the records from mysql table, issue the following query.
Create Table Using as Command
Mysql 连接器不支持创建表查询,但可以使用 as 命令创建表。
Mysql connector doesn’t support create table query but you can create a table using as command.
Query
presto:tutorials> create table mysql.tutorials.sample as
select * from mysql.tutorials.author;
Result
CREATE TABLE: 3 rows
您无法直接插入行,因为此连接器有一些限制。它不支持以下查询:
You can’t insert rows directly because this connector has some limitations. It cannot support the following queries −
-
create
-
insert
-
update
-
delete
-
drop
要查看新创建表中的记录,请键入以下查询。
To view the records in the newly created table, type the following query.
Apache Presto - JMX Connector
Java Management Extensions (JMX) 提供有关 Java 虚拟机和在 JVM 内部运行的软件的信息。该 JMX 连接器用于在 Presto 服务器中查询 JMX 信息。
Java Management Extensions (JMX) gives information about the Java Virtual Machine and software running inside JVM. The JMX connector is used to query JMX information in Presto server.
由于我们已经启用了 “jmx.properties” 目录下的 “etc/catalog” 文件。现连接 Prest CLI 来启用 JMX 插件。
As we have already enabled “jmx.properties” file under “etc/catalog” directory. Now connect Prest CLI to enable JMX plugin.
JMX Schema
要列出 “jmx” 中的所有架构,请键入以下查询。
To list out all the schemas in “jmx”, type the following query.
Show Tables
要查看 “current” 架构中的表,请使用以下命令。
To view the tables in the “current” schema, use the following command.
Result
Table
------------------------------------------------------------------------------
com.facebook.presto.execution.scheduler:name = nodescheduler
com.facebook.presto.execution:name = queryexecution
com.facebook.presto.execution:name = querymanager
com.facebook.presto.execution:name = remotetaskfactory
com.facebook.presto.execution:name = taskexecutor
com.facebook.presto.execution:name = taskmanager
com.facebook.presto.execution:type = queryqueue,name = global,expansion = global
………………
……………….
Result
node | compilationtimemonitoringsupported | name | objectname | totalcompilationti
--------------------------------------+------------------------------------+--------------------------------+----------------------------+-------------------
ffffffff-ffff-ffff-ffff-ffffffffffff | true | HotSpot 64-Bit Tiered Compilers | java.lang:type=Compilation | 1276
Result
node | readfromoutputbuffertime.alltime.count
| readfromoutputbuffertime.alltime.max | readfromoutputbuffertime.alltime.maxer
--------------------------------------+---------------------------------------+--------------------------------------+---------------------------------------
ffffffff-ffff-ffff-ffff-ffffffffffff | 92.0 | 1.009106149 |
Apache Presto - HIVE Connector
Hive 连接器允许查询存储在 Hive 数据仓库中的数据。
The Hive connector allows querying data stored in a Hive data warehouse.
Prerequisites
-
Hadoop
-
Hive
希望你在你的机器上安装了 Hadoop 和 Hive。在新终端中逐个启动所有服务。然后,使用以下命令启动 hive 元数据存储,
Hopefully you have installed Hadoop and Hive on your machine. Start all the services one by one in the new terminal. Then, start hive metastore using the following command,
hive --service metastore
Presto 使用 Hive 元数据存储服务来获取 Hive 表的详细信息。
Presto uses Hive metastore service to get the hive table’s details.
Configuration Settings
在 “etc/catalog” 目录下创建一个文件 “hive.properties” 。使用以下命令。
Create a file “hive.properties” under “etc/catalog” directory. Use the following command.
$ cd etc
$ cd catalog
$ vi hive.properties
connector.name = hive-cdh4
hive.metastore.uri = thrift://localhost:9083
完成所有更改后,保存文件并退出终端。
After making all the changes, save the file and quit the terminal.
Create Table
创建表是用于在 Hive 中创建表的语句。例如,使用以下查询。
Create Table is a statement used to create a table in Hive. For example, use the following query.
hive> create table author(auth_id int, auth_name varchar(50),
topic varchar(100) STORED AS SEQUENCEFILE;
Insert Table
以下查询可用于向 hive 表中插入记录。
Following query is used to insert records in hive’s table.
hive> insert into table author values (1,’ Doug Cutting’,Hadoop),
(2,’ James Gosling’,java),(3,’ Dennis Ritchie’,C);
Start Presto CLI
你可以启动 Presto CLI 以使用以下命令连接 Hive 存储插件。
You can start Presto CLI to connect Hive storage plugin using the following command.
$ ./presto --server localhost:8080 --catalog hive —schema tutorials;
您将收到以下应答。
You will receive the following response.
presto:tutorials >
List Schemas
要列出 Hive 连接器中的所有模式,请键入以下命令。
To list out all the schemas in Hive connector, type the following command.
List Tables
要列出“教程”模式中的所有表,请使用以下查询。
To list out all the tables in “tutorials” schema, use the following query.
Apache Presto - KAFKA Connector
Presto 的 Kafka 连接器允许使用 Presto 访问 Apache Kafka 中的数据。
The Kafka Connector for Presto allows to access data from Apache Kafka using Presto.
Prerequisites
下载并安装最新版本的以下 Apache 项目。
Download and install the latest version of the following Apache projects.
-
Apache ZooKeeper
-
Apache Kafka
Start ZooKeeper
使用以下命令启动 ZooKeeper 服务器。
Start ZooKeeper server using the following command.
$ bin/zookeeper-server-start.sh config/zookeeper.properties
现在,ZooKeeper 在端口 2181 上启动。
Now, ZooKeeper starts port on 2181.
Start Kafka
在另一个终端中使用以下命令启动 Kafka。
Start Kafka in another terminal using the following command.
$ bin/kafka-server-start.sh config/server.properties
Kafka 启动后,它使用端口号 9092。
After kafka starts, it uses the port number 9092.
TPCH Data
Download tpch-kafka
$ curl -o kafka-tpch
https://repo1.maven.org/maven2/de/softwareforge/kafka_tpch_0811/1.0/kafka_tpch_
0811-1.0.sh
现在你已经使用上述命令从 Maven 中心下载了加载器。你将获得如下的类似响应。
Now you have downloaded the loader from Maven central using the above command. You will get a similar response as the following.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
5 21.6M 5 1279k 0 0 83898 0 0:04:30 0:00:15 0:04:15 129k
6 21.6M 6 1407k 0 0 86656 0 0:04:21 0:00:16 0:04:05 131k
24 21.6M 24 5439k 0 0 124k 0 0:02:57 0:00:43 0:02:14 175k
24 21.6M 24 5439k 0 0 124k 0 0:02:58 0:00:43 0:02:15 160k
25 21.6M 25 5736k 0 0 128k 0 0:02:52 0:00:44 0:02:08 181k
………………………..
然后,使用以下命令使其可执行,
Then, make it executable using the following command,
$ chmod 755 kafka-tpch
Run tpch-kafka
使用以下命令运行 kafka-tpch 程序来预加载许多带有 tpch 数据的主题。
Run the kafka-tpch program to preload a number of topics with tpch data using the following command.
Result
2016-07-13T16:15:52.083+0530 INFO main io.airlift.log.Logging Logging
to stderr
2016-07-13T16:15:52.124+0530 INFO main de.softwareforge.kafka.LoadCommand
Processing tables: [customer, orders, lineitem, part, partsupp, supplier,
nation, region]
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-1
de.softwareforge.kafka.LoadCommand Loading table 'customer' into topic 'tpch.customer'...
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-2
de.softwareforge.kafka.LoadCommand Loading table 'orders' into topic 'tpch.orders'...
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-3
de.softwareforge.kafka.LoadCommand Loading table 'lineitem' into topic 'tpch.lineitem'...
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-4
de.softwareforge.kafka.LoadCommand Loading table 'part' into topic 'tpch.part'...
………………………
……………………….
现在,使用 tpch 加载了 Kafka 表 customers、orders、supplier 等。
Now, Kafka tables customers,orders,supplier, etc., are loaded using tpch.
Add Config Settings
我们来在 Presto 服务器上添加以下 Kafka 连接器配置设置。
Let’s add the following Kafka connector configuration settings on Presto server.
connector.name = kafka
kafka.nodes = localhost:9092
kafka.table-names = tpch.customer,tpch.orders,tpch.lineitem,tpch.part,tpch.partsupp,
tpch.supplier,tpch.nation,tpch.region
kafka.hide-internal-columns = false
在上述配置中,使用 Kafka-tpch 程序加载 Kafka 表。
In the above configuration, Kafka tables are loaded using Kafka-tpch program.
Start Presto CLI
使用以下命令启动 Presto CLI,
Start Presto CLI using the following command,
$ ./presto --server localhost:8080 --catalog kafka —schema tpch;
这里 “tpch" 是 Kafka 连接器的模式,你将收到如下的响应。
Here “tpch" is a schema for Kafka connector and you will receive a response as the following.
presto:tpch>
Describe Customer Table
以下查询描述了 “customer” 表。
Following query describes “customer” table.
Result
Column | Type | Comment
-------------------+---------+---------------------------------------------
_partition_id | bigint | Partition Id
_partition_offset | bigint | Offset for the message within the partition
_segment_start | bigint | Segment start offset
_segment_end | bigint | Segment end offset
_segment_count | bigint | Running message count per segment
_key | varchar | Key text
_key_corrupt | boolean | Key data is corrupt
_key_length | bigint | Total number of key bytes
_message | varchar | Message text
_message_corrupt | boolean | Message data is corrupt
_message_length | bigint | Total number of message bytes
Apache Presto - JDBC Interface
Presto 的 JDBC 接口用于访问 Java 应用程序。
Presto’s JDBC interface is used to access Java application.
Prerequisites
安装 presto-jdbc-0.150.jar
Install presto-jdbc-0.150.jar
你可以通过访问以下链接下载 JDBC jar 文件,
You can download the JDBC jar file by visiting the following link,
下载 jar 文件后,将其添加到 Java 应用程序的类路径中。
After the jar file has been downloaded, add it to the class path of your Java application.
Create a Simple Application
让我们使用 JDBC 接口创建一个简单的 java 应用程序。
Let’s create a simple java application using JDBC interface.
编码 - PrestoJdbcSample.java
Coding − PrestoJdbcSample.java
import java.sql.*;
import com.facebook.presto.jdbc.PrestoDriver;
//import presto jdbc driver packages here.
public class PrestoJdbcSample {
public static void main(String[] args) {
Connection connection = null;
Statement statement = null;
try {
Class.forName("com.facebook.presto.jdbc.PrestoDriver");
connection = DriverManager.getConnection(
"jdbc:presto://localhost:8080/mysql/tutorials", "tutorials", “");
//connect mysql server tutorials database here
statement = connection.createStatement();
String sql;
sql = "select auth_id, auth_name from mysql.tutorials.author”;
//select mysql table author table two columns
ResultSet resultSet = statement.executeQuery(sql);
while(resultSet.next()){
int id = resultSet.getInt("auth_id");
String name = resultSet.getString(“auth_name");
System.out.print("ID: " + id + ";\nName: " + name + "\n");
}
resultSet.close();
statement.close();
connection.close();
}catch(SQLException sqlException){
sqlException.printStackTrace();
}catch(Exception exception){
exception.printStackTrace();
}
}
}
保存文件并退出应用程序。现在,在一个终端中启动 Presto 服务器,并打开一个新终端来编译和执行结果。以下是步骤:
Save the file and quit the application. Now, start Presto server in one terminal and open a new terminal to compile and execute the result. Following are the steps −
Apache Presto - Custom Function Application
创建一个 Maven 项目来开发 Presto 自定义函数。
Create a Maven project to develop Presto custom function.
SimpleFunctionsFactory.java
创建 SimpleFunctionsFactory 类来实现 FunctionFactory 接口。
Create SimpleFunctionsFactory class to implement FunctionFactory interface.
package com.tutorialspoint.simple.functions;
import com.facebook.presto.metadata.FunctionFactory;
import com.facebook.presto.metadata.FunctionListBuilder;
import com.facebook.presto.metadata.SqlFunction;
import com.facebook.presto.spi.type.TypeManager;
import java.util.List;
public class SimpleFunctionFactory implements FunctionFactory {
private final TypeManager typeManager;
public SimpleFunctionFactory(TypeManager typeManager) {
this.typeManager = typeManager;
}
@Override
public List<SqlFunction> listFunctions() {
return new FunctionListBuilder(typeManager)
.scalar(SimpleFunctions.class)
.getFunctions();
}
}
SimpleFunctionsPlugin.java
创建一个 SimpleFunctionsPlugin 类来实现 Plugin 接口。
Create a SimpleFunctionsPlugin class to implement Plugin interface.
package com.tutorialspoint.simple.functions;
import com.facebook.presto.metadata.FunctionFactory;
import com.facebook.presto.spi.Plugin;
import com.facebook.presto.spi.type.TypeManager;
import com.google.common.collect.ImmutableList;
import javax.inject.Inject;
import java.util.List;
import static java.util.Objects.requireNonNull;
public class SimpleFunctionsPlugin implements Plugin {
private TypeManager typeManager;
@Inject
public void setTypeManager(TypeManager typeManager) {
this.typeManager = requireNonNull(typeManager, "typeManager is null”);
//Inject TypeManager class here
}
@Override
public <T> List<T> getServices(Class<T> type){
if (type == FunctionFactory.class) {
return ImmutableList.of(type.cast(new SimpleFunctionFactory(typeManager)));
}
return ImmutableList.of();
}
}
Add Resource File
创建一个在实现包中指定的资源文件。
Create a resource file which is specified in the implementation package.
(com.tutorialspoint.simple.functions.SimpleFunctionsPlugin)
然后移动到资源文件所在的位置 @ /path/to/resource/
Now move to the resource file location @ /path/to/resource/
然后添加这些更改,
Then add the changes,
com.facebook.presto.spi.Plugin
pom.xml
将以下这些依赖项添加到 pom.xml 文件。
Add the following dependencies to pom.xml file.
<?xml version = "1.0"?>
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.tutorialspoint.simple.functions</groupId>
<artifactId>presto-simple-functions</artifactId>
<packaging>jar</packaging>
<version>1.0</version>
<name>presto-simple-functions</name>
<description>Simple test functions for Presto</description>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-spi</artifactId>
<version>0.149</version>
</dependency>
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-main</artifactId>
<version>0.149</version>
</dependency>
<dependency>
<groupId>javax.inject</groupId>
<artifactId>javax.inject</artifactId>
<version>1</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>19.0</version>
</dependency>
</dependencies>
<build>
<finalName>presto-simple-functions</finalName>
<plugins>
<!-- Make this jar executable -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.3.2</version>
</plugin>
</plugins>
</build>
</project>
SimpleFunctions.java
使用 Presto 属性创建一个 SimpleFunctions 类。
Create SimpleFunctions class using Presto attributes.
package com.tutorialspoint.simple.functions;
import com.facebook.presto.operator.Description;
import com.facebook.presto.operator.scalar.ScalarFunction;
import com.facebook.presto.operator.scalar.StringFunctions;
import com.facebook.presto.spi.type.StandardTypes;
import com.facebook.presto.type.LiteralParameters;
import com.facebook.presto.type.SqlType;
public final class SimpleFunctions {
private SimpleFunctions() {
}
@Description("Returns summation of two numbers")
@ScalarFunction(“mysum")
//function name
@SqlType(StandardTypes.BIGINT)
public static long sum(@SqlType(StandardTypes.BIGINT) long num1,
@SqlType(StandardTypes.BIGINT) long num2) {
return num1 + num2;
}
}
在创建应用程序之后编译并执行该应用程序。它将生成 JAR 文件。复制该文件并将 JAR 文件移动到目标 Presto 服务器插件目录中。
After the application is created compile and execute the application. It will produce the JAR file. Copy the file and move the JAR file into the target Presto server plugin directory.