Apache Presto 简明教程

Apache Presto - Quick Guide

Apache Presto - Overview

数据分析是对原始数据进行分析以收集相关信息、做出更好决策的过程。它主要用于许多组织中以做出业务决策。大数据分析包含海量数据,且此过程非常复杂,因此公司采用了不同策略。

Data analytics is the process of analyzing raw data to gather relevant information for better decision making. It is primarily used in many organizations to make business decisions. Well, big data analytics involves a large amount of data and this process is quite complex, hence companies use different strategies.

例如,Facebook 是全球领先的数据驱动和最大的数据仓库公司之一。Facebook 仓库数据存储在 Hadoop 中用于大规模计算。后来,当仓库数据增长到 PB 级别时,他们决定开发一个具有低延迟的新系统。在 2012 年,Facebook 团队成员设计了 “Presto” ,针对交互式查询分析而设计,即使具有 PB 级数据,也能快速运行。

For example, Facebook is one of the leading data driven and largest data warehouse company in the world. Facebook warehouse data is stored in Hadoop for large scale computation. Later, when warehouse data grew to petabytes, they decided to develop a new system with low latency. In the year of 2012, Facebook team members designed “Presto” for interactive query analytics that would operate quickly even with petabytes of data.

What is Apache Presto?

Apache Presto 是一个分布式并行查询执行引擎,针对低延迟和交互式查询分析而优化。Presto 可以轻松运行查询,并在从 GB 到 PB 的范围内进行扩展,且不会出现宕机。

Apache Presto is a distributed parallel query execution engine, optimized for low latency and interactive query analysis. Presto runs queries easily and scales without down time even from gigabytes to petabytes.

一个 Presto 查询可以处理来自 HDFS、MySQL、Cassandra、Hive 等多个来源的数据,还可以处理许多其他数据源。Presto 是用 Java 编写的,且易于与其他数据基础设施组件集成。Presto 功能强大,Airbnb、DropBox、Groupon、Netflix 等领先公司都在采用它。

A single Presto query can process data from multiple sources like HDFS, MySQL, Cassandra, Hive and many more data sources. Presto is built in Java and easy to integrate with other data infrastructure components. Presto is powerful, and leading companies like Airbnb, DropBox, Groupon, Netflix are adopting it.

Presto − Features

Presto 包含以下功能:

Presto contains the following features −

  1. Simple and extensible architecture.

  2. Pluggable connectors - Presto supports pluggable connector to provide metadata and data for queries.

  3. Pipelined executions - Avoids unnecessary I/O latency overhead.

  4. User-defined functions - Analysts can create custom user-defined functions to migrate easily.

  5. Vectorized columnar processing.

Presto − Benefits

以下是 Apache Presto 提供的好处:

Here is a list of benefits that Apache Presto offers −

  1. Specialized SQL operations

  2. Easy to install and debug

  3. Simple storage abstraction

  4. Quickly scales petabytes data with low latency

Presto − Applications

Presto 支持当今大多数最佳行业应用。让我们来看一下一些著名应用。

Presto supports most of today’s best industrial applications. Let’s take a look at some of the notable applications.

  1. Facebook − Facebook built Presto for data analytics needs. Presto easily scales large velocity of data.

  2. Teradata − Teradata provides end-to-end solutions in Big Data analytics and data warehousing. Teradata contribution to Presto makes it easier for more companies to enable all analytical needs.

  3. Airbnb − Presto is an integral part of the Airbnb data infrastructure. Well, hundreds of employees are running queries each day with the technology.

Why Presto?

Presto 支持标准 ANSI SQL,这让数据分析师和开发人员的工作变得非常轻松。尽管它以 Java 编写,但它避免了 Java 代码与内存分配和垃圾回收相关的典型问题。Presto 具有与 Hadoop 友好的连接器架构。它允许轻松插入文件系统。

Presto supports standard ANSI SQL which has made it very easy for data analysts and developers. Though it is built in Java, it avoids typical issues of Java code related to memory allocation and garbage collection. Presto has a connector architecture that is Hadoop friendly. It allows to easily plug in file systems.

Presto 在多个 Hadoop 分布上运行。此外,Presto 可以通过一个 Hadoop 平台查询 Cassandra、关系型数据库或其他数据存储。这种跨平台分析功能允许 Presto 用户从千兆字节到拍字节的数据中提取最大业务价值。

Presto runs on multiple Hadoop distributions. In addition, Presto can reach out from a Hadoop platform to query Cassandra, relational databases, or other data stores. This cross-platform analytic capability allows Presto users to extract maximum business value from gigabytes to petabytes of data.

Apache Presto - Architecture

Presto 的架构几乎类似于经典 MPP(海量并行处理)DBMS 架构。下图说明了 Presto 的架构。

The architecture of Presto is almost similar to classic MPP (massively parallel processing) DBMS architecture. The following diagram illustrates the architecture of Presto.

presto architecture

上述图表包含不同的组件。下表详细描述了每个组件。

The above diagram consists of different components. Following table describes each of the component in detail.

S.No

Component & Description

1.

Client Client (Presto CLI) submits SQL statements to a coordinator to get the result.

2.

Coordinator Coordinator is a master daemon. The coordinator initially parses the SQL queries then analyzes and plans for the query execution. Scheduler performs pipeline execution, assigns work to the closest node and monitors progress.

3.

Connector Storage plugins are called as connectors. Hive, HBase, MySQL, Cassandra and many more act as a connector; otherwise you can also implement a custom one. The connector provides metadata and data for queries. The coordinator uses the connector to get metadata for building a query plan.

4.

Worker The coordinator assigns task to worker nodes. The workers get actual data from the connector. Finally, the worker node delivers result to the client.

Presto − Workflow

Presto 是一个分布式系统,在集群的节点上运行。Presto 的分布式查询引擎经过优化,适用于交互分析,并支持标准 ANSI SQL,包括复杂查询、聚合、连接和窗口函数。Presto 架构简单而可扩展。Presto 客户端(CLI)向一个主守护进程协调器提交 SQL 语句。

Presto is a distributed system that runs on a cluster of nodes. Presto’s distributed query engine is optimized for interactive analysis and supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. Presto architecture is simple and extensible. Presto client (CLI) submits SQL statements to a master daemon coordinator.

调度器通过执行管道连接。调度器会将工作分配给最靠近数据且可以监控进度的节点。协调器将任务分配给多个工作节点,最终工作节点将结果返回给客户端。客户端从输出进程获取数据。可扩展性是一项主要的设计要点。像 Hive、HBase、MySQL 等可插入连接器可为查询提供元数据和数据。Presto 采用“简单存储抽象”进行设计,使针对不同类型数据源提供 SQL 查询功能变得容易。

The scheduler connects through execution pipeline. The scheduler assigns work to nodes which is closest to the data and monitors progress. The coordinator assigns task to multiple worker nodes and finally the worker node delivers the result back to the client. The client pulls data from the output process. Extensibility is the key design. Pluggable connectors like Hive, HBase, MySQL, etc., provides metadata and data for queries. Presto was designed with a “simple storage abstraction” that makes it easy to provide SQL query capability against these different kind of data sources.

Execution Model

Presto 利用专为支持 SQL 语义而设计的操作员,支持自定义查询和执行引擎。除了改进调度外,所有处理工作都在内存中执行,并在不同阶段通过网络进行管道传输。这避免了不必要的 I/O 延迟开销。

Presto supports custom query and execution engine with operators designed to support SQL semantics. In addition to improved scheduling, all processing is in memory and pipelined across the network between different stages. This avoids unnecessary I/O latency overhead.

Apache Presto - Installation

本章将解释如何在您的机器上安装 Presto。让我们了解 Presto 的基本要求,

This chapter will explain how to install Presto on your machine. Let’s go through the basic requirements of Presto,

  1. Linux or Mac OS

  2. Java version 8

现在,让我们继续执行以下步骤在您的机器上安装 Presto。

Now, let’s continue the following steps to install Presto on your machine.

Verifying Java installation

希望您现在已在您的机器上安装了 Java 8,您只需使用以下命令对其进行验证。

Hopefully, you have already installed Java version 8 on your machine right now, so you just verify it using the following command.

$ java -version

如果 Java 已成功安装在您的机器上,您将看到已安装的 Java 版本。如果未安装 Java,请按照后续步骤在您的机器上安装 Java 8。

If Java is successfully installed on your machine, you could see the version of installed Java. If Java is not installed, follow the subsequent steps to install Java 8 on your machine.

下载 JDK。访问以下链接下载最新版本的 JDK。

Download JDK. Download the latest version of JDK by visiting the following link.

最新版本是 JDK 8u 92,文件为 “jdk-8u92-linux-x64.tar.gz”。请在您的机器上下载该文件。

The latest version is JDK 8u 92 and the file is “jdk-8u92-linux-x64.tar.gz”. Please download the file on your machine.

然后,解压文件并移动到指定目录。

After that, extract the files and move to the specific directory.

然后设置 Java 替代项。最终,Java 将安装在您的机器上。

Then set Java alternatives. Finally Java will be installed on your machine.

Apache Presto Installation

访问以下链接下载最新版本的 Presto,

Download the latest version of Presto by visiting the following link,

现在,“presto-server-0.149.tar.gz” 的最新版本将下载到您的机器上。

Now the latest version of “presto-server-0.149.tar.gz” will be downloaded on your machine.

Extract tar Files

使用以下命令解压 tar 文件 −

Extract the tar file using the following command −

$ tar  -zxf  presto-server-0.149.tar.gz
$ cd presto-server-0.149

Configuration Settings

Create “data” directory

在安装目录之外创建一个数据目录,该目录将用于存储日志、元数据等,以便在升级 Presto 时可以轻松保存。可使用以下代码定义该目录 −

Create a data directory outside the installation directory, which will be used for storing logs, metadata, etc., so that it is to be easily preserved when upgrading Presto. It is defined using the following code −

$ cd
$ mkdir data

要查看它的位置路径,请使用命令“pwd”。此位置将在次の节点配置文件中分配。

To view the path where it is located, use the command “pwd”. This location will be assigned in the next node.properties file.

Create “etc” directory

使用以下代码在 Presto 安装目录内创建一个等目录 −

Create an etc directory inside Presto installation directory using the following code −

$ cd presto-server-0.149
$ mkdir etc

该目录将保存配置文件。让我们逐个创建每个文件。

This directory will hold configuration files. Let’s create each file one by one.

Node Properties

Presto 节点配置文件包含特定于每个节点的环境配置。使用以下代码在等目录 (etc/node.properties) 中创建该文件 −

Presto node properties file contains environmental configuration specific to each node. It is created inside etc directory (etc/node.properties) using the following code −

$ cd etc
$ vi node.properties

node.environment = production
node.id = ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir = /Users/../workspace/Presto

完成所有更改后,保存文件并退出终端。其中 node.data 是上述创建的 data 目录的位置路径。 node.id 表示每个节点的唯一标识符。

After making all the changes, save the file, and quit the terminal. Here node.data is the location path of the above created data directory. node.id represents the unique identifier for each node.

JVM Config

在 etc 目录中创建一个文件 “jvm.config”(etc/jvm.config)。此文件包含用于启动 Java 虚拟机的命令行选项列表。

Create a file “jvm.config” inside etc directory (etc/jvm.config). This file contains a list of command line options used for launching the Java Virtual Machine.

$ cd etc
$ vi jvm.config

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize = 32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError = kill -9 %p

完成所有更改后,保存文件并退出终端。

After making all the changes, save the file, and quit the terminal.

Config Properties

在 etc 目录中创建一个文件 “config.properties”(etc/config.properties)。此文件包含 Presto 服务器的配置。如果设置一台机器用于测试,Presto 服务器只能充当使用以下代码定义的协调进程 −

Create a file “config.properties” inside etc directory(etc/config.properties). This file contains the configuration of Presto server. If you are setting up a single machine for testing, Presto server can function only as the coordination process as defined using the following code −

$ cd etc
$ vi config.properties

coordinator = true
node-scheduler.include-coordinator = true
http-server.http.port = 8080
query.max-memory = 5GB
query.max-memory-per-node = 1GB
discovery-server.enabled = true
discovery.uri = http://localhost:8080

在此,

Here,

  1. coordinator − master node.

  2. node-scheduler.include-coordinator − Allows scheduling work on the coordinator.

  3. http-server.http.port − Specifies the port for the HTTP server.

  4. query.max-memory=5GB − The maximum amount of distributed memory.

  5. query.max-memory-per-node=1GB − The maximum amount of memory per node.

  6. discovery-server.enabled − Presto uses the Discovery service to find all the nodes in the cluster.

  7. discovery.uri − he URI to the Discovery server.

如果设置多台机器 Presto 服务器,Presto 将充当协调进程和工作进程。使用此配置设置在多台机器上测试 Presto 服务器。

If you are setting up multiple machine Presto server, Presto will function as both coordination and worker process. Use this configuration setting to test Presto server on multiple machines.

Configuration for Coordinator

$ cd etc
$ vi config.properties

coordinator = true
node-scheduler.include-coordinator = false
http-server.http.port = 8080
query.max-memory = 50GB
query.max-memory-per-node = 1GB
discovery-server.enabled = true
discovery.uri = http://localhost:8080

Configuration for Worker

$ cd etc
$ vi config.properties

coordinator = false
http-server.http.port = 8080
query.max-memory = 50GB
query.max-memory-per-node = 1GB
discovery.uri = http://localhost:8080

Log Properties

在 etc 目录中创建一个文件 “log.properties”(etc/log.properties)。此文件包含命名日志记录程序层次结构的最低日志级别。它使用以下代码定义 −

Create a file “log.properties” inside etc directory(etc/log.properties). This file contains minimum log level for named logger hierarchies. It is defined using the following code −

$ cd etc
$ vi log.properties
com.facebook.presto = INFO

保存文件并退出终端。在此,使用了 DEBUG、INFO、WARN 和 ERROR 四个日志级别。默认日志级别为 INFO。

Save the file and quit the terminal. Here, four log levels are used such as DEBUG, INFO, WARN and ERROR. Default log level is INFO.

Catalog Properties

在 etc 目录中创建一个目录 “catalog”(etc/catalog)。这将用于装载数据。例如,使用以下内容创建 etc/catalog/jmx.properties ,以将 jmx connector 装载为 jmx 目录 −

Create a directory “catalog” inside etc directory(etc/catalog). This will be used for mounting data. For example, create etc/catalog/jmx.properties with the following contents to mount the jmx connector as the jmx catalog −

$ cd etc
$ mkdir catalog
$ cd catalog
$ vi jmx.properties
connector.name = jmx

Start Presto

可以使用以下命令启动 Presto,

Presto can be started using the following command,

$ bin/launcher start

然后,您将看到类似于此的响应,

Then you will see the response similar to this,

Started as 840

Run Presto

若要启动 Presto 服务器,请使用以下命令 −

To launch Presto server, use the following command −

$ bin/launcher run

在成功启动 Presto 服务器后,您可在 “var/log” 目录中找到日志文件。

After successfully launching Presto server, you can find log files in “var/log” directory.

  1. launcher.log − This log is created by the launcher and is connected to the stdout and stderr streams of the server.

  2. server.log − This is the main log file used by Presto.

  3. http-request.log − HTTP request received by the server.

到目前为止,您已在自己的计算机上成功安装了 Presto 配置设置。让我们继续执行步骤来安装 Presto CLI。

As of now, you have successfully installed Presto configuration settings on your machine. Let’s continue the steps to install Presto CLI.

Install Presto CLI

Presto CLI 提供了基于终端的交互式 shell,用于运行查询。

The Presto CLI provides a terminal-based interactive shell for running queries.

通过访问以下链接下载 Presto CLI,

Download the Presto CLI by visiting the following link,

现在,您的计算机上将安装“presto-cli-0.149-executable.jar”。

Now “presto-cli-0.149-executable.jar” will be installed on your machine.

Run CLI

在下载 presto-cli 后,将其复制到要从中运行它的位置。此位置可以是具有到协调器网络访问权限的任何节点。首先将 Jar 文件的名称更改为 Presto。然后使用以下代码通过 chmod + x 命令使其可执行 −

After downloading the presto-cli, copy it to the location which you want to run it from. This location may be any node that has network access to the coordinator. First change the name of the Jar file to Presto. Then make it executable with chmod + x command using the following code −

$ mv presto-cli-0.149-executable.jar presto
$ chmod +x presto

现在使用以下命令执行 CLI,

Now execute CLI using the following command,

./presto --server localhost:8080 --catalog jmx --schema default
Here jmx(Java Management Extension) refers to catalog and default referes to schema.

您将看到以下响应,

You will see the following response,

 presto:default>

现在,在您的终端上键入“jps”命令,您将看到正在运行的守护程序。

Now type “jps” command on your terminal and you will see the running daemons.

Stop Presto

在执行所有执行操作后,可以使用以下命令停止 presto 服务器 −

After having performed all the executions, you can stop the presto server using the following command −

$ bin/launcher stop

Apache Presto - Configuration Settings

本章将讨论 Presto 的配置设置。

This chapter will discuss the configuration settings for Presto.

Presto Verifier

Presto Verifier 可用于将 Presto 与另一个数据库(如 MySQL)进行测试,或将两个 Presto 集群彼此进行测试。

The Presto Verifier can be used to test Presto against another database (such as MySQL), or to test two Presto clusters against each other.

Create Database in MySQL

打开 MySQL 服务器并使用以下命令创建一个数据库。

Open MySQL server and create a database using the following command.

create database test

现在你在服务器中创建了“test”数据库。创建表并使用以下查询加载它。

Now you have created “test” database in the server. Create the table and load it with the following query.

CREATE TABLE verifier_queries(
   id INT NOT NULL AUTO_INCREMENT,
   suite VARCHAR(256) NOT NULL,
   name VARCHAR(256),
   test_catalog VARCHAR(256) NOT NULL,
   test_schema VARCHAR(256) NOT NULL,
   test_prequeries TEXT,
   test_query TEXT NOT NULL,
   test_postqueries TEXT,
   test_username VARCHAR(256) NOT NULL default 'verifier-test',
   test_password VARCHAR(256),
   control_catalog VARCHAR(256) NOT NULL,
   control_schema VARCHAR(256) NOT NULL,
   control_prequeries TEXT,
   control_query TEXT NOT NULL,
   control_postqueries TEXT,
   control_username VARCHAR(256) NOT NULL default 'verifier-test',
   control_password VARCHAR(256),
   session_properties_json TEXT,
   PRIMARY KEY (id)
);

Add Config Settings

创建一个属性文件以配置验证器-

Create a properties file to configure the verifier −

$ vi config.properties

suite = mysuite
query-database = jdbc:mysql://localhost:3306/tutorials?user=root&password=pwd
control.gateway = jdbc:presto://localhost:8080
test.gateway = jdbc:presto://localhost:8080
thread-count = 1

这里,在 query-database 字段中,输入以下内容:mysql 数据库名称、用户名和密码。

Here, in the query-database field, enter the following details − mysql database name, user name, and password.

Download JAR File

通过访问以下链接下载 Presto 验证器 jar 文件,

Download Presto-verifier jar file by visiting the following link,

现在已在你的机器上下载版本 “presto-verifier-0.149-executable.jar”

Now the version “presto-verifier-0.149-executable.jar” is downloaded on your machine.

Execute JAR

使用以下命令执行 JAR 文件,

Execute the JAR file using the following command,

$ mv presto-verifier-0.149-executable.jar verifier
$ chmod+x verifier

Run Verifier

使用以下命令运行验证器,

Run the verifier using the following command,

$ ./verifier config.properties

Create Table

我们使用以下查询在 “test” 数据库中创建一个简单表。

Let’s create a simple table in “test” database using the following query.

create table product(id int not null, name varchar(50))

Insert Table

创建表后,使用以下查询插入两条记录,

After creating a table, insert two records using the following query,

insert into product values(1,’Phone')
insert into product values(2,’Television’)

Run Verifier Query

在验证器终端(./verifier config.propeties)中执行以下示例查询,以检查验证器结果。

Execute the following sample query in the verifier terminal (./verifier config.propeties) to check the verifier result.

Sample Query

insert into verifier_queries (suite, test_catalog, test_schema, test_query,
control_catalog, control_schema, control_query) values
('mysuite', 'mysql', 'default', 'select * from mysql.test.product',
'mysql', 'default', 'select * from mysql.test.product');

这里, select * from mysql.test.product 查询引用 mysql 目录, test 是数据库名称, product 是表名称。通过这种方式,你可以使用 Presto 服务器访问 mysql 连接器。

Here, select * from mysql.test.product query refers to mysql catalog, test is database name and product is table name. In this way, you can access mysql connector using Presto server.

这里,两个相同的 select 查询彼此进行测试,以查看性能。类似地,你可以运行其他查询来测试性能结果。你还可以连接两个 Presto 集群来检查性能结果。

Here, two same select queries are tested against each other to see the performance. Similarly, you can run other queries to test the performance results. You can also connect two Presto clusters to check the performance results.

Apache Presto - Administration Tools

在本章中,我们将讨论 Presto 中使用的管理工具。让我们从 Presto 的 Web 界面开始。

In this chapter, we will discuss the administration tools used in Presto. Let’s start with the Web Interface of Presto.

Web Interface

Presto 提供了一个用于监控和管理查询的 Web 界面。可以从协调器 Config 属性中指定的端口号访问它。

Presto provides a web interface for monitoring and managing queries. It can be accessed from the port number specified in the coordinator Config Properties.

启动 Presto 服务器和 Presto CLI。然后,您可以从以下 url 访问 Web 界面 − http://localhost:8080/

Start Presto server and Presto CLI. Then you can access the web interface from the following url − http://localhost:8080/

web interface

输出将与上述屏幕相似。

The output will be similar to the above screen.

在此,主页具有查询列表以及以下信息:唯一查询 ID、查询文本、查询状态、完成百分比、用户名和此查询的来源。最新的查询首先运行,然后在底部显示已完成或未完成的查询。

Here, the main page has a list of queries along with information like unique query ID, query text, query state, percentage completed, username and source from which this query is originated. Latest queries are running first, then completed or not completed queries are displayed at the bottom.

Tuning the Performance on Presto

如果 Presto 集群有任何与性能相关的问题,请将您的默认配置设置更改为以下设置。

If Presto cluster is having any performance-related issues, change your default configuration settings to the following settings.

Config Properties

  1. task. info -refresh-max-wait − Reduces coordinator work load.

  2. task.max-worker-threads − Splits the process and assigns to each worker nodes.

  3. distributed-joins-enabled − Hash-based distributed joins.

  4. node-scheduler.network-topology − Sets network topology to scheduler.

JVM Settings

将默认 JVM 设置更改为以下设置。这有助于诊断垃圾回收问题。

Change your default JVM settings to the following settings. This will be helpful for diagnosing garbage collection issues.

-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCCause
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintReferenceGC
-XX:+PrintClassHistogramAfterFullGC
-XX:+PrintClassHistogramBeforeFullGC
-XX:PrintFLSStatistics = 2
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount = 1

Apache Presto - Basic SQL Operations

在本章中,我们将讨论如何在 Presto 上创建并执行查询。我们来了解一下 Presto 支持的基本数据类型。

In this chapter, we will discuss how to create and execute queries on Presto. Let us go through Presto supported basic data types.

Basic Data Types

下表描述了 Presto 的基本数据类型。

The following table describes the basic data types of Presto.

S.No

Data type & Description

1.

VARCHAR Variable length character data

2.

BIGINT A 64-bit signed integer

3.

DOUBLE A 64-bit floating point double precision value

4.

DECIMAL A fixed precision decimal number. For example DECIMAL(10,3) - 10 is precision, i.e. total number of digits and 3 is scale value represented as fractional point. Scale is optional and default value is 0

5.

BOOLEAN Boolean values true and false

6.

VARBINARY Variable length binary data

7.

JSON JSON data

8.

DATE Date data type represented as year-month-day

9.

TIME, TIMESTAMP, TIMESTAMP with TIME ZONE TIME - Time of the day (hour-min-sec-millisecond) TIMESTAMP - Date and time of the day TIMESTAMP with TIME ZONE - Date and time of the day with time zone from the value

10.

INTERVAL Stretch or extend date and time data types

11.

ARRAY Array of the given component type. For example, ARRAY[5,7]

12.

MAP Map between the given component types. For example, MAP(ARRAY[‘one’,’two’],ARRAY[5,7])

13.

ROW Row structure made up of named fields

Presto − Operators

Presto 运算符列在以下表中。

Presto operators are listed in the following table.

S.No

Operator & Description

1.

Arithmetic operatorPresto supports arithmetic operators such as +, -, *, /, %

2.

Relational operator<,>,⇐,>=,=,<>

3.

Logical operatorAND, OR, NOT

4.

Range operatorRange operator is used to test the value in a specific range. Presto supports BETWEEN, IS NULL, IS NOT NULL, GREATEST and LEAST

5.

Decimal operatorBinary arithmetic decimal operator performs binary arithmetic operation for decimal type Unary decimal operator − The - operator performs negation

6.

String operatorThe *‘

’ operator* performs string concatenation

7.

Date and time operatorPerforms arithmetic addition and subtraction operations on date and time data types

8.

Array operatorSubscript operator[] - access an element of an array Concatenation operator

- concatenate an array with an array or an element of the same type

9.

Map operatorMap subscript operator [] - retrieve the value corresponding to a given key from a map

Apache Presto - SQL Functions

到目前为止,我们在讨论对 Presto 运行一些基本查询。本章将讨论重要的 SQL 函数。

As of now we were discussing running some simple basic queries on Presto. This chapter will discuss the important SQL functions.

Math Functions

数学函数处理数学公式。下表详细描述了函数列表。

Math functions operate on mathematical formulas. Following table describes the list of functions in detail.

S.No.

Function & Description

1.

abs(x)Returns the absolute value of x

2.

cbrt(x)Returns the cube root of x

3.

ceiling(x)Returns the x value rounded up to the nearest integer

4.

ceil(x) Alias for ceiling(x)

5.

degrees(x)Returns the degree value for x

6.

e(x)Returns the double value for Euler’s number

7.

exp(x) Returns the exponent value for Euler’s number

8.

floor(x)Returns x rounded down to the nearest integer

9.

from_base(string,radix) Returns the value of string interpreted as a base-radix number

10.

ln(x) Returns the natural logarithm of x

11.

log2(x)Returns the base 2 logarithm of x

12.

log10(x) Returns the base 10 logarithm of x

13.

log(x,y) Returns the base y logarithm of x

14.

mod(n,m)Returns the modulus (remainder) of n divided by m

15.

pi() Returns pi value. The result will be returned as a double value

16.

power(x,p)Returns power of value ‘p’ to the x value

17.

pow(x,p) Alias for power(x,p)

18.

radians(x)converts the angle x in degree radians

19.

rand() Alias for radians()

20.

random()Returns the pseudo-random value

21.

rand(n) Alias for random()

22.

round(x)Returns the rounded value for x

23.

round(x,d) x value rounded for the ‘d’ decimal places

24.

sign(x) Returns the signum function of x, i.e., 0 if the argument is 0 1 if the argument is greater than 0 -1 if the argument is less than 0 For double arguments, the function additionally returns − NaN if the argument is NaN 1 if the argument is +Infinity -1 if the argument is -Infinity

25.

sqrt(x)Returns the square root of x

26.

to_base(x,radix)Return type is archer. The result is returned as the base radix for x

27.

truncate(x)Truncates the value for x

28.

width_bucket(x, bound1, bound2, n)Returns the bin number of x specified bound1 and bound2 bounds and n number of buckets

29.

width_bucket(x, bins)Returns the bin number of x according to the bins specified by the array bins

Trigonometric Functions

三角函数参数表示为弧度()。下表列出了这些函数。

Trigonometric functions arguments are represented as radians(). Following table lists out the functions.

S.No

Functions & Description

1.

acos(x)Returns the inverse cosine value(x)

2.

asin(x) Returns the inverse sine value(x)

3.

atan(x) Returns the inverse tangent value(x)

4.

atan2(y,x)Returns the inverse tangent value(y/x)

5.

cos(x) Returns the cosine value(x)

6.

cosh(x)Returns the hyperbolic cosine value(x)

7.

sin(x)Returns the sine value(x)

8.

tan(x) Returns the tangent value(x)

9.

tanh(x) Returns the hyperbolic tangent value(x)

Bitwise Functions

下表列出了按位函数。

The following table lists out the Bitwise functions.

S.No

Functions & Description

1.

bit_count(x, bits)Count the number of bits

2.

bitwise_and(x,y)Perform bitwise AND operation for two bits, x and y

3.

bitwise_or(x,y)Bitwise OR operation between two bits x, y

4.

bitwise_not(x)Bitwise Not operation for bit x

5.

bitwise_xor(x,y)XOR operation for bits x, y

String Functions

下表列出了字符串函数。

Following table lists out the String functions.

S.No

Functions & Description

1.

concat(string1, …​, stringN)Concatenate the given strings

2.

length(string)Returns the length of the given string

3.

lower(string)Returns the lowercase format for the string

4.

upper(string)Returns the uppercase format for the given string

5.

lpad(string, size, padstring)Left padding for the given string

6.

ltrim(string)Removes the leading whitespace from the string

7.

replace(string, search, replace)Replaces the string value

8.

reverse(string)Reverses the operation performed for the string

9.

rpad(string, size, padstring)Right padding for the given string

10.

rtrim(string)Removes the trailing whitespace from the string

11.

split(string, delimiter)Splits the string on delimiter and returns an array of size at the most limit

12.

split_part(string, delimiter, index)Splits the string on delimiter and returns the field index

13.

strpos(string, substring)Returns the starting position of the substring in the string

14.

substr(string, start)Returns the substring for the given string

15.

substr(string, start, length)Returns the substring for the given string with the specific length

16.

trim(string)Removes the leading and trailing whitespace from the string

Date and Time Functions

下表列出了日期和时间函数。

Following table lists out the Date and Time functions.

S.No

Functions & Description

1.

current_dateReturns the current date

2.

current_timeReturns the current time

3.

current_timestampReturns the current timestamp

4.

current_timezone()Returns the current timezone

5.

now()Returns the current date,timestamp with the timezone

6.

localtimeReturns the local time

7.

localtimestampReturns the local timestamp

Regular Expression Functions

下表列出了正则表达式函数。

The following table lists out the Regular Expression functions.

S.No

Functions & Description

1.

regexp_extract_all(string, pattern)Returns the string matched by the regular expression for the pattern

2.

regexp_extract_all(string, pattern, group)Returns the string matched by the regular expression for the pattern and the group

3.

regexp_extract(string, pattern)Returns the first substring matched by the regular expression for the pattern

4.

regexp_extract(string, pattern, group)Returns the first substring matched by the regular expression for the pattern and the group

5.

regexp_like(string, pattern)Returns the string matches for the pattern. If the string is returned, the value will be true otherwise false

6.

regexp_replace(string, pattern)Replaces the instance of the string matched for the expression with the pattern

7.

regexp_replace(string, pattern, replacement)Replace the instance of the string matched for the expression with the pattern and replacement

8.

regexp_split(string, pattern)Splits the regular expression for the given pattern

JSON Functions

下表列出了 JSON 函数。

The following table lists out JSON functions.

S.No

Functions & Description

1.

json_array_contains(json, value)Check the value exists in a json array. If the value exists it will return true, otherwise false

2.

json_array_get(json_array, index)Get the element for index in json array

3.

json_array_length(json)Returns the length in json array

4.

json_format(json)Returns the json structure format

5.

json_parse(string)Parses the string as a json

6.

json_size(json, json_path)Returns the size of the value

URL Functions

下表列出了 URL 函数。

The following table lists out the URL functions.

S.No

Functions & Description

1.

url_extract_host(url)Returns the URL’s host

2.

url_extract_path(url)Returns the URL’s path

3.

url_extract_port(url)Returns the URL’s port

4.

url_extract_protocol(url)Returns the URL’s protocol

5.

url_extract_query(url)Returns the URL’s query string

Aggregate Functions

下表列出了汇总函数。

The following table lists out the Aggregate functions.

S.No

Functions & Description

1.

avg(x) Returns average for the given value

2.

min(x,n)Returns the minimum value from two values

3.

max(x,n)Returns the maximum value from two values

4.

sum(x)Returns the sum of value

5.

count(*)Returns the number of input rows

6.

count(x)Returns the count of input values

7.

checksum(x)Returns the checksum for x

8.

arbitrary(x)Returns the arbitrary value for x

Color Functions

下表列出了颜色函数。

Following table lists out the Color functions.

S.No

Functions & Description

1.

bar(x, width)Renders a single bar using rgb low_color and high_color

2.

bar(x, width, low_color, high_color)Renders a single bar for the specified width

3.

color(string)Returns the color value for the entered string

4.

render(x, color)Renders value x using the specific color using ANSI color codes

5.

render(b)Accepts boolean value b and renders a green true or a red false using ANSI color codes

6.

rgb(red, green, blue) Returns a color value capturing the RGB value of three component color values supplied as int parameters ranging from 0 to 255

Array Functions

下表列出了数组函数。

The following table lists out the Array functions.

S.No

Functions & Description

1.

array_max(x)Finds the max element in an array

2.

array_min(x)Finds the min element in an array

3.

array_sort(x)Sorts the elements in an array

4.

array_remove(x,element)Removes the specific element from an array

5.

concat(x,y)Concatenates two arrays

6.

contains(x,element)Finds the given elements in an array. True will be returned if it is present, otherwise false

7.

array_position(x,element)Find the position of the given element in an array

8.

array_intersect(x,y)Performs an intersection between two arrays

9.

element_at(array,index)Returns the array element position

10.

slice(x,start,length)Slices the array elements with the specific length

Teradata Functions

下表列出了 Teradata 函数。

The following table lists out Teradata functions.

S.No

Functions & Description

1.

index(string,substring)Returns the index of the string with the given substring

2.

substring(string,start)Returns the substring of the given string. You can specify the start index here

3.

substring(string,start,length)Returns the substring of the given string for the specific start index and length of the string

Apache Presto - MySQL Connector

MySQL 连接器用于查询外部 MySQL 数据库。

The MySQL connector is used to query an external MySQL database.

Prerequisites

MySQL 服务器安装。

MySQL server installation.

Configuration Settings

希望您已经在您的机器上安装了 MySQL 服务器。要使 Presto 服务器启用 MySQL 属性,您必须在 “etc/catalog” 目录中创建一个文件 “mysql.properties” 。发出以下命令以创建一个 mysql.properties 文件。

Hopefully you have installed mysql server on your machine. To enable mysql properties on Presto server, you must create a file “mysql.properties” in “etc/catalog” directory. Issue the following command to create a mysql.properties file.

$ cd etc
$ cd catalog
$ vi mysql.properties

connector.name = mysql
connection-url = jdbc:mysql://localhost:3306
connection-user = root
connection-password = pwd

保存文件并退出终端。在上述文件中,您必须在 connection-password 字段中输入 MySQL 密码。

Save the file and quit the terminal. In the above file, you must enter your mysql password in connection-password field.

Create Database in MySQL Server

打开 MySQL 服务器并使用以下命令创建一个数据库。

Open MySQL server and create a database using the following command.

create database tutorials

现在您已经在服务器中创建了“tutorials”数据库。要启用数据库类型,请在查询窗口中使用命令“use tutorials”。

Now you have created “tutorials” database in the server. To enable database type, use the command “use tutorials” in the query window.

Create Table

我们来在“tutorials”数据库中创建一个简单的表。

Let’s create a simple table on “tutorials” database.

create table author(auth_id int not null, auth_name varchar(50),topic varchar(100))

Insert Table

创建表后,使用以下查询插入三条记录。

After creating a table, insert three records using the following query.

insert into author values(1,'Doug Cutting','Hadoop')
insert into author values(2,’James Gosling','java')
insert into author values(3,'Dennis Ritchie’,'C')

Select Records

要检索所有记录,请键入以下查询。

To retrieve all the records, type the following query.

Query

select * from author

Result

auth_id    auth_name      topic
1        Doug Cutting     Hadoop
2        James Gosling    java
3        Dennis Ritchie     C

截至目前,您已使用 MySQL 服务器查询了数据。我们来将 Mysql 存储插件连接到 Presto 服务器。

As of now, you have queried data using MySQL server. Let’s connect Mysql storage plugin to Presto server.

Connect Presto CLI

键入以下命令以在 Presto CLI 上连接 MySql 插件。

Type the following command to connect MySql plugin on Presto CLI.

./presto --server localhost:8080 --catalog mysql --schema tutorials

您将收到以下应答。

You will receive the following response.

presto:tutorials>

此处 “tutorials” 指的是 mysql 服务器中的架构。

Here “tutorials” refers to schema in mysql server.

List Schemas

要列出 mysql 中的所有架构,请在 Presto 服务器中键入以下查询。

To list out all the schemas in mysql, type the following query in Presto server.

Query

presto:tutorials> show schemas from mysql;

Result

      Schema
--------------------
 information_schema
 performance_schema
 sys
 tutorials

从此结果中,我们可以得出结论,前三个架构是预定义的,最后一个是您自己创建的。

From this result, we can conclude the first three schemas as predefined and the last one as created by yourself.

List Tables from Schema

以下查询列出教程架构中的所有表。

Following query lists out all the tables in tutorials schema.

Query

presto:tutorials> show tables from mysql.tutorials;

Result

  Table
--------
 author

我们只在此架构中创建了一个表。如果您创建了多个表,它将列出所有表。

We have created only one table in this schema. If you have created multiple tables, it will list out all the tables.

Describe Table

要描述表字段,请键入以下查询。

To describe the table fields, type the following query.

Query

presto:tutorials> describe mysql.tutorials.author;

Result

  Column   |     Type     | Comment
-----------+--------------+---------
 auth_id   | integer      |
 auth_name | varchar(50)  |
 topic     | varchar(100) |

Show Columns from Table

Query

presto:tutorials> show columns from mysql.tutorials.author;

Result

 Column    |     Type     | Comment
-----------+--------------+---------
 auth_id   | integer      |
 auth_name | varchar(50)  |
 topic     | varchar(100) |

Access Table Records

要从 mysql 表获取所有记录,请发出以下查询。

To fetch all the records from mysql table, issue the following query.

Query

presto:tutorials> select * from mysql.tutorials.author;

Result

auth_id  |   auth_name    | topic
---------+----------------+--------
       1 | Doug Cutting   | Hadoop
       2 | James Gosling  | java
       3 | Dennis Ritchie | C

从此结果中,您可以在 Presto 中检索 mysql 服务器记录。

From this result, you can retrieve mysql server records in Presto.

Create Table Using as Command

Mysql 连接器不支持创建表查询,但可以使用 as 命令创建表。

Mysql connector doesn’t support create table query but you can create a table using as command.

Query

presto:tutorials> create table mysql.tutorials.sample as
select * from mysql.tutorials.author;

Result

CREATE TABLE: 3 rows

您无法直接插入行,因为此连接器有一些限制。它不支持以下查询:

You can’t insert rows directly because this connector has some limitations. It cannot support the following queries −

  1. create

  2. insert

  3. update

  4. delete

  5. drop

要查看新创建表中的记录,请键入以下查询。

To view the records in the newly created table, type the following query.

Query

presto:tutorials> select * from mysql.tutorials.sample;

Result

auth_id  |   auth_name    | topic
---------+----------------+--------
       1 | Doug Cutting   | Hadoop
       2 | James Gosling  | java
       3 | Dennis Ritchie | C

Apache Presto - JMX Connector

Java Management Extensions (JMX) 提供有关 Java 虚拟机和在 JVM 内部运行的软件的信息。该 JMX 连接器用于在 Presto 服务器中查询 JMX 信息。

Java Management Extensions (JMX) gives information about the Java Virtual Machine and software running inside JVM. The JMX connector is used to query JMX information in Presto server.

由于我们已经启用了 “jmx.properties” 目录下的 “etc/catalog” 文件。现连接 Prest CLI 来启用 JMX 插件。

As we have already enabled “jmx.properties” file under “etc/catalog” directory. Now connect Prest CLI to enable JMX plugin.

Presto CLI

Query

$ ./presto --server localhost:8080 --catalog jmx --schema jmx

Result

您将收到以下应答。

You will receive the following response.

presto:jmx>

JMX Schema

要列出 “jmx” 中的所有架构,请键入以下查询。

To list out all the schemas in “jmx”, type the following query.

Query

presto:jmx> show schemas from jmx;

Result

      Schema
--------------------
 information_schema
 current

Show Tables

要查看 “current” 架构中的表,请使用以下命令。

To view the tables in the “current” schema, use the following command.

Query 1

presto:jmx> show tables from jmx.current;

Result

                                    Table
------------------------------------------------------------------------------
 com.facebook.presto.execution.scheduler:name = nodescheduler
 com.facebook.presto.execution:name = queryexecution
 com.facebook.presto.execution:name = querymanager
 com.facebook.presto.execution:name = remotetaskfactory
 com.facebook.presto.execution:name = taskexecutor
 com.facebook.presto.execution:name = taskmanager
 com.facebook.presto.execution:type = queryqueue,name = global,expansion = global
 ………………
 ……………….

Query 2

presto:jmx> select * from jmx.current.”java.lang:type = compilation";

Result

node               | compilationtimemonitoringsupported |      name   |         objectname         | totalcompilationti
--------------------------------------+------------------------------------+--------------------------------+----------------------------+-------------------
ffffffff-ffff-ffff-ffff-ffffffffffff | true | HotSpot 64-Bit Tiered Compilers | java.lang:type=Compilation |       1276

Query 3

presto:jmx> select * from jmx.current."com.facebook.presto.server:name = taskresource";

Result

 node                 | readfromoutputbuffertime.alltime.count
 | readfromoutputbuffertime.alltime.max | readfromoutputbuffertime.alltime.maxer
 --------------------------------------+---------------------------------------+--------------------------------------+---------------------------------------
 ffffffff-ffff-ffff-ffff-ffffffffffff |                                   92.0 |                          1.009106149 |

Apache Presto - HIVE Connector

Hive 连接器允许查询存储在 Hive 数据仓库中的数据。

The Hive connector allows querying data stored in a Hive data warehouse.

Prerequisites

  1. Hadoop

  2. Hive

希望你在你的机器上安装了 Hadoop 和 Hive。在新终端中逐个启动所有服务。然后,使用以下命令启动 hive 元数据存储,

Hopefully you have installed Hadoop and Hive on your machine. Start all the services one by one in the new terminal. Then, start hive metastore using the following command,

hive --service metastore

Presto 使用 Hive 元数据存储服务来获取 Hive 表的详细信息。

Presto uses Hive metastore service to get the hive table’s details.

Configuration Settings

“etc/catalog” 目录下创建一个文件 “hive.properties” 。使用以下命令。

Create a file “hive.properties” under “etc/catalog” directory. Use the following command.

$ cd etc
$ cd catalog
$ vi hive.properties

connector.name = hive-cdh4
hive.metastore.uri = thrift://localhost:9083

完成所有更改后,保存文件并退出终端。

After making all the changes, save the file and quit the terminal.

Create Database

使用以下查询在 Hive 中创建数据库:

Create a database in Hive using the following query −

Query

hive> CREATE SCHEMA tutorials;

创建数据库后,你可以使用 “show databases” 命令进行验证。

After the database is created, you can verify it using the “show databases” command.

Create Table

创建表是用于在 Hive 中创建表的语句。例如,使用以下查询。

Create Table is a statement used to create a table in Hive. For example, use the following query.

hive> create table author(auth_id int, auth_name varchar(50),
topic varchar(100) STORED AS SEQUENCEFILE;

Insert Table

以下查询可用于向 hive 表中插入记录。

Following query is used to insert records in hive’s table.

hive> insert into table author values (1,’ Doug Cutting’,Hadoop),
(2,’ James Gosling’,java),(3,’ Dennis Ritchie’,C);

Start Presto CLI

你可以启动 Presto CLI 以使用以下命令连接 Hive 存储插件。

You can start Presto CLI to connect Hive storage plugin using the following command.

$ ./presto --server localhost:8080 --catalog hive —schema tutorials;

您将收到以下应答。

You will receive the following response.

presto:tutorials >

List Schemas

要列出 Hive 连接器中的所有模式,请键入以下命令。

To list out all the schemas in Hive connector, type the following command.

Query

presto:tutorials > show schemas from hive;

Result

default

tutorials

List Tables

要列出“教程”模式中的所有表,请使用以下查询。

To list out all the tables in “tutorials” schema, use the following query.

Query

presto:tutorials > show tables from hive.tutorials;

Result

author

Fetch Table

以下查询用于从 hive 表中抓取所有记录。

Following query is used to fetch all the records from hive’s table.

Query

presto:tutorials > select * from hive.tutorials.author;

Result

auth_id  |   auth_name    | topic
---------+----------------+--------
       1 | Doug Cutting   | Hadoop
       2 | James Gosling  | java
       3 | Dennis Ritchie | C

Apache Presto - KAFKA Connector

Presto 的 Kafka 连接器允许使用 Presto 访问 Apache Kafka 中的数据。

The Kafka Connector for Presto allows to access data from Apache Kafka using Presto.

Prerequisites

下载并安装最新版本的以下 Apache 项目。

Download and install the latest version of the following Apache projects.

  1. Apache ZooKeeper

  2. Apache Kafka

Start ZooKeeper

使用以下命令启动 ZooKeeper 服务器。

Start ZooKeeper server using the following command.

$ bin/zookeeper-server-start.sh config/zookeeper.properties

现在,ZooKeeper 在端口 2181 上启动。

Now, ZooKeeper starts port on 2181.

Start Kafka

在另一个终端中使用以下命令启动 Kafka。

Start Kafka in another terminal using the following command.

$ bin/kafka-server-start.sh config/server.properties

Kafka 启动后,它使用端口号 9092。

After kafka starts, it uses the port number 9092.

TPCH Data

Download tpch-kafka

$  curl -o kafka-tpch
https://repo1.maven.org/maven2/de/softwareforge/kafka_tpch_0811/1.0/kafka_tpch_
0811-1.0.sh

现在你已经使用上述命令从 Maven 中心下载了加载器。你将获得如下的类似响应。

Now you have downloaded the loader from Maven central using the above command. You will get a similar response as the following.

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  5 21.6M    5 1279k    0     0  83898      0  0:04:30  0:00:15  0:04:15  129k
  6 21.6M    6 1407k    0     0  86656      0  0:04:21  0:00:16  0:04:05  131k
 24 21.6M   24 5439k    0     0   124k      0  0:02:57  0:00:43  0:02:14  175k
 24 21.6M   24 5439k    0     0   124k      0  0:02:58  0:00:43  0:02:15  160k
 25 21.6M   25 5736k    0     0   128k      0  0:02:52  0:00:44  0:02:08  181k
 ………………………..

然后,使用以下命令使其可执行,

Then, make it executable using the following command,

$ chmod 755 kafka-tpch

Run tpch-kafka

使用以下命令运行 kafka-tpch 程序来预加载许多带有 tpch 数据的主题。

Run the kafka-tpch program to preload a number of topics with tpch data using the following command.

Query

$ ./kafka-tpch load --brokers localhost:9092 --prefix tpch. --tpch-type tiny

Result

2016-07-13T16:15:52.083+0530 INFO main io.airlift.log.Logging Logging
to stderr
2016-07-13T16:15:52.124+0530 INFO main de.softwareforge.kafka.LoadCommand
Processing tables: [customer, orders, lineitem, part, partsupp, supplier,
nation, region]
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-1
de.softwareforge.kafka.LoadCommand Loading table 'customer' into topic 'tpch.customer'...
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-2
de.softwareforge.kafka.LoadCommand Loading table 'orders' into topic 'tpch.orders'...
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-3
de.softwareforge.kafka.LoadCommand Loading table 'lineitem' into topic 'tpch.lineitem'...
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-4
de.softwareforge.kafka.LoadCommand Loading table 'part' into topic 'tpch.part'...
………………………
……………………….

现在,使用 tpch 加载了 Kafka 表 customers、orders、supplier 等。

Now, Kafka tables customers,orders,supplier, etc., are loaded using tpch.

Add Config Settings

我们来在 Presto 服务器上添加以下 Kafka 连接器配置设置。

Let’s add the following Kafka connector configuration settings on Presto server.

connector.name = kafka

kafka.nodes = localhost:9092

kafka.table-names = tpch.customer,tpch.orders,tpch.lineitem,tpch.part,tpch.partsupp,
tpch.supplier,tpch.nation,tpch.region

kafka.hide-internal-columns = false

在上述配置中,使用 Kafka-tpch 程序加载 Kafka 表。

In the above configuration, Kafka tables are loaded using Kafka-tpch program.

Start Presto CLI

使用以下命令启动 Presto CLI,

Start Presto CLI using the following command,

$ ./presto --server localhost:8080 --catalog kafka —schema tpch;

这里 “tpch" 是 Kafka 连接器的模式,你将收到如下的响应。

Here “tpch" is a schema for Kafka connector and you will receive a response as the following.

presto:tpch>

List Tables

以下查询列出了 “tpch” 模式中的所有表。

Following query lists out all the tables in “tpch” schema.

Query

presto:tpch> show tables;

Result

  Table
----------
 customer
 lineitem
 nation
 orders
 part
 partsupp
 region
 supplier

Describe Customer Table

以下查询描述了 “customer” 表。

Following query describes “customer” table.

Query

presto:tpch> describe customer;

Result

  Column           |  Type   |                   Comment
-------------------+---------+---------------------------------------------
 _partition_id     | bigint  | Partition Id
 _partition_offset | bigint  | Offset for the message within the partition
 _segment_start    | bigint  | Segment start offset
 _segment_end      | bigint  | Segment end offset
 _segment_count    | bigint  | Running message count per segment
 _key              | varchar | Key text
 _key_corrupt      | boolean | Key data is corrupt
 _key_length       | bigint  | Total number of key bytes
 _message          | varchar | Message text
 _message_corrupt  | boolean | Message data is corrupt
 _message_length   | bigint  | Total number of message bytes

Apache Presto - JDBC Interface

Presto 的 JDBC 接口用于访问 Java 应用程序。

Presto’s JDBC interface is used to access Java application.

Prerequisites

安装 presto-jdbc-0.150.jar

Install presto-jdbc-0.150.jar

你可以通过访问以下链接下载 JDBC jar 文件,

You can download the JDBC jar file by visiting the following link,

下载 jar 文件后,将其添加到 Java 应用程序的类路径中。

After the jar file has been downloaded, add it to the class path of your Java application.

Create a Simple Application

让我们使用 JDBC 接口创建一个简单的 java 应用程序。

Let’s create a simple java application using JDBC interface.

编码 - PrestoJdbcSample.java

Coding − PrestoJdbcSample.java

import java.sql.*;
import com.facebook.presto.jdbc.PrestoDriver;

//import presto jdbc driver packages here.
public class PrestoJdbcSample {
   public static void main(String[] args) {
      Connection connection = null;
      Statement statement = null;
      try {

         Class.forName("com.facebook.presto.jdbc.PrestoDriver");
         connection = DriverManager.getConnection(
         "jdbc:presto://localhost:8080/mysql/tutorials", "tutorials", “");

         //connect mysql server tutorials database here
         statement = connection.createStatement();
         String sql;
         sql = "select auth_id, auth_name from mysql.tutorials.author”;

         //select mysql table author table two columns
         ResultSet resultSet = statement.executeQuery(sql);
         while(resultSet.next()){
            int id  = resultSet.getInt("auth_id");
            String name = resultSet.getString(“auth_name");
            System.out.print("ID: " + id + ";\nName: " + name + "\n");
         }

         resultSet.close();
         statement.close();
         connection.close();

      }catch(SQLException sqlException){
         sqlException.printStackTrace();
      }catch(Exception exception){
         exception.printStackTrace();
      }
   }
}

保存文件并退出应用程序。现在,在一个终端中启动 Presto 服务器,并打开一个新终端来编译和执行结果。以下是步骤:

Save the file and quit the application. Now, start Presto server in one terminal and open a new terminal to compile and execute the result. Following are the steps −

Compilation

~/Workspace/presto/presto-jdbc $ javac -cp presto-jdbc-0.149.jar  PrestoJdbcSample.java

Execution

~/Workspace/presto/presto-jdbc $ java -cp .:presto-jdbc-0.149.jar  PrestoJdbcSample

Output

INFO: Logging initialized @146ms
ID: 1;
Name: Doug Cutting
ID: 2;
Name: James Gosling
ID: 3;
Name: Dennis Ritchie

Apache Presto - Custom Function Application

创建一个 Maven 项目来开发 Presto 自定义函数。

Create a Maven project to develop Presto custom function.

SimpleFunctionsFactory.java

创建 SimpleFunctionsFactory 类来实现 FunctionFactory 接口。

Create SimpleFunctionsFactory class to implement FunctionFactory interface.

package com.tutorialspoint.simple.functions;

import com.facebook.presto.metadata.FunctionFactory;
import com.facebook.presto.metadata.FunctionListBuilder;
import com.facebook.presto.metadata.SqlFunction;
import com.facebook.presto.spi.type.TypeManager;
import java.util.List;

public class SimpleFunctionFactory implements FunctionFactory {

   private final TypeManager typeManager;
   public SimpleFunctionFactory(TypeManager typeManager) {
      this.typeManager = typeManager;
   }
    @Override

   public List<SqlFunction> listFunctions() {
      return new FunctionListBuilder(typeManager)
      .scalar(SimpleFunctions.class)
      .getFunctions();
   }
}

SimpleFunctionsPlugin.java

创建一个 SimpleFunctionsPlugin 类来实现 Plugin 接口。

Create a SimpleFunctionsPlugin class to implement Plugin interface.

package com.tutorialspoint.simple.functions;

import com.facebook.presto.metadata.FunctionFactory;
import com.facebook.presto.spi.Plugin;
import com.facebook.presto.spi.type.TypeManager;
import com.google.common.collect.ImmutableList;
import javax.inject.Inject;
import java.util.List;
import static java.util.Objects.requireNonNull;

public class SimpleFunctionsPlugin implements Plugin {
   private TypeManager typeManager;
   @Inject

   public void setTypeManager(TypeManager typeManager) {
      this.typeManager = requireNonNull(typeManager, "typeManager is null”);
      //Inject TypeManager class here
   }
   @Override

   public <T> List<T> getServices(Class<T> type){
      if (type == FunctionFactory.class) {
         return ImmutableList.of(type.cast(new SimpleFunctionFactory(typeManager)));
      }
      return ImmutableList.of();
   }
}

Add Resource File

创建一个在实现包中指定的资源文件。

Create a resource file which is specified in the implementation package.

(com.tutorialspoint.simple.functions.SimpleFunctionsPlugin)

然后移动到资源文件所在的位置 @ /path/to/resource/

Now move to the resource file location @ /path/to/resource/

然后添加这些更改,

Then add the changes,

com.facebook.presto.spi.Plugin

pom.xml

将以下这些依赖项添加到 pom.xml 文件。

Add the following dependencies to pom.xml file.

<?xml version = "1.0"?>
<project xmlns = "http://maven.apache.org/POM/4.0.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0
    http://maven.apache.org/xsd/maven-4.0.0.xsd">

   <modelVersion>4.0.0</modelVersion>
   <groupId>com.tutorialspoint.simple.functions</groupId>
   <artifactId>presto-simple-functions</artifactId>
   <packaging>jar</packaging>
   <version>1.0</version>
   <name>presto-simple-functions</name>
   <description>Simple test functions for Presto</description>
   <properties>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
   </properties>
   <dependencies>
      <dependency>
         <groupId>com.facebook.presto</groupId>
         <artifactId>presto-spi</artifactId>
         <version>0.149</version>
      </dependency>
      <dependency>
         <groupId>com.facebook.presto</groupId>
         <artifactId>presto-main</artifactId>
         <version>0.149</version>
      </dependency>
      <dependency>
         <groupId>javax.inject</groupId>
         <artifactId>javax.inject</artifactId>
         <version>1</version>
      </dependency>
      <dependency>
         <groupId>com.google.guava</groupId>
         <artifactId>guava</artifactId>
         <version>19.0</version>
      </dependency>
   </dependencies>
   <build>
      <finalName>presto-simple-functions</finalName>
      <plugins>
      <!-- Make this jar executable -->
         <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.3.2</version>
         </plugin>
      </plugins>
   </build>
</project>

SimpleFunctions.java

使用 Presto 属性创建一个 SimpleFunctions 类。

Create SimpleFunctions class using Presto attributes.

package com.tutorialspoint.simple.functions;

import com.facebook.presto.operator.Description;
import com.facebook.presto.operator.scalar.ScalarFunction;
import com.facebook.presto.operator.scalar.StringFunctions;
import com.facebook.presto.spi.type.StandardTypes;
import com.facebook.presto.type.LiteralParameters;
import com.facebook.presto.type.SqlType;

public final class SimpleFunctions {
   private SimpleFunctions() {
   }

   @Description("Returns summation of two numbers")
   @ScalarFunction(“mysum")
   //function name
   @SqlType(StandardTypes.BIGINT)

   public static long sum(@SqlType(StandardTypes.BIGINT) long num1,
   @SqlType(StandardTypes.BIGINT) long num2) {
      return num1 + num2;
   }
}

在创建应用程序之后编译并执行该应用程序。它将生成 JAR 文件。复制该文件并将 JAR 文件移动到目标 Presto 服务器插件目录中。

After the application is created compile and execute the application. It will produce the JAR file. Copy the file and move the JAR file into the target Presto server plugin directory.

Compilation

mvn compile

Execution

mvn package

现重启 Presto 服务器并连接 Presto 客户端。然后按照以下说明执行自定义函数应用程序,

Now restart Presto server and connect Presto client. Then execute the custom function application as explained below,

$ ./presto --catalog mysql --schema default

Query

presto:default> select mysum(10,10);

Result

 _col0
-------
  20