Apache Presto 简明教程
Apache Presto - KAFKA Connector
Presto 的 Kafka 连接器允许使用 Presto 访问 Apache Kafka 中的数据。
The Kafka Connector for Presto allows to access data from Apache Kafka using Presto.
Prerequisites
下载并安装最新版本的以下 Apache 项目。
Download and install the latest version of the following Apache projects.
-
Apache ZooKeeper
-
Apache Kafka
Start ZooKeeper
使用以下命令启动 ZooKeeper 服务器。
Start ZooKeeper server using the following command.
$ bin/zookeeper-server-start.sh config/zookeeper.properties
现在,ZooKeeper 在端口 2181 上启动。
Now, ZooKeeper starts port on 2181.
Start Kafka
在另一个终端中使用以下命令启动 Kafka。
Start Kafka in another terminal using the following command.
$ bin/kafka-server-start.sh config/server.properties
Kafka 启动后,它使用端口号 9092。
After kafka starts, it uses the port number 9092.
TPCH Data
Download tpch-kafka
$ curl -o kafka-tpch
https://repo1.maven.org/maven2/de/softwareforge/kafka_tpch_0811/1.0/kafka_tpch_
0811-1.0.sh
现在你已经使用上述命令从 Maven 中心下载了加载器。你将获得如下的类似响应。
Now you have downloaded the loader from Maven central using the above command. You will get a similar response as the following.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
5 21.6M 5 1279k 0 0 83898 0 0:04:30 0:00:15 0:04:15 129k
6 21.6M 6 1407k 0 0 86656 0 0:04:21 0:00:16 0:04:05 131k
24 21.6M 24 5439k 0 0 124k 0 0:02:57 0:00:43 0:02:14 175k
24 21.6M 24 5439k 0 0 124k 0 0:02:58 0:00:43 0:02:15 160k
25 21.6M 25 5736k 0 0 128k 0 0:02:52 0:00:44 0:02:08 181k
………………………..
然后,使用以下命令使其可执行,
Then, make it executable using the following command,
$ chmod 755 kafka-tpch
Run tpch-kafka
使用以下命令运行 kafka-tpch 程序来预加载许多带有 tpch 数据的主题。
Run the kafka-tpch program to preload a number of topics with tpch data using the following command.
Result
2016-07-13T16:15:52.083+0530 INFO main io.airlift.log.Logging Logging
to stderr
2016-07-13T16:15:52.124+0530 INFO main de.softwareforge.kafka.LoadCommand
Processing tables: [customer, orders, lineitem, part, partsupp, supplier,
nation, region]
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-1
de.softwareforge.kafka.LoadCommand Loading table 'customer' into topic 'tpch.customer'...
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-2
de.softwareforge.kafka.LoadCommand Loading table 'orders' into topic 'tpch.orders'...
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-3
de.softwareforge.kafka.LoadCommand Loading table 'lineitem' into topic 'tpch.lineitem'...
2016-07-13T16:15:52.834+0530 INFO pool-1-thread-4
de.softwareforge.kafka.LoadCommand Loading table 'part' into topic 'tpch.part'...
………………………
……………………….
现在,使用 tpch 加载了 Kafka 表 customers、orders、supplier 等。
Now, Kafka tables customers,orders,supplier, etc., are loaded using tpch.
Add Config Settings
我们来在 Presto 服务器上添加以下 Kafka 连接器配置设置。
Let’s add the following Kafka connector configuration settings on Presto server.
connector.name = kafka
kafka.nodes = localhost:9092
kafka.table-names = tpch.customer,tpch.orders,tpch.lineitem,tpch.part,tpch.partsupp,
tpch.supplier,tpch.nation,tpch.region
kafka.hide-internal-columns = false
在上述配置中,使用 Kafka-tpch 程序加载 Kafka 表。
In the above configuration, Kafka tables are loaded using Kafka-tpch program.
Start Presto CLI
使用以下命令启动 Presto CLI,
Start Presto CLI using the following command,
$ ./presto --server localhost:8080 --catalog kafka —schema tpch;
这里 “tpch" 是 Kafka 连接器的模式,你将收到如下的响应。
Here “tpch" is a schema for Kafka connector and you will receive a response as the following.
presto:tpch>
Describe Customer Table
以下查询描述了 “customer” 表。
Following query describes “customer” table.
Result
Column | Type | Comment
-------------------+---------+---------------------------------------------
_partition_id | bigint | Partition Id
_partition_offset | bigint | Offset for the message within the partition
_segment_start | bigint | Segment start offset
_segment_end | bigint | Segment end offset
_segment_count | bigint | Running message count per segment
_key | varchar | Key text
_key_corrupt | boolean | Key data is corrupt
_key_length | bigint | Total number of key bytes
_message | varchar | Message text
_message_corrupt | boolean | Message data is corrupt
_message_length | bigint | Total number of message bytes