Apache Presto Tutorial

Apache Presto - Architecture

The architecture of Presto closely resembles a classic MPP (massively parallel processing) DBMS architecture. The following diagram illustrates the architecture of Presto.

Figure: Presto architecture

The above diagram consists of different components. The following table describes each component in detail.

S.No	Component & Description

1.	Client − The client (Presto CLI) submits SQL statements to the coordinator to get the result. (A sample CLI invocation is shown after this table.)

2.	Coordinator − The coordinator is the master daemon. It first parses a SQL query, then analyzes it and plans its execution. The scheduler performs pipelined execution, assigns work to the nodes closest to the data, and monitors progress.

3.	Connector − Storage plugins are called connectors. Hive, HBase, MySQL, Cassandra, and many more are available as connectors, and you can also implement a custom one. A connector provides metadata and data for queries; the coordinator uses the connector to obtain the metadata needed to build a query plan.

4.	Worker − The coordinator assigns tasks to worker nodes. The workers fetch the actual data through the connector and finally deliver the result to the client.
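As a quick sketch of the client-to-coordinator interaction, the Presto CLI can point at a coordinator and submit statements. The server address, catalog, and schema below are placeholder assumptions for illustration, not part of this tutorial's setup.

# Start the Presto CLI against a coordinator (address/catalog/schema are placeholders)
./presto --server localhost:8080 --catalog tpch --schema tiny

# Or submit a single statement non-interactively
./presto --server localhost:8080 --catalog tpch --schema tiny \
         --execute "SELECT count(*) FROM orders;"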

Presto − Workflow

Presto is a distributed system that runs on a cluster of nodes. Presto's distributed query engine is optimized for interactive analysis and supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. The Presto architecture is simple and extensible. The Presto client (CLI) submits SQL statements to the coordinator, a master daemon.
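As a sketch of the ANSI SQL features mentioned above, the following query combines a join, an aggregation, and a window function. It assumes the built-in tpch catalog (with its tiny schema) is configured; the table names are illustrative only.

-- Join, aggregation, and window function in a single ANSI SQL statement
SELECT
  c.mktsegment,
  c.name,
  sum(o.totalprice) AS total_spend,
  rank() OVER (PARTITION BY c.mktsegment
               ORDER BY sum(o.totalprice) DESC) AS rank_in_segment
FROM tpch.tiny.orders AS o
JOIN tpch.tiny.customer AS c ON o.custkey = c.custkey
GROUP BY c.mktsegment, c.name;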

The scheduler wires together the execution pipeline, assigns work to the nodes closest to the data, and monitors progress. The coordinator assigns tasks to multiple worker nodes, and finally the worker nodes deliver the result back to the client; the client pulls data from the output stage. Extensibility is a key design point. Pluggable connectors such as Hive, HBase, and MySQL provide metadata and data for queries. Presto was designed with a "simple storage abstraction" that makes it easy to provide SQL query capability against these different kinds of data sources.
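As a minimal sketch of how a pluggable connector is wired in, a catalog properties file such as etc/catalog/mysql.properties can register the MySQL connector; the connection URL and credentials below are placeholder assumptions for illustration.

# etc/catalog/mysql.properties (values below are placeholders)
connector.name=mysql
connection-url=jdbc:mysql://localhost:3306
connection-user=root
connection-password=secret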

Execution Model

Presto features a custom query and execution engine, with operators designed to support SQL semantics. In addition to improved scheduling, all processing is done in memory and pipelined across the network between stages, which avoids unnecessary I/O latency overhead.
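To see how a query is split into the stages that the engine pipelines between nodes, Presto's EXPLAIN statement can print the distributed plan. A quick sketch, again assuming the built-in tpch catalog is available:

-- Prints the distributed plan, showing the fragments (stages) exchanged between nodes
EXPLAIN (TYPE DISTRIBUTED)
SELECT c.mktsegment, count(*) AS order_count
FROM tpch.tiny.orders AS o
JOIN tpch.tiny.customer AS c ON o.custkey = c.custkey
GROUP BY c.mktsegment;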