Apache Storm: A Concise Tutorial

Apache Storm - Cluster Architecture

One of the main highlights of Apache Storm is that it is a fast, fault-tolerant distributed application with no single point of failure (SPOF). Apache Storm can be installed on as many systems as needed to increase the capacity of the application.

Let's look at how the Apache Storm cluster is designed and at its internal architecture. The following diagram depicts the cluster design.

[Figure: Apache Storm cluster design, including the ZooKeeper framework]

Apache Storm has two types of nodes: Nimbus (the master node) and Supervisor (the worker node). Nimbus is the central component of Apache Storm, and its main job is to run Storm topologies. Nimbus analyzes a topology and gathers the tasks to be executed, then distributes those tasks to the available supervisors.

A supervisor has one or more worker processes and delegates tasks to them. Each worker process spawns as many executors as needed to run the tasks. Apache Storm uses an internal distributed messaging system for communication between Nimbus and the supervisors.
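The distribution step described above can be pictured with a toy model. This is a minimal sketch, not Storm's scheduler or API: the function name `assign_tasks`, the task and supervisor names, and the round-robin policy are all assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import cycle


def assign_tasks(tasks, supervisors):
    """Toy stand-in for Nimbus: hand tasks to supervisors in round-robin order.

    Round-robin is an assumption for illustration, not Storm's actual scheduler.
    """
    if not supervisors:
        raise ValueError("at least one supervisor is required")
    assignment = defaultdict(list)
    # cycle() repeats the supervisor list; zip() stops when tasks run out.
    for task, supervisor in zip(tasks, cycle(supervisors)):
        assignment[supervisor].append(task)
    return dict(assignment)


# Hypothetical task and supervisor names.
tasks = ["spout-0", "bolt-0", "bolt-1", "bolt-2", "bolt-3"]
plan = assign_tasks(tasks, ["supervisor-a", "supervisor-b"])
for supervisor, assigned in plan.items():
    print(supervisor, "->", assigned)
```

In a real cluster the supervisor would pass its share on to worker processes; the sketch stops at the supervisor level to keep the master/worker split visible.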

Components and their descriptions

Nimbus: Nimbus is the master node of a Storm cluster. All other nodes in the cluster are called worker nodes. The master node is responsible for distributing code among the worker nodes, assigning tasks to them, and monitoring for failures.

Supervisor: The nodes that follow the instructions given by Nimbus are called supervisors. A supervisor has one or more worker processes and governs them to complete the tasks assigned by Nimbus.

Worker process: A worker process executes tasks that belong to a specific topology. It does not run a task by itself; instead, it creates executors and asks them to perform particular tasks. A worker process can have multiple executors.

Executor: An executor is a single thread spawned by a worker process. An executor runs one or more tasks, but only for a specific spout or bolt.

Task: A task performs the actual data processing. It is an instance of either a spout or a bolt.

ZooKeeper framework: Apache ZooKeeper is a service used by a cluster (a group of nodes) to coordinate among themselves and maintain shared data with robust synchronization techniques. Nimbus is stateless, so it depends on ZooKeeper to monitor the status of the worker nodes. ZooKeeper helps the supervisors interact with Nimbus and is responsible for maintaining the state of Nimbus and the supervisors.
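The worker process, executor, and task relationship described above can be sketched as a small data model. This is an illustrative assumption, not Storm code: the `Executor` class, the `spawn_executors` helper, the component names, and the even modulo-based split are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Executor:
    """One thread inside a worker process; bound to a single spout or bolt."""
    component: str
    task_ids: list = field(default_factory=list)


def spawn_executors(component, num_tasks, num_executors):
    """Spread num_tasks task ids over num_executors executors as evenly as possible."""
    if not 1 <= num_executors <= num_tasks:
        raise ValueError("need 1 <= num_executors <= num_tasks")
    executors = [Executor(component) for _ in range(num_executors)]
    for task_id in range(num_tasks):
        executors[task_id % num_executors].task_ids.append(task_id)
    return executors


# A hypothetical worker process running one spout and one bolt.
worker = spawn_executors("word-spout", 2, 2) + spawn_executors("count-bolt", 4, 2)
for executor in worker:
    print(executor.component, executor.task_ids)
```

Each executor in the sketch holds tasks for exactly one component, which mirrors the rule that an executor runs one or more tasks but only for a specific spout or bolt.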

Storm is stateless in nature. Although statelessness has its own disadvantages, it helps Storm process real-time data as quickly and efficiently as possible.

Storm is not entirely stateless, though. It stores its state in Apache ZooKeeper. Because the state is available in ZooKeeper, a failed Nimbus can be restarted and resume from where it left off. Usually, a service monitoring tool such as monit watches Nimbus and restarts it if it fails.
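The idea of a stateless master recovering from an external store can be illustrated with a toy example. This is only a sketch under stated assumptions: `CoordinationStore` is an in-memory stand-in for ZooKeeper (not a ZooKeeper client), and `ToyNimbus`, the `/assignments` path, and the topology and supervisor names are hypothetical.

```python
class CoordinationStore:
    """In-memory stand-in for ZooKeeper; a real cluster would use a ZooKeeper ensemble."""

    def __init__(self):
        self._data = {}

    def write(self, path, value):
        self._data[path] = value

    def read(self, path, default=None):
        return self._data.get(path, default)


class ToyNimbus:
    """Keeps no state of its own; every assignment is written to the store."""

    def __init__(self, store):
        self.store = store

    def assign(self, topology, supervisor):
        # Copy, update, and write back so the store always holds the full picture.
        assignments = dict(self.store.read("/assignments", {}))
        assignments[topology] = supervisor
        self.store.write("/assignments", assignments)

    def assignments(self):
        return self.store.read("/assignments", {})


store = CoordinationStore()
nimbus = ToyNimbus(store)
nimbus.assign("word-count", "supervisor-a")

# Simulate a crash: the process is gone, but the store survives.
del nimbus
restarted = ToyNimbus(store)
recovered = restarted.assignments()
print(recovered)
```

Because the restarted instance reads everything from the store, it needs nothing from the crashed process, which is the property that lets a monitoring tool simply restart the daemon.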

Apache Storm also has an advanced topology called Trident Topology, which maintains state and provides a high-level API similar to Pig. These features are discussed in the coming chapters.