Apache Storm: A Concise Tutorial

Storm - Distributed Messaging System

Apache Storm processes real-time data, and its input normally comes from a message queuing system. An external distributed messaging system provides the input needed for the real-time computation. A spout reads data from the messaging system, converts it into tuples, and feeds those tuples into Apache Storm. Interestingly, Apache Storm also uses its own distributed messaging system internally for communication between Nimbus and the Supervisors.
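To make this data flow concrete, here is a minimal Python sketch of a spout-like reader. It is not the Storm API: `QueueSpout`, `next_tuple`, `SentenceTuple`, and the field names are hypothetical, and an in-process `queue.Queue` stands in for the external messaging system.

```python
import queue
from typing import NamedTuple, Optional


class SentenceTuple(NamedTuple):
    """Hypothetical tuple shape: a named list of values, as a spout might emit."""
    source: str
    sentence: str


class QueueSpout:
    """Illustrative stand-in for a spout; not Storm's real spout interface."""

    def __init__(self, source_name: str, messages: "queue.Queue[str]") -> None:
        self.source_name = source_name
        self.messages = messages

    def next_tuple(self) -> Optional[SentenceTuple]:
        """Read one raw message, if any, and convert it into a tuple."""
        try:
            raw = self.messages.get_nowait()
        except queue.Empty:
            return None  # nothing to emit right now
        return SentenceTuple(source=self.source_name, sentence=raw.strip())


# The queue plays the role of the external messaging system.
incoming: "queue.Queue[str]" = queue.Queue()
incoming.put("storm processes streams \n")
incoming.put("spouts emit tuples")

spout = QueueSpout("demo-queue", incoming)
emitted = []
while (tup := spout.next_tuple()) is not None:
    emitted.append(tup)
```

In a real deployment the reader would poll a broker such as Kafka instead of a local queue, but the shape of the step is the same: pull a raw message, turn it into a tuple, emit it downstream.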

What is a Distributed Messaging System?

Distributed messaging is based on the concept of reliable message queuing. Messages are queued asynchronously between client applications and the messaging system. A distributed messaging system provides the benefits of reliability, scalability, and persistence.

Most messaging patterns follow the publish-subscribe model (Pub-Sub for short), in which the senders of messages are called publishers and those who want to receive them are called subscribers.

Once a sender has published a message, subscribers can use a filtering option to receive only the messages they select. There are usually two types of filtering: topic-based filtering and content-based filtering.

Note that in the pub-sub model, the parties communicate only via messages. It is a very loosely coupled architecture; even the senders do not know who their subscribers are. Many messaging patterns rely on a message broker to exchange published messages so that many subscribers can access them in a timely manner. A real-life example is Dish TV, which publishes different channels such as sports, movies, and music; anyone can subscribe to their own set of channels and receive them whenever the subscribed channels are available.
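The two filtering styles and the loose coupling described above can be sketched with a toy in-memory broker. This is an illustration only: the `Broker` class and its `subscribe` and `publish` methods are hypothetical names, delivery here is synchronous, and there is no persistence or networking, unlike a real message broker.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Message = Dict[str, object]
Handler = Callable[[Message], None]
Predicate = Callable[[Message], bool]


class Broker:
    """Toy in-memory message broker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # topic -> list of (optional content filter, handler)
        self._subscriptions: Dict[
            str, List[Tuple[Optional[Predicate], Handler]]
        ] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler,
                  content_filter: Optional[Predicate] = None) -> None:
        """Topic-based filtering: the handler only sees messages on `topic`.
        Content-based filtering: if given, `content_filter` must also accept
        the message before it is delivered."""
        self._subscriptions[topic].append((content_filter, handler))

    def publish(self, topic: str, message: Message) -> None:
        """Deliver to matching subscribers; the publisher never sees them."""
        for content_filter, handler in self._subscriptions.get(topic, []):
            if content_filter is None or content_filter(message):
                handler(message)


broker = Broker()
sports_inbox: List[Message] = []
hd_movies_inbox: List[Message] = []

# Topic-based subscription: everything on the "sports" channel.
broker.subscribe("sports", sports_inbox.append)
# Topic plus content filter: only HD items on the "movies" channel.
broker.subscribe("movies", hd_movies_inbox.append,
                 content_filter=lambda m: m.get("quality") == "HD")

broker.publish("sports", {"title": "Cricket final"})
broker.publish("movies", {"title": "Old classic", "quality": "SD"})
broker.publish("movies", {"title": "New release", "quality": "HD"})
broker.publish("music", {"title": "Top 10"})  # no subscribers, so dropped
```

The publisher only names a topic and hands over a message; it holds no reference to any subscriber, which is the loose coupling the pub-sub model is built on.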


The following table describes some of the popular high-throughput messaging systems:

| Distributed messaging system | Description |
| --- | --- |
| Apache Kafka | Kafka was developed at LinkedIn and later became an Apache project. It is based on a broker-enabled, persistent, distributed publish-subscribe model. Kafka is fast, scalable, and highly efficient. |
| RabbitMQ | RabbitMQ is an open source, robust, distributed messaging application. It is easy to use and runs on all major platforms. |
| JMS (Java Message Service) | JMS is a Java API for creating, sending, and reading messages between applications. It provides guaranteed message delivery and supports the publish-subscribe model. |
| ActiveMQ | ActiveMQ is an open source messaging system that implements the JMS API. |
| ZeroMQ | ZeroMQ is a broker-less, peer-to-peer messaging library. It provides messaging patterns such as push-pull and router-dealer. |
| Kestrel | Kestrel is a fast, reliable, and simple distributed message queue. |

Thrift Protocol

Thrift was built at Facebook for cross-language service development and remote procedure calls (RPC). It later became an open source Apache project. Apache Thrift provides an Interface Definition Language (IDL) that makes it easy to define new data types, and service implementations on top of those data types.

Apache Thrift is also a communication framework that supports embedded systems, mobile applications, web applications, and many programming languages. Its key features are modularity, flexibility, and high performance. In addition, it can perform streaming, messaging, and RPC in distributed applications.

Storm uses the Thrift protocol extensively for its internal communication and data definition. A Storm topology is simply a set of Thrift structs, and Storm Nimbus, which runs the topology in Apache Storm, is a Thrift service.
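The idea that a topology is plain data handed to a service can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `ComponentSpec`, `TopologySpec`, and `FakeNimbus` are hypothetical and much simpler than Storm's real Thrift definitions, and JSON stands in for Thrift's own wire encoding.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ComponentSpec:
    """Hypothetical, simplified description of one spout or bolt."""
    class_name: str
    parallelism: int = 1
    inputs: List[str] = field(default_factory=list)  # ids of upstream components


@dataclass
class TopologySpec:
    """Hypothetical stand-in for a topology struct: plain data, no behavior."""
    spouts: Dict[str, ComponentSpec] = field(default_factory=dict)
    bolts: Dict[str, ComponentSpec] = field(default_factory=dict)


class FakeNimbus:
    """Stand-in for a remote service; it only ever receives serialized bytes."""

    def __init__(self) -> None:
        self.topologies: Dict[str, TopologySpec] = {}

    def submit_topology(self, name: str, payload: bytes) -> None:
        # Rebuild the struct from the wire format and register it by name.
        raw = json.loads(payload.decode("utf-8"))
        self.topologies[name] = TopologySpec(
            spouts={k: ComponentSpec(**v) for k, v in raw["spouts"].items()},
            bolts={k: ComponentSpec(**v) for k, v in raw["bolts"].items()},
        )


topology = TopologySpec(
    spouts={"sentences": ComponentSpec("SentenceSpout")},
    bolts={"counter": ComponentSpec("CountBolt", parallelism=2,
                                    inputs=["sentences"])},
)

# JSON is used here only as a stand-in for Thrift's encoding.
payload = json.dumps(asdict(topology)).encode("utf-8")

nimbus = FakeNimbus()
nimbus.submit_topology("word-count", payload)
```

Because the topology is only a data structure, a client written in any language that can produce the same serialized form could submit it, which is the main benefit of defining both the structs and the service in an IDL.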