Apache Flume Concise Tutorial

Apache Flume - Data Flow

Flume is a framework for moving log data into HDFS. Typically, events and log data are generated by log servers, and each of these servers runs a Flume agent. The agents receive the data from the data generators.

The data held by these agents is collected by an intermediate node known as a collector. As with agents, a Flume deployment can have multiple collectors.

Finally, the data from all the collectors is aggregated and pushed to a centralized store such as HBase or HDFS. The following diagram illustrates the data flow in Flume.

[Figure: Flume data flow]
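The three tiers described above (agents on log servers, collectors, and a central store) can be modelled in a few lines of Python. This is a minimal sketch, not Flume code: the agent and collector names and the `run_pipeline` helper are hypothetical, and a Python list stands in for HDFS or HBase.

```python
from collections import defaultdict

# Hypothetical topology: each log-server agent forwards to one collector.
AGENT_TO_COLLECTOR = {
    "web1": "collector-a",
    "web2": "collector-a",
    "web3": "collector-b",
    "web4": "collector-b",
}

def run_pipeline(logs_by_agent):
    """Route each agent's events to its collector, then aggregate every collector into one store."""
    collectors = defaultdict(list)
    for agent, events in logs_by_agent.items():
        collectors[AGENT_TO_COLLECTOR[agent]].extend(events)
    central_store = []  # stands in for HDFS or HBase
    for name in sorted(collectors):
        central_store.extend(collectors[name])
    return dict(collectors), central_store

collectors, store = run_pipeline({
    "web1": ["GET /a"],
    "web2": ["GET /b", "GET /c"],
    "web3": ["POST /d"],
    "web4": [],
})
print(len(collectors["collector-a"]), len(collectors["collector-b"]), len(store))  # 3 1 4
```

Each collector only sees the agents assigned to it, while the central store ends up with every event.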

Multi-hop Flow

A Flume deployment can have multiple agents, and an event may travel through more than one agent before it reaches its final destination. This is known as multi-hop flow.
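A multi-hop flow can be pictured as one event passing along a chain of agents. In Flume an event consists of a set of string headers and a byte-array body; the sketch below simplifies the body to a `str` and uses a hypothetical `path` header to record each hop. The `Event` class and the `hop` and `multi_hop` functions are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    body: str                                   # simplified: Flume bodies are byte arrays
    headers: dict = field(default_factory=dict)

def hop(event, agent_name):
    """One agent receives the event and forwards it, appending its name to a hypothetical 'path' header."""
    path = event.headers.get("path", "")
    event.headers["path"] = f"{path}>{agent_name}" if path else agent_name
    return event

def multi_hop(event, agents):
    """Pass the event through each agent in order."""
    for name in agents:
        event = hop(event, name)
    return event

e = multi_hop(Event("disk full"), ["edge-agent", "collector-agent", "hdfs-agent"])
print(e.headers["path"])  # edge-agent>collector-agent>hdfs-agent
```

In an actual Flume deployment, each hop is usually an Avro sink on the upstream agent sending to an Avro source on the downstream agent; the sketch only models the order in which the agents are visited.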

Fan-out Flow

A data flow from one source to multiple channels is known as fan-out flow. There are two types:

  1. Replicating: each event is replicated to all the configured channels.

  2. Multiplexing: each event is sent only to the channel(s) selected according to the value of a field in the event's header.
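The two fan-out types differ only in how the channels for an event are chosen. The sketch below is a hypothetical model, not Flume's Java API; the keys of the `selector` dictionaries loosely mirror the `selector.type`, `selector.header`, `selector.mapping` and `selector.default` properties used to configure channel selectors in a Flume agent, and the channel names `c1` to `c3` are made up.

```python
def select_channels(headers, selector, all_channels):
    """Return the names of the channels an event should be written to."""
    if selector["type"] == "replicating":
        # Replicating: every configured channel gets a copy.
        return list(all_channels)
    if selector["type"] == "multiplexing":
        # Multiplexing: route on the value of one header, falling back to a default list.
        value = headers.get(selector["header"])
        return selector["mapping"].get(value, selector["default"])
    raise ValueError(f"unknown selector type: {selector['type']}")

def fan_out(body, headers, selector, channels):
    """Write the event body to every channel chosen by the selector."""
    for name in select_channels(headers, selector, channels):
        channels[name].append(body)

channels = {"c1": [], "c2": [], "c3": []}
fan_out("e1", {}, {"type": "replicating"}, channels)

mux = {"type": "multiplexing", "header": "state",
       "mapping": {"CZ": ["c1"], "US": ["c2", "c3"]}, "default": ["c3"]}
fan_out("e2", {"state": "US"}, mux, channels)
fan_out("e3", {"state": "FR"}, mux, channels)  # no mapping for "FR", so the default is used
print(channels)  # {'c1': ['e1'], 'c2': ['e1', 'e2'], 'c3': ['e1', 'e2', 'e3']}
```

Event `e1` is copied everywhere, `e2` follows the mapping for its `state` header, and `e3` has an unmapped header value, so it lands only in the default channel.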

Fan-in Flow

A data flow in which data is transferred from many sources into a single channel is known as fan-in flow.
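Fan-in can be sketched as several producer threads writing into one thread-safe queue. Here `queue.Queue` stands in for a Flume channel; the `source` function and the source names are hypothetical.

```python
import queue
import threading

SOURCES = ("app1", "app2", "app3")  # hypothetical source names

def source(name, count, channel):
    """Hypothetical source: put `count` events into the shared channel."""
    for i in range(count):
        channel.put(f"{name}-{i}")

channel = queue.Queue()  # one thread-safe channel shared by every source

threads = [threading.Thread(target=source, args=(name, 3, channel)) for name in SOURCES]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All sources have finished, so draining with empty()/get() is safe here.
drained = []
while not channel.empty():
    drained.append(channel.get())
print(len(drained))  # 9
```

The order of events in the channel depends on thread scheduling; only the set of events is fixed.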

Failure Handling

In Flume, every event is handled by two transactions: one at the sender and one at the receiver. The sender sends the event to the receiver. As soon as the receiver has received the data, it commits its own transaction and sends a “received” signal to the sender. Only after receiving this signal does the sender commit its transaction; until then, the sender's transaction remains uncommitted.
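The two-transaction handshake can be modelled with a toy channel that supports take, commit and rollback. This is a minimal sketch under stated assumptions: `Channel`, `Receiver` and `deliver_one` are hypothetical names, not Flume classes, and a failed delivery is simulated with a flag rather than a real network error.

```python
from collections import deque

class Channel:
    """Toy channel: an event taken inside a transaction is removed for good only on commit."""

    def __init__(self, events=()):
        self.events = deque(events)
        self.pending = None  # event taken by the currently open transaction, if any

    def begin_take(self):
        self.pending = self.events.popleft()
        return self.pending

    def commit(self):
        self.pending = None  # the taken event is now permanently removed

    def rollback(self):
        self.events.appendleft(self.pending)  # return the event to the head of the channel
        self.pending = None

    def put_and_commit(self, event):
        self.events.append(event)

class Receiver:
    def __init__(self, fail=False):
        self.channel = Channel()
        self.fail = fail  # hypothetical flag simulating a crash or network error

    def receive(self, event):
        """Receiver side: commit the event to its own channel, then send the "received" signal."""
        if self.fail:
            return False  # nothing committed, no signal sent
        self.channel.put_and_commit(event)
        return True

def deliver_one(sender_channel, receiver):
    """Sender side: commit only after the receiver's signal arrives; otherwise roll back."""
    event = sender_channel.begin_take()
    if receiver.receive(event):
        sender_channel.commit()
        return True
    sender_channel.rollback()
    return False

sender = Channel(["e1", "e2"])
ok_down = deliver_one(sender, Receiver(fail=True))
print(ok_down, list(sender.events))  # False ['e1', 'e2']

up = Receiver()
ok_up = deliver_one(sender, up)
print(ok_up, list(sender.events), list(up.channel.events))  # True ['e2'] ['e1']
```

Because the sender rolls back whenever the signal does not arrive, the event stays in the sender's channel and can be retried. In this model, if the signal were lost after the receiver had already committed, the retry would deliver the event twice, so the sketch gives at-least-once rather than exactly-once delivery.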