Apache Flume: A Concise Tutorial

Apache Flume - Introduction

What is Flume?

Apache Flume is a tool, service, and data ingestion mechanism for collecting, aggregating, and transporting large amounts of streaming data, such as log files and events, from various sources to a centralized data store.

Flume is a highly reliable, distributed, and configurable tool. It is principally designed to copy streaming data (log data) from various web servers to HDFS.

Applications of Flume

Assume an e-commerce web application wants to analyze customer behavior in a particular region. To do so, it needs to move the available log data into Hadoop for analysis. This is where Apache Flume helps.

Flume moves the log data generated by application servers into HDFS at high speed.

Advantages of Flume

The advantages of using Flume are as follows:

  1. Using Apache Flume, we can store data in any of the centralized stores (HBase, HDFS).

  2. When the rate of incoming data exceeds the rate at which data can be written to the destination, Flume acts as a mediator between data producers and the centralized stores and provides a steady flow of data between them.

  3. Flume supports contextual routing.

  4. Transactions in Flume are channel-based: two transactions (one for the sender and one for the receiver) are maintained for each message. This guarantees reliable message delivery.

  5. Flume is reliable, fault-tolerant, scalable, manageable, and customizable.
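
Points 2 and 4 above can be illustrated with a small Python sketch. It is not Flume code: the `Channel` class, its method names, and the sample log lines are all hypothetical. It only shows the idea of a buffer that sits between producer and store, where the sender side and the receiver side each commit or roll back on their own.

```python
from collections import deque


class Channel:
    """Toy in-memory buffer standing in for a Flume channel (hypothetical class)."""

    def __init__(self):
        self._events = deque()

    def put_batch(self, events):
        # Sender-side transaction: stage the whole batch first, then commit it.
        staged = list(events)        # a failure here leaves the channel untouched
        self._events.extend(staged)  # commit

    def take_batch(self, max_events, deliver):
        # Receiver-side transaction: events leave the channel only if deliver() succeeds.
        n = min(max_events, len(self._events))
        batch = [self._events[i] for i in range(n)]
        try:
            deliver(batch)
        except Exception:
            return 0  # rollback: the events stay buffered for a later retry
        for _ in range(n):
            self._events.popleft()  # commit
        return n

    def __len__(self):
        return len(self._events)


def failing_store(batch):
    # Simulates a destination that is temporarily unavailable.
    raise IOError("store unavailable")


channel = Channel()
channel.put_batch(["log line 1", "log line 2", "log line 3"])

stored = []
rolled_back = channel.take_batch(2, failing_store)  # delivery fails, nothing is lost
delivered = channel.take_batch(2, stored.extend)    # retry succeeds
```

After the failed attempt the channel still holds all three events, so a slow or unavailable store delays delivery instead of dropping data. The successful retry then removes only the two events it delivered.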

Features of Flume

Some of the notable features of Flume are as follows:

  1. Flume ingests log data from multiple web servers into a centralized store (HDFS, HBase) efficiently.

  2. Using Flume, we can move data from multiple servers into Hadoop immediately.

  3. Along with log files, Flume is also used to import huge volumes of event data produced by social networking sites like Facebook and Twitter, and e-commerce websites like Amazon and Flipkart.

  4. Flume supports a large set of source and destination types.

  5. Flume supports multi-hop flows, fan-in and fan-out flows, contextual routing, etc.

  6. Flume can be scaled horizontally.
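
Contextual routing and fan-out can be pictured as choosing one or more channels for each event based on a value in its headers. The Python sketch below is only an illustration of that idea, loosely in the spirit of Flume's multiplexing channel selector. The `route` function, the `region` header, the channel names, and the sample events are assumptions made for this example, not part of Flume.

```python
def route(event, mapping, default):
    """Return the channel names for an event, chosen by its 'region' header."""
    key = event["headers"].get("region")
    return mapping.get(key, default)


# Hypothetical routing table: header value -> list of channel names.
mapping = {
    "eu": ["eu_channel"],
    "us": ["us_channel", "audit_channel"],  # fan-out: one event, two channels
}
default = ["default_channel"]

events = [
    {"headers": {"region": "eu"}, "body": "GET /cart"},
    {"headers": {"region": "us"}, "body": "GET /checkout"},
    {"headers": {}, "body": "GET /home"},  # no region header: falls back to default
]

channels = {}
for event in events:
    for name in route(event, mapping, default):
        channels.setdefault(name, []).append(event["body"])
```

Here the event tagged `us` is copied into two channels, while the event without a `region` header goes to the default channel.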