Logstash Tutorial

Logstash - Internal Architecture

In this chapter, we will discuss the internal architecture and the different components of Logstash.

Logstash Service Architecture

Logstash processes logs from different servers and data sources, and it behaves as the shipper. Shippers are used to collect the logs, and one is installed in every input source. Brokers like Redis, Kafka or RabbitMQ are buffers that hold the data for the indexers; there may be more than one broker, acting as failover instances.

Indexers like Lucene are used to index the logs for better search performance, and the output is then stored in Elasticsearch or another output destination. The data in the output storage is available for Kibana and other visualization software.

[Image: Logstash Service Architecture]
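
As an illustration of this shipper / broker / indexer layout, here is a minimal sketch of two Logstash configurations wired through Redis, assuming a Redis instance on localhost; the file path app.log and the host values are illustrative, and a real deployment would differ.

# Shipper: runs on each input source and pushes log events to the Redis broker
input {
   file {
      path => "C:/tpwork/logstash/bin/log/app.log"
   }
}
output {
   redis {
      host => "localhost"
      data_type => "list"
      key => "logstash"
   }
}

# Indexer: pulls events from the Redis broker and indexes them into Elasticsearch
input {
   redis {
      host => "localhost"
      data_type => "list"
      key => "logstash"
   }
}
output {
   elasticsearch {
      hosts => ["localhost:9200"]
   }
}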

Logstash Internal Architecture

The Logstash pipeline consists of three components: Input, Filters and Output. The input part is responsible for specifying and accessing the input data source, such as the log folder of the Apache Tomcat Server.

[Image: Logstash Internal Architecture]
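
Before the file-based example below, this three-part structure can be seen in a minimal, self-contained pipeline sketch that uses the stdin, mutate and stdout plugins instead of files; it is only meant to show the shape of a configuration and is not part of the example that follows.

input {
   stdin { }                             # read events typed on the console
}
filter {
   mutate { add_tag => ["demo"] }        # transform events: here, add a tag
}
output {
   stdout { codec => rubydebug }         # pretty-print each event to the console
}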

Example to Explain the Logstash Pipeline

The Logstash configuration file contains the details about the three components of Logstash. In this case, we create a file named Logstash.conf.

The following configuration captures data from an input log, "inlog.log", and writes it to an output log, "outlog.log", without any filters.

Logstash.conf

The Logstash configuration file just copies the data from the inlog.log file using the input plugin and flushes the log data to the outlog.log file using the output plugin.

input {
   # Read events line by line from the input log file
   file {
      path => "C:/tpwork/logstash/bin/log/inlog.log"
   }
}
output {
   # Write each event to the output log file
   file {
      path => "C:/tpwork/logstash/bin/log/outlog.log"
   }
}

Run Logstash

Logstash uses the -f option to specify the configuration file.

C:\logstash\bin> logstash -f Logstash.conf

inlog.log

The following code block shows the input log data.

Hello tutorialspoint.com

outlog.log

The Logstash output contains the input data in the message field. Logstash also adds other fields to the output, such as the timestamp, the path of the input source, the version, the host and the tags.

{
   "path":"C:/tpwork/logstash/bin/log/inlog.log",
   "@timestamp":"2016-12-13T02:28:38.763Z",
   "@version":"1", "host":"Dell-PC",
   "message":"Hello tutorialspoint.com", "tags":[]
}

As you can see, the output of Logstash contains more than the data supplied through the input log. The output contains the source path, timestamp, version, hostname and tags, which are used to carry extra information such as errors.
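
If some of these automatically added fields were not wanted, they could be dropped with a mutate filter. The following is only a sketch; the list of fields to remove is illustrative.

filter {
   # Drop some of the metadata fields that Logstash adds automatically
   mutate {
      remove_field => ["@version", "host", "path"]
   }
}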

We can use filters to process the data and make it useful for our needs. In the next example, we use a filter to restrict the output to only the data whose message contains a verb like GET or POST followed by a Uniform Resource Identifier.

Logstash.conf

In this Logstash configuration, we add a filter named grok to parse the input data. Only the input log events that match the configured pattern sequence are parsed without error; Logstash adds a tag named "_grokparsefailure" to the output events that do not match the grok filter's pattern sequence.

Logstash offers many inbuilt regex patterns for parsing popular server logs like Apache. The pattern used here expects a verb like get, post, etc., followed by a uniform resource identifier.
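
For reference, the two inbuilt patterns used in the configuration below are defined in the grok-patterns file shipped with Logstash roughly as follows, so an event matches only if it contains a word followed by a space and a URI path:

WORD          \b\w+\b
URIPATHPARAM  %{URIPATH}(?:%{URIPARAM})?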

input {
   # Read events from the second input log file
   file {
      path => "C:/tpwork/logstash/bin/log/inlog2.log"
   }
}
filter {
   # Extract a verb and a URI from each event's message field
   grok {
      match => {"message" => "%{WORD:verb} %{URIPATHPARAM:uri}"}
   }
}
output {
   file {
      path => "C:/tpwork/logstash/bin/log/outlog2.log"
   }
}

Run Logstash

We can run Logstash by using the following command.

C:\logstash\bin> logstash -f Logstash.conf

inlog2.log

Our input file contains two events separated by the default delimiter, i.e., the newline delimiter. The first event matches the pattern specified in the grok filter and the second one does not.

GET /tutorialspoint/Logstash
Input 1234

outlog2.log

We can see that the second output event contains the "_grokparsefailure" tag, because it does not match the grok filter pattern. The user can also remove these unmatched events from the output by using an 'if' condition in the output plugin, as sketched after the output below.

{
   "path":"C:/tpwork/logstash/bin/log/inlog2.log",
   "@timestamp":"2016-12-13T02:47:10.352Z", "@version":"1", "host":"Dell-PC",
   "verb":"GET", "message":"GET /tutorialspoint/Logstash",
   "uri":"/tutorialspoint/Logstash", "tags":[]
}
{
   "path":"C:/tpwork/logstash/bin/log/inlog2.log",
   "@timestamp":"2016-12-13T02:48:12.418Z", "@version":"1", "host":"Dell-PC",
   "message":"Input 1234\r", "tags":["_grokparsefailure"]
}
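
As a sketch of the 'if' condition mentioned above, assuming we only want the successfully parsed events in outlog2.log, the output section could be rewritten like this:

output {
   # Write only events that grok parsed successfully
   if "_grokparsefailure" not in [tags] {
      file {
         path => "C:/tpwork/logstash/bin/log/outlog2.log"
      }
   }
}

With this condition in place, the second event above would simply be dropped instead of being written with the failure tag.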