Kibana Concise Tutorial

Kibana - Aggregation And Metrics

Two terms that you will come across frequently while learning Kibana are bucket aggregation and metric aggregation. This chapter discusses the role they play in Kibana and covers each of them in more detail.

What is Kibana Aggregation?

Aggregation refers to the collection of documents, or set of documents, obtained from a particular search query or filter. Aggregation is the main concept used to build the desired visualization in Kibana.

Whenever you build a visualization, you need to decide the criteria, that is, how you want to group the data before a metric is computed on it.

In this section, we will discuss two types of aggregation −

  1. Bucket Aggregation

  2. Metric Aggregation

Bucket Aggregation

A bucket mainly consists of a key and a set of documents. When the aggregation is executed, each document is placed in its respective bucket. So at the end you should have a list of buckets, each with a list of documents. The list of bucket aggregations you will see while creating a visualization in Kibana is shown below −

bucket aggregation

The available bucket aggregations are −

  1. Date Histogram

  2. Date Range

  3. Filters

  4. Histogram

  5. IPv4 Range

  6. Range

  7. Significant Terms

  8. Terms

While creating a visualization, you need to choose one of them as the bucket aggregation, i.e. as the way to group the documents into buckets.

As an example, consider the countries data that we uploaded at the start of this tutorial. The fields available in the countries index are country name, area, population and region.

Let us assume that we want region-wise data. The region then becomes the grouping criterion, so each region forms a bucket and the countries in that region are the documents inside it. The block diagram below shows that R1, R2, R3, R4, R5 and R6 are the buckets we get, and c1, c2 ... c25 are the documents that belong to buckets R1 to R6.

block diagram aggregation

We can see that there are some circles in each bucket. They are the documents that match the search criteria and fall into that bucket. In bucket R1, we have documents c1, c8 and c15. These documents are the countries that fall in that region, and the same holds for the other buckets. So if we count the countries, we get 3 for R1, 6 for R2, 5 for R3, 2 for R4, 5 for R5 and 4 for R6, which adds up to the 25 documents c1 to c25.

So through bucket aggregation, we can group the documents into buckets and get a list of documents in each bucket, as shown above.
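
The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Kibana or Elasticsearch implements it. The documents are made up, and the field names `country` and `region` are assumed names standing in for the country name and region fields of the countries index.

```python
from collections import defaultdict

# Hypothetical documents; only the fields needed for grouping are shown.
documents = [
    {"country": "c1", "region": "R1"},
    {"country": "c2", "region": "R2"},
    {"country": "c8", "region": "R1"},
    {"country": "c15", "region": "R1"},
    {"country": "c3", "region": "R2"},
]


def bucket_by(docs, field):
    """Group documents into buckets keyed by the value of `field`."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


buckets = bucket_by(documents, "region")
r1_countries = [doc["country"] for doc in buckets["R1"]]
print(r1_countries)
```

Each key of `buckets` plays the role of a bucket key (a region), and the list stored under it is the set of documents that fell into that bucket.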

Let us now discuss, one by one, how each of these buckets is formed.

Date Histogram

Date Histogram aggregation is used on a date field, so it can only be used if the index you are visualizing has a date field. This is a multi-bucket aggregation, which means it produces more than one bucket; each document is placed in the bucket whose interval contains its date value. An interval has to be chosen for this aggregation, and the details are as shown below −

date histogram

When you select Date Histogram as the Buckets Aggregation, it displays the Field option, which lists only the date-related fields. Once you select your field, you need to select the Interval, which has the following options −

select interval histogram

The documents from the chosen index are then categorized into buckets based on the field and interval chosen. For example, if you choose a monthly interval, the documents are placed into buckets according to the month of their date value. Here Jan, Feb, ... Dec will be the buckets.
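
As a rough sketch of monthly bucketing, the Python below derives a bucket key from each date and counts the documents per key. The dates are made up, and the `YYYY-MM` key format is an assumption chosen for this example. The key includes the year, so the same month in two different years lands in two different buckets.

```python
from collections import Counter
from datetime import date

# Hypothetical date values taken from the date field of some documents.
dates = [
    date(2018, 1, 5),
    date(2018, 1, 20),
    date(2018, 2, 3),
    date(2018, 12, 28),
]


def monthly_bucket_key(d):
    """Return the bucket key for a monthly interval, e.g. '2018-01'."""
    return f"{d.year:04d}-{d.month:02d}"


monthly_counts = Counter(monthly_bucket_key(d) for d in dates)
print(sorted(monthly_counts.items()))
```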

Date Range

You need a date field to use this aggregation type. Here you specify a date range, that is, a from date and a to date. Each bucket contains the documents whose date falls between the from and to dates given.

date range
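
A date range bucket can be pictured as a simple filter on the date field. The sketch below uses made-up documents and a hypothetical field name `created`. Treating the from date as inclusive and the to date as exclusive is an assumption of this example.

```python
from datetime import date

# Hypothetical documents with a date field called "created".
docs = [
    {"id": "d1", "created": date(2018, 11, 30)},
    {"id": "d2", "created": date(2018, 12, 1)},
    {"id": "d3", "created": date(2018, 12, 28)},
    {"id": "d4", "created": date(2019, 1, 1)},
]


def date_range_bucket(docs, field, from_date, to_date):
    """Return the documents with from_date <= doc[field] < to_date."""
    return [doc for doc in docs if from_date <= doc[field] < to_date]


december = date_range_bucket(docs, "created", date(2018, 12, 1), date(2019, 1, 1))
december_ids = [doc["id"] for doc in december]
print(december_ids)
```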

Filters

With the Filters aggregation type, the buckets are formed based on the filters. This is a multi-bucket aggregation: each filter forms its own bucket, and depending on the filter criteria one document can exist in one or more buckets.

Using filters, users can write their queries in the filter option as shown below −

filters

You can add multiple filters of your choice by using the Add Filter button.
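
To see how one document can end up in more than one filter bucket, here is a small Python sketch. The countries, their values and the two filter conditions are invented for illustration; in Kibana the filters are written as queries, not Python functions.

```python
# Hypothetical documents from a countries-like index.
countries = [
    {"name": "A", "area": 900, "population": 2_000_000},
    {"name": "B", "area": 100, "population": 5_000_000},
    {"name": "C", "area": 800, "population": 10_000},
]

# Each filter is a label plus a predicate; every filter forms one bucket.
filters = {
    "area > 500": lambda doc: doc["area"] > 500,
    "population > 1000000": lambda doc: doc["population"] > 1_000_000,
}

filter_buckets = {
    label: [doc["name"] for doc in countries if matches(doc)]
    for label, matches in filters.items()
}
print(filter_buckets)
```

Country A satisfies both conditions, so it appears in both buckets, while B and C each appear in only one.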

Histogram

This type of aggregation is applied to a number field and groups the documents into buckets based on the interval applied, for example 0-50, 50-100, 100-150 and so on.

histogram
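
The interval-based grouping can be sketched as follows, assuming non-negative numbers and an interval of 50. The bucket key is the lower bound of the interval a value falls into, so a value of exactly 50 goes into the 50-100 bucket in this sketch. The input values are made up.

```python
from collections import defaultdict


def histogram_buckets(values, interval):
    """Group numbers by the lower bound of the interval they fall into."""
    buckets = defaultdict(list)
    for value in values:
        key = (value // interval) * interval  # round down to a multiple of interval
        buckets[key].append(value)
    return dict(sorted(buckets.items()))


hist = histogram_buckets([10, 49, 50, 120, 149], 50)
print(hist)
```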

IPv4 Range

This type of aggregation is mainly used for IP addresses.

ipv4 range

The index that we have, contriesdata-28.12.2018, does not have a field of type IP, so Kibana displays a message as shown above. If you happen to have an IP field, you can specify the From and To values for it.
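
For an index that does have an IP field, the From and To values select the addresses that lie between them. The sketch below uses Python's standard `ipaddress` module with made-up addresses. Treating From as inclusive and To as exclusive is an assumption of this example.

```python
import ipaddress


def ip_range_bucket(addresses, from_ip, to_ip):
    """Return the addresses with from_ip <= address < to_ip."""
    low = ipaddress.IPv4Address(from_ip)
    high = ipaddress.IPv4Address(to_ip)
    return [a for a in addresses if low <= ipaddress.IPv4Address(a) < high]


# Hypothetical values of an IP field.
client_ips = ["10.0.0.5", "10.0.0.200", "192.168.1.1"]
matched = ip_range_bucket(client_ips, "10.0.0.0", "10.0.0.128")
print(matched)
```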

Range

This type of aggregation needs a field of type number. You specify one or more ranges, and each document is listed in the bucket of the range its value falls in.

You can add more ranges if required by clicking the Add Range button.
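
With several ranges defined, each range becomes one bucket. The sketch below counts made-up population values per range. The range boundaries and the `from-to` label format are assumptions chosen for this example, with the lower bound inclusive and the upper bound exclusive.

```python
# Hypothetical population values and user-defined ranges.
populations = [500, 12_000, 75_000, 2_000_000]
ranges = [(0, 10_000), (10_000, 100_000), (100_000, 10_000_000)]


def range_counts(values, ranges):
    """Count the values that fall into each [low, high) range."""
    return {
        f"{low}-{high}": sum(1 for v in values if low <= v < high)
        for low, high in ranges
    }


counts = range_counts(populations, ranges)
print(counts)
```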

Significant Terms

This type of aggregation is mostly used on string fields.

significant terms

Terms

This type of aggregation can be used on all the available field types, namely number, string, date, boolean, IP address, timestamp and so on. Note that this is the aggregation we are going to use in all the visualizations that we work on in this tutorial.

terms

The Order By option lets us order the buckets by the metric we select. Size refers to the number of buckets you want to display in the visualization.
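
The effect of Order By and Size can be sketched with a counter: one bucket per distinct value, ordered by document count in descending order, keeping only the first few. The region values are made up, and ordering by count is just one possible choice of metric.

```python
from collections import Counter

# Hypothetical values of the region field, one entry per document.
regions = ["R2", "R1", "R2", "R3", "R2", "R1"]


def top_terms(values, size):
    """Return the `size` most frequent terms with their document counts."""
    return Counter(values).most_common(size)


top_two = top_terms(regions, 2)
print(top_two)
```

With a size of 2, the R3 bucket is left out of the result even though it exists in the data.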

Next, let us talk about Metric Aggregation.

Metric Aggregation

Metric aggregation mainly refers to the mathematical calculation done on the documents present in a bucket. For example, if you choose a number field, the metric calculations you can do on it include COUNT, SUM, MIN, MAX, AVERAGE and so on.

A list of the metric aggregations is given here −

metric aggregation

In this section, let us discuss the important ones, which we are going to use often −

  1. Average

  2. Count

  3. Max

  4. Min

  5. Sum

The metric is applied to each individual bucket produced by the bucket aggregations that we have already discussed above.

Next, let us discuss these metric aggregations one by one −

Average

This gives the average of the values of the documents present in each bucket. For example −

average

R1 to R6 are the buckets. In R1 we have c1, c8 and c15. Suppose the value of c1 is 300, c8 is 500 and c15 is 700. Now, to get the average value of bucket R1 −

R1 = (value of c1 + value of c8 + value of c15) / 3 = (300 + 500 + 700) / 3 = 500.

The average is 500 for bucket R1. Here the value of a document could be anything; for example, with the countries data it could be the area of each country in that region.
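
The same arithmetic in Python, using the three example values for bucket R1. The second expression is included only to show why the sum has to be completed before dividing: division binds tighter than addition, so leaving out the parentheses divides only the last value by 3.

```python
# Values of the documents c1, c8 and c15 in bucket R1.
values = [300, 500, 700]

# Sum first, then divide by the number of documents.
average = sum(values) / len(values)

# Without parentheses only 700 is divided by 3.
without_parentheses = 300 + 500 + 700 / 3

print(average)
print(without_parentheses)
```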

Count

This gives the count of documents present in each bucket. Suppose you want the number of countries in each region; it is the total number of documents present in that region's bucket. For example, it is 3 for R1, 6 for R2, 5 for R3, 2 for R4, 5 for R5 and 4 for R6.

Max

This gives the maximum value among the documents present in each bucket. Continuing the above example, suppose the documents in the region buckets carry the area of each country. The max for each region is then the area of the largest country in that region, so you get one value for each region, R1 to R6.

Min

This gives the minimum value among the documents present in each bucket. Continuing the above example, suppose the documents in the region buckets carry the area of each country. The min for each region is then the area of the smallest country in that region, so you get one value for each region, R1 to R6.

Sum

This gives the sum of the values of the documents present in each bucket. For example, if we want the total area of the countries in a region, it is the sum of the area values of the documents in that region's bucket.

Note that the number of countries in each region (3 for R1, 6 for R2, 5 for R3, 2 for R4, 5 for R5 and 4 for R6) is a document count, which the Count metric provides; Sum adds up the values of a field instead.

If the documents carry an area field, then each of R1 to R6 will have the country-wise areas summed up for that region.
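
To tie the metrics together, the sketch below computes count, min, max, sum and average for the area values in each region bucket. The area values for R4 are made up; the R1 values reuse the 300, 500 and 700 from the Average example. The function name `bucket_metrics` is hypothetical.

```python
# Hypothetical area values of the countries in two region buckets.
areas_by_region = {
    "R1": [300, 500, 700],
    "R4": [120, 80],
}


def bucket_metrics(values):
    """Compute the common metric aggregations for one bucket."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "average": sum(values) / len(values),
    }


metrics = {region: bucket_metrics(areas) for region, areas in areas_by_region.items()}
print(metrics["R1"])
print(metrics["R4"])
```

Count depends only on how many documents are in the bucket, while the other four metrics depend on the field values of those documents.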