Elasticsearch Concise Tutorial

Elasticsearch - Rollup Data

A rollup job is a periodic task that summarizes data from the indices matched by an index pattern and writes the summaries to a new index. In the following example, we first index a few sensor readings with different timestamps, then create a rollup job that periodically summarizes the matching indices on a cron schedule.

PUT /sensor/_doc/1
{
   "timestamp": 1516729294000,
   "temperature": 200,
   "voltage": 5.2,
   "node": "a"
}

Running the above request returns the following response:

{
   "_index" : "sensor",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 0,
   "_primary_term" : 1
}

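Now add a second document, and further documents in the same way. Note that this one is written to sensor-2018-01-01: the rollup job defined below uses the index pattern sensor-*, which matches only indices whose names begin with sensor-, so the first document, stored in the plain sensor index, is not rolled up.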

PUT /sensor-2018-01-01/_doc/2
{
   "timestamp": 1413729294000,
   "temperature": 201,
   "voltage": 5.9,
   "node": "a"
}
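
To add several documents in one request, the bulk API can be used instead of repeated PUT calls. The following is a minimal sketch, not part of the original example: the document IDs, timestamps, and readings are invented for illustration, and each action line and document must stay on a single line.

POST /sensor-2018-01-01/_bulk
{ "index": { "_id": "3" } }
{ "timestamp": 1516642894000, "temperature": 198, "voltage": 5.4, "node": "a" }
{ "index": { "_id": "4" } }
{ "timestamp": 1516646494000, "temperature": 203, "voltage": 5.6, "node": "b" }

Because the target index name starts with sensor-, these documents fall within the sensor-* pattern used by the rollup job.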

Create a Rollup Job

PUT _rollup/job/sensor
{
   "index_pattern": "sensor-*",
   "rollup_index": "sensor_rollup",
   "cron": "*/30 * * * * ?",
   "page_size": 1000,
   "groups" : {
      "date_histogram": {
         "field": "timestamp",
         "interval": "60m"
      },
      "terms": {
         "fields": ["node"]
      }
   },
   "metrics": [
      {
         "field": "temperature",
         "metrics": ["min", "max", "sum"]
      },
      {
         "field": "voltage",
         "metrics": ["avg"]
      }
   ]
}

The cron parameter controls when and how often the job activates. In this example, the expression */30 * * * * ? fires every 30 seconds. Each time the cron schedule triggers, the job resumes rolling up from where it left off after the last activation.
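
A newly created rollup job is in the stopped state and does not process anything until it is started. As a sketch, assuming the job name sensor from the request above, the following two requests start the job and then retrieve its details:

POST _rollup/job/sensor/_start

GET _rollup/job/sensor

The second response includes the job's configuration, its current state, and counters such as the number of documents processed, which is a convenient way to confirm that the job has actually picked up data.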

After the job has run and processed some data, we can search the rolled-up data. The _rollup_search endpoint accepts a subset of the standard Query DSL:

GET /sensor_rollup/_rollup_search
{
   "size": 0,
   "aggregations": {
      "max_temperature": {
         "max": {
            "field": "temperature"
         }
      }
   }
}
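
Rollup searches can also group by the fields listed under groups in the job. The request below is a sketch that assumes the job configuration shown earlier; the aggregation names per_node and avg_voltage are arbitrary labels chosen for this example. It buckets the rolled-up data by node and reports the average voltage for each one.

GET /sensor_rollup/_rollup_search
{
   "size": 0,
   "aggregations": {
      "per_node": {
         "terms": {
            "field": "node"
         },
         "aggregations": {
            "avg_voltage": {
               "avg": {
                  "field": "voltage"
               }
            }
         }
      }
   }
}

Only the fields and metrics stored by the job can be queried this way. For instance, the job keeps min, max, and sum for temperature but not avg, so an avg aggregation on temperature is expected to be rejected.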