Elasticsearch Tutorial

Elasticsearch - Quick Guide

Elasticsearch - Basic Concepts

Elasticsearch is a search server based on Apache Lucene. It was developed by Shay Banon and first released in 2010. It is now maintained by Elasticsearch BV. Its latest version is 7.0.0.

Elasticsearch is a real-time, distributed and open-source full-text search and analytics engine. It is accessible through a RESTful web service interface and uses schema-less JSON (JavaScript Object Notation) documents to store data. It is built on the Java programming language, which enables Elasticsearch to run on different platforms. It lets users explore very large amounts of data at very high speed.

General Features

The general features of Elasticsearch are as follows −

  1. Elasticsearch is scalable up to petabytes of structured and unstructured data.

  2. Elasticsearch can be used as a replacement for document stores like MongoDB and RavenDB.

  3. Elasticsearch uses denormalization to improve the search performance.

  4. Elasticsearch is one of the popular enterprise search engines, and is currently being used by many big organizations like Wikipedia, The Guardian, StackOverflow, GitHub, etc.

  5. Elasticsearch is open source and available under the Apache license version 2.0.

Key Concepts

The key concepts of Elasticsearch are as follows −

Node

It refers to a single running instance of Elasticsearch. A single physical or virtual server can accommodate multiple nodes, depending on the capacity of its physical resources such as RAM, storage and processing power.

Cluster

It is a collection of one or more nodes. A cluster provides collective indexing and search capabilities across all its nodes for the entire data.

Index

It is a collection of different types of documents and their properties. An index also uses the concept of shards to improve performance. For example, an index may contain the set of documents of a social networking application.

Document

It is a collection of fields arranged in a specific manner and defined in JSON format. Every document belongs to a type and resides inside an index. Every document is associated with a unique identifier called the UID.

Shard

Indexes are horizontally subdivided into shards. This means each shard contains all the properties of the documents but fewer JSON objects than the index as a whole. The horizontal separation makes a shard an independent unit that can be stored on any node. The primary shards are the original horizontal parts of an index, and these primary shards are then replicated into replica shards.

Replicas

Elasticsearch allows a user to create replicas of their indexes and shards. Replication not only helps in increasing the availability of data in case of failure, but also improves the performance of searching by carrying out parallel search operations on these replicas.

Advantages

  1. Elasticsearch is developed on Java, which makes it compatible on almost every platform.

  2. Elasticsearch is real time; in other words, a document becomes searchable in this engine about one second after it is added.

  3. Elasticsearch is distributed, which makes it easy to scale and integrate in any big organization.

  4. Creating full backups is easy by using the concept of gateway, which is present in Elasticsearch.

  5. Handling multi-tenancy is very easy in Elasticsearch when compared to Apache Solr.

  6. Elasticsearch uses JSON objects as responses, which makes it possible to invoke the Elasticsearch server with a large number of different programming languages.

  7. Elasticsearch supports almost every document type except those that do not support text rendering.

Disadvantages

  1. Elasticsearch does not have multi-language support in terms of handling request and response data (only possible in JSON) unlike in Apache Solr, where it is possible in CSV, XML and JSON formats.

  2. Occasionally, Elasticsearch has a problem of split-brain situations.

Comparison between Elasticsearch and RDBMS

In Elasticsearch, an index is similar to a table in an RDBMS (Relational Database Management System). Every table is a collection of rows, just as every index is a collection of documents in Elasticsearch.

The following table gives a direct comparison between these terms −

Elasticsearch      RDBMS
Cluster            Database
Shard              Shard
Index              Table
Field              Column
Document           Row

Elasticsearch - Installation

In this chapter, we will understand the installation procedure of Elasticsearch in detail.

To install Elasticsearch on your local computer, you will have to follow the steps given below −

Step 1 − Check the version of Java installed on your computer. It should be Java 7 or higher. You can check it as follows −

In Windows Operating System (OS) (using the command prompt) −

> java -version

In UNIX OS (using the terminal) −

$ echo $JAVA_HOME

Step 2 − Depending on your operating system, download Elasticsearch from www.elastic.co as mentioned below −

  1. For Windows OS, download the ZIP file.

  2. For UNIX OS, download the TAR file.

  3. For Debian OS, download the DEB file.

  4. For Red Hat and other Linux distributions, download the RPM file.

  5. APT and Yum utilities can also be used to install Elasticsearch in many Linux distributions.

Step 3 − The installation process for Elasticsearch is simple and is described below for different OS −

  1. Windows OS − Unzip the ZIP package and Elasticsearch is installed.

  2. UNIX OS − Extract the tar file in any location and Elasticsearch is installed.

$wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.0-linux-x86_64.tar.gz

$tar -xzf elasticsearch-7.0.0-linux-x86_64.tar.gz

Using the APT utility for Linux OS − Download and install the public signing key −

$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Save the repository definition as shown below −

$ echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" |
sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

Run the update using the following command −

$ sudo apt-get update

Now you can install Elasticsearch by using the following command −

$ sudo apt-get install elasticsearch

Alternatively, download and install the Debian package manually using the commands given here −

$wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.0-amd64.deb
$sudo dpkg -i elasticsearch-7.0.0-amd64.deb

Using the YUM utility for RPM-based Linux distributions (such as Red Hat and CentOS) − Import the public signing key −

$ rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Add the following text to a file with the .repo suffix in your /etc/yum.repos.d/ directory, for example elasticsearch.repo −

[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

You can now install Elasticsearch by using the following command −

sudo yum install elasticsearch

Step 4 − Go to the Elasticsearch home directory and into the bin folder. Run the elasticsearch.bat file in case of Windows, or run the elasticsearch file from the terminal in case of UNIX.

In Windows

> cd elasticsearch-2.1.0/bin
> elasticsearch

In Linux

$ cd elasticsearch-2.1.0/bin
$ ./elasticsearch

Note − In case of Windows, you might get an error stating that JAVA_HOME is not set. Set it in the environment variables to "C:\Program Files\Java\jre1.8.0_31" or to the location where you installed Java.

Step 5 − The default port for the Elasticsearch web interface is 9200, or you can change it by changing http.port in the elasticsearch.yml file located in the config directory. You can check whether the server is up and running by browsing http://localhost:9200. It will return a JSON object that contains information about the installed Elasticsearch in the following manner −

{
   "name" : "Brain-Child",
   "cluster_name" : "elasticsearch", "version" : {
      "number" : "2.1.0",
      "build_hash" : "72cd1f1a3eee09505e036106146dc1949dc5dc87",
      "build_timestamp" : "2015-11-18T22:40:03Z",
      "build_snapshot" : false,
      "lucene_version" : "5.3.1"
   },
   "tagline" : "You Know, for Search"
}

Step 6 − In this step, let us install Kibana. Follow the respective code given below for installing on Linux and Windows −

For installation on Linux −

wget https://artifacts.elastic.co/downloads/kibana/kibana-7.0.0-linux-x86_64.tar.gz

tar -xzf kibana-7.0.0-linux-x86_64.tar.gz

cd kibana-7.0.0-linux-x86_64/

./bin/kibana

For installation on Windows −

Download Kibana for Windows from https://www.elastic.co/downloads/kibana. Once you click the link, you will find the home page as shown below −

[Image: installation on Windows]

Unzip it, go to the Kibana home directory and then run it.

CD c:\kibana-7.0.0-windows-x86_64
.\bin\kibana.bat

Elasticsearch - Populate

In this chapter, let us learn how to add some index, mapping and data to Elasticsearch. Note that some of this data will be used in the examples explained in this tutorial.

Create Index

You can use the following command to create an index −

PUT school

Response

If the index is created, you can see the following output −

{"acknowledged": true}

Add data

Elasticsearch will store the documents we add to the index as shown in the following code. The documents are given IDs, which are used to identify them.

Request Body

POST school/_doc/10
{
   "name":"Saint Paul School", "description":"ICSE Afiliation",
   "street":"Dawarka", "city":"Delhi", "state":"Delhi", "zip":"110075",
   "location":[28.5733056, 77.0122136], "fees":5000,
   "tags":["Good Faculty", "Great Sports"], "rating":"4.5"
}

Response

{
   "_index" : "school",
   "_type" : "_doc",
   "_id" : "10",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 2,
   "_primary_term" : 1
}

Here, we are adding another similar document.

POST school/_doc/16
{
   "name":"Crescent School", "description":"State Board Affiliation",
   "street":"Tonk Road",
   "city":"Jaipur", "state":"RJ", "zip":"176114","location":[26.8535922,75.7923988],
   "fees":2500, "tags":["Well equipped labs"], "rating":"4.5"
}

Response

{
   "_index" : "school",
   "_type" : "_doc",
   "_id" : "16",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 9,
   "_primary_term" : 7
}

In this way, we will keep adding whatever example data we need for the upcoming chapters.
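If several documents need to be added at once, the bulk API can be used instead of issuing one request per document. The sketch below is only an illustration: the _bulk endpoint is standard Elasticsearch, but the document IDs and field values shown here are made up for this example.

POST school/_bulk
{ "index" : { "_id" : "21" } }
{ "name":"Model School", "city":"Delhi", "fees":3200, "rating":"4.0" }
{ "index" : { "_id" : "22" } }
{ "name":"Sunrise School", "city":"Jaipur", "fees":2800, "rating":"4.1" }

Each action line is followed by its source document, and the response reports the result of every item separately.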

Adding Sample Data in Kibana

Kibana is a GUI-driven tool for accessing the data and creating visualizations. In this section, let us understand how we can add sample data to it.

On the Kibana home page, choose the following option to add sample ecommerce data −

[Image: Kibana home page]

The next screen will show some visualizations and an Add data button −

[Image: Add data to Kibana]

Clicking Add data will show the following screen, which confirms that the data has been added to an index named eCommerce.

[Image: eCommerce revenue dashboard]

Elasticsearch - Migration between Versions

In any system or software, when we are upgrading to a newer version, we need to follow a few steps to preserve the application settings, configurations, data and other things. These steps are required to make the application stable on the new system and to maintain the integrity of the data (prevent it from getting corrupted).

You need to follow the steps given below to upgrade Elasticsearch −

  1. Read the upgrade docs from https://www.elastic.co/

  2. Test the upgraded version in your non-production environments, such as a UAT, E2E, SIT or DEV environment.

  3. Note that a rollback to the previous Elasticsearch version is not possible without a data backup. Hence, a data backup is recommended before upgrading to a higher version.

  4. We can upgrade using a full cluster restart or a rolling upgrade. A rolling upgrade is for newer versions. Note that there is no service outage when you use the rolling upgrade method for migration.

Steps for Upgrade

  1. Test the upgrade in a dev environment before upgrading your production cluster.

  2. Back up your data. You cannot roll back to an earlier version unless you have a snapshot of your data (see the snapshot sketch after this list).

  3. Consider closing machine learning jobs before you start the upgrade process. While machine learning jobs can continue to run during a rolling upgrade, doing so increases the overhead on the cluster during the upgrade process.

  4. Upgrade the components of your Elastic Stack in the following order − Elasticsearch, Kibana, Logstash, Beats, APM Server.
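The backup mentioned in step 2 is normally taken with the snapshot API. The following is a minimal sketch; the repository name backup_repo and the path /mnt/backups are assumptions, and the path must be registered under path.repo in elasticsearch.yml before the repository can be created.

PUT /_snapshot/backup_repo
{
   "type": "fs",
   "settings": {
      "location": "/mnt/backups"
   }
}

PUT /_snapshot/backup_repo/snapshot_before_upgrade?wait_for_completion=true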

Upgrading from 6.6 or Earlier

To upgrade directly to Elasticsearch 7.1.0 from versions 6.0-6.6, you must manually reindex any 5.x indices you need to carry forward, and perform a full cluster restart.

Full Cluster Restart

The process of a full cluster restart involves shutting down each node in the cluster, upgrading each node to 7.x and then restarting the cluster.

Following are the high-level steps that need to be carried out for a full cluster restart −

  1. Disable shard allocation.

  2. Stop indexing and perform a synced flush.

  3. Shut down all nodes.

  4. Upgrade all nodes.

  5. Upgrade any plugins.

  6. Start each upgraded node.

  7. Wait for all nodes to join the cluster and report a status of yellow.

  8. Re-enable allocation.

Once allocation is re-enabled, the cluster starts allocating the replica shards to the data nodes. At this point, it is safe to resume indexing and searching, but your cluster will recover more quickly if you can wait until all primary and replica shards have been successfully allocated and the status of all nodes is green. A minimal sketch of the allocation and flush requests used in these steps is given below.
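This sketch assumes a 6.x/7.x cluster, where the synced flush endpoint is available; setting the allocation setting back to null restores the default behaviour.

PUT /_cluster/settings
{
   "persistent": {
      "cluster.routing.allocation.enable": "primaries"
   }
}

POST /_flush/synced

PUT /_cluster/settings
{
   "persistent": {
      "cluster.routing.allocation.enable": null
   }
}

The first request stops replica allocation before the shutdown, the second performs the synced flush, and the last one re-enables allocation after all upgraded nodes have rejoined.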

Elasticsearch - API Conventions

An Application Programming Interface (API) on the web is a group of function calls or other programming instructions to access the software components of that particular web application. For example, the Facebook API helps a developer to create applications by accessing data or other functionalities from Facebook; it can be a date of birth or a status update.

Elasticsearch provides a REST API, which is accessed via JSON over HTTP. Elasticsearch uses some conventions which we shall discuss now.

Multiple Indices

Most of the operations, mainly searching and other operations, in the APIs are for one index or more than one index. This helps the user to search in multiple places or across all the available data by executing a query just once. Many different notations are used to perform operations in multiple indices. We will discuss a few of them in this chapter.

Comma Separated Notation

POST /index1,index2,index3/_search

Request Body

{
   "query":{
      "query_string":{
         "query":"any_string"
      }
   }
}

Response

JSON objects from index1, index2 and index3 that contain any_string.

_all Keyword for All Indices

POST /_all/_search

Request Body

{
   "query":{
      "query_string":{
         "query":"any_string"
      }
   }
}

Response

JSON objects from all indices that contain any_string.

Wildcards ( *, +, - )

POST /school*/_search

Request Body

{
   "query":{
      "query_string":{
         "query":"CBSE"
      }
   }
}

Response

JSON objects from all indices that start with school and contain CBSE.

Alternatively, you can use the following code as well −

POST /school*,-schools_gov/_search

Request Body

{
   "query":{
      "query_string":{
         "query":"CBSE"
      }
   }
}

Response

JSON objects from all indices that start with "school" but not from schools_gov, and that contain CBSE.

There are also some URL query string parameters −

ignore_unavailable − No error will occur and the operation will not be stopped if one or more of the indices present in the URL do not exist. For example, the schools index exists, but book_shops does not.

POST /school*,book_shops/_search

Request Body

{
   "query":{
      "query_string":{
         "query":"CBSE"
      }
   }
}

Response

{
   "error":{
      "root_cause":[{
         "type":"index_not_found_exception", "reason":"no such index",
         "resource.type":"index_or_alias", "resource.id":"book_shops",
         "index":"book_shops"
      }],
      "type":"index_not_found_exception", "reason":"no such index",
      "resource.type":"index_or_alias", "resource.id":"book_shops",
      "index":"book_shops"
   },"status":404
}

Consider the following code −

POST /school*,book_shops/_search?ignore_unavailable=true

Request Body

{
   "query":{
      "query_string":{
         "query":"CBSE"
      }
   }
}

Response (no error)

JSON objects from all indices that start with school and contain CBSE.

allow_no_indices

A true value of this parameter will prevent an error if a URL with a wildcard results in no indices. For example, there is no index that starts with schools_pri −

POST /schools_pri*/_search?allow_no_indices=true

Request Body

{
   "query":{
      "match_all":{}
   }
}

Response (No errors)

{
   "took":1,"timed_out": false, "_shards":{"total":0, "successful":0, "failed":0},
   "hits":{"total":0, "max_score":0.0, "hits":[]}
}

expand_wildcards

This parameter decides whether the wildcards need to be expanded to open indices, closed indices, or both. The value of this parameter can be open and closed, or none and all.

For example, close the index schools −

POST /schools/_close

Response

{"acknowledged":true}

Consider the following code −

POST /school*/_search?expand_wildcards=closed

Request Body

{
   "query":{
      "match_all":{}
   }
}

Response

{
   "error":{
      "root_cause":[{
         "type":"index_closed_exception", "reason":"closed", "index":"schools"
      }],
      "type":"index_closed_exception", "reason":"closed", "index":"schools"
   }, "status":403
}

Date Math Support in Index Names

Elasticsearch offers a functionality to search indices according to date and time. We need to specify the date and time in a specific format. For example, the index accountdetail-2015.12.30 will store the bank account details of 30th December 2015. Mathematical operations can be performed to get details for a particular date or a range of dates and times.

Format for a date math index name −

<static_name{date_math_expr{date_format|time_zone}}>
/<accountdetail-{now-2d{YYYY.MM.dd|utc}}>/_search

static_name is the part of the expression which remains the same in every date math index, like account detail. date_math_expr contains the mathematical expression that determines the date and time dynamically, like now-2d. date_format contains the format in which the date is written in the index, like YYYY.MM.dd. If today's date is 30th December 2015, then <accountdetail-{now-2d{YYYY.MM.dd}}> will return accountdetail-2015.12.28.

Expression                          Resolves to
<accountdetail-{now-d}>             accountdetail-2015.12.29
<accountdetail-{now-M}>             accountdetail-2015.11.30
<accountdetail-{now{YYYY.MM}}>      accountdetail-2015.12
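Because date math index names contain special characters, they must be URI encoded when used in a request path. As a sketch, a search against <accountdetail-{now-d}> from the table above would be written in encoded form as −

GET /%3Caccountdetail-%7Bnow-d%7D%3E/_search
{
   "query" : {
      "match_all" : {}
   }
}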

We will now see some of the common options available in Elasticsearch that can be used to get the response in a specified format.

Pretty Results

We can get the response in a well-formatted JSON object by just appending the URL query parameter pretty=true.

POST /schools/_search?pretty=true

Request Body

{
   "query":{
      "match_all":{}
   }
}

Response

……………………..
{
   "_index" : "schools", "_type" : "school", "_id" : "1", "_score" : 1.0,
   "_source":{
      "name":"Central School", "description":"CBSE Affiliation",
      "street":"Nagan", "city":"paprola", "state":"HP", "zip":"176115",
      "location": [31.8955385, 76.8380405], "fees":2000,
      "tags":["Senior Secondary", "beautiful campus"], "rating":"3.5"
   }
}
………………….

Human Readable Output

This option can change the statistical responses either into a human-readable form (if human=true) or a computer-readable form (if human=false). For example, if human=true then distance_kilometer = 20KM, and if human=false then distance_meter = 20000, which is suitable when the response needs to be used by another computer program.
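For example, the same human flag can be appended to a statistics request so that byte sizes and durations are reported in readable units; this is just a sketch using the index stats endpoint covered later in this tutorial.

GET /_stats?human=true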

Response Filtering

We can filter the response down to fewer fields by adding them in the filter_path parameter. For example,

POST /schools/_search?filter_path=hits.total

Request Body

{
   "query":{
      "match_all":{}
   }
}

Response

{"hits":{"total":3}}

Elasticsearch - Document APIs

Elasticsearch provides single-document APIs and multi-document APIs, where the API call targets a single document and multiple documents respectively.

Index API

It helps to add or update a JSON document in an index when a request is made to that respective index with a specific mapping. For example, the following request will add the JSON object to the index schools under the school mapping −

PUT schools/_doc/5
{
   name":"City School", "description":"ICSE", "street":"West End",
   "city":"Meerut",
   "state":"UP", "zip":"250002", "location":[28.9926174, 77.692485],
   "fees":3500,
   "tags":["fully computerized"], "rating":"4.5"
}

On running the above code, we get the following result −

{
   "_index" : "schools",
   "_type" : "_doc",
   "_id" : "5",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 2,
   "_primary_term" : 1
}

Automatic Index Creation

When a request is made to add a JSON object to a particular index and that index does not exist, this API automatically creates that index and also the underlying mapping for that particular JSON object. This functionality can be disabled by changing the values of the following parameters to false in the elasticsearch.yml file.

action.auto_create_index:false
index.mapper.dynamic:false

You can also restrict the auto-creation of indices, where only index names matching specific patterns are allowed, by changing the value of the following parameter −

action.auto_create_index:+acc*,-bank*

Note − Here + indicates allowed and − indicates not allowed.

Versioning

Elasticsearch also provides a version control facility. We can use a version query parameter to specify the version of a particular document.

PUT schools/_doc/5?version=7&version_type=external
{
   "name":"Central School", "description":"CBSE Affiliation", "street":"Nagan",
   "city":"paprola", "state":"HP", "zip":"176115", "location":[31.8955385, 76.8380405],
   "fees":2200, "tags":["Senior Secondary", "beautiful campus"], "rating":"3.3"
}

On running the above code, we get the following result −

{
   "_index" : "schools",
   "_type" : "_doc",
   "_id" : "5",
   "_version" : 7,
   "result" : "updated",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 3,
   "_primary_term" : 1
}

Versioning is a real-time process and it is not affected by the real-time search operations.

The two most important types of versioning are −

Internal Versioning

Internal versioning is the default version, which starts with 1 and increments with each update, deletes included.

External Versioning

It is used when the versioning of the documents is stored in an external system, such as third-party versioning systems. To enable this functionality, we need to set version_type to external. Here Elasticsearch will store the version number as designated by the external system and will not increment it automatically.

Operation Type

The operation type is used to force a create operation. This helps to avoid overwriting an existing document.

PUT chapter/_doc/1?op_type=create
{
   "Text":"this is chapter one"
}

On running the above code, we get the following result −

{
   "_index" : "chapter",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 0,
   "_primary_term" : 1
}

Automatic ID generation

When an ID is not specified in an index operation, Elasticsearch automatically generates an ID for that document.

POST chapter/_doc/
{
   "user" : "tpoint",
   "post_date" : "2018-12-25T14:12:12",
   "message" : "Elasticsearch Tutorial"
}

On running the above code, we get the following result −

{
   "_index" : "chapter",
   "_type" : "_doc",
   "_id" : "PVghWGoB7LiDTeV6LSGu",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 1,
   "_primary_term" : 1
}

Get API

This API helps to extract the JSON object by performing a get request for a particular document.

GET schools/_doc/5

On running the above code, we get the following result −

{
   "_index" : "schools",
   "_type" : "_doc",
   "_id" : "5",
   "_version" : 7,
   "_seq_no" : 3,
   "_primary_term" : 1,
   "found" : true,
   "_source" : {
      "name" : "Central School",
      "description" : "CBSE Affiliation",
      "street" : "Nagan",
      "city" : "paprola",
      "state" : "HP",
      "zip" : "176115",
      "location" : [
         31.8955385,
         76.8380405
      ],
      "fees" : 2200,
      "tags" : [
         "Senior Secondary",
         "beautiful campus"
      ],
      "rating" : "3.3"
   }
}

  1. This operation is real-time and is not affected by the refresh rate of the index.

  2. You can also specify the version; then Elasticsearch will fetch only that version of the document.

  3. You can also specify _all in the request, so that Elasticsearch can search for that document ID in every type, and it will return the first matched document.

  4. You can also specify the fields you want in your result from that particular document.

GET schools/_doc/5?_source_includes=name,fees

On running the above code, we get the following result −

{
   "_index" : "schools",
   "_type" : "_doc",
   "_id" : "5",
   "_version" : 7,
   "_seq_no" : 3,
   "_primary_term" : 1,
   "found" : true,
   "_source" : {
      "fees" : 2200,
      "name" : "Central School"
   }
}

You can also fetch the source part in your result by just adding the _source part in your get request.

GET schools/_doc/5?_source

On running the above code, we get the following result −

{
   "_index" : "schools",
   "_type" : "_doc",
   "_id" : "5",
   "_version" : 7,
   "_seq_no" : 3,
   "_primary_term" : 1,
   "found" : true,
   "_source" : {
      "name" : "Central School",
      "description" : "CBSE Affiliation",
      "street" : "Nagan",
      "city" : "paprola",
      "state" : "HP",
      "zip" : "176115",
      "location" : [
         31.8955385,
         76.8380405
      ],
      "fees" : 2200,
      "tags" : [
         "Senior Secondary",
         "beautiful campus"
      ],
      "rating" : "3.3"
   }
}

You can also refresh the shard before doing the get operation by setting the refresh parameter to true.

Delete API

You can delete a particular index, mapping or document by sending an HTTP DELETE request to Elasticsearch.

DELETE schools/_doc/4

On running the above code, we get the following result −

{
   "found":true, "_index":"schools", "_type":"school", "_id":"4", "_version":2,
   "_shards":{"total":2, "successful":1, "failed":0}
}

The version of the document can be specified to delete that particular version. A routing parameter can be specified to delete the document belonging to a particular user, and the operation fails if the document does not belong to that particular user. In this operation, you can specify the refresh and timeout options just like in the GET API.
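For illustration, the two sketches below show a delete that only succeeds for a given version and a delete that carries a routing value; the version number 2 and the routing value user1 are assumptions, and newer releases prefer if_seq_no/if_primary_term for this kind of concurrency control.

DELETE /schools/_doc/4?version=2

DELETE /schools/_doc/4?routing=user1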

Update API

A script is used for performing this operation, and versioning is used to make sure that no updates happen during the get and re-index. For example, you can update the name of a school using a script −

POST schools/_update/4
{
   "script" : {
      "source": "ctx._source.name = params.sname",
      "lang": "painless",
      "params" : {
         "sname" : "City Wise School"
      }
   }
 }

On running the above code, we get the following result −

{
   "_index" : "schools",
   "_type" : "_doc",
   "_id" : "4",
   "_version" : 3,
   "result" : "updated",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 4,
   "_primary_term" : 2
}

You can check the update by sending a get request to the updated document.

Elasticsearch - Search APIs

This API is used to search content in Elasticsearch. A user can search by sending a get request with a query string as a parameter, or by posting a query in the message body of a post request. Mainly all the search APIs are multi-index, multi-type.

Multi-Index

Elasticsearch allows us to search for the documents present in all the indices or in some specific indices. For example, if we need to search all the documents whose city field contains paprola, we can do as shown here −

GET /_all/_search?q=city:paprola

On running the above code, we get the following response −

{
   "took" : 33,
   "timed_out" : false,
   "_shards" : {
      "total" : 7,
      "successful" : 7,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 1,
         "relation" : "eq"
      },
      "max_score" : 0.9808292,
      "hits" : [
         {
            "_index" : "schools",
            "_type" : "school",
            "_id" : "5",
            "_score" : 0.9808292,
            "_source" : {
               "name" : "Central School",
               "description" : "CBSE Affiliation",
               "street" : "Nagan",
               "city" : "paprola",
               "state" : "HP",
               "zip" : "176115",
               "location" : [
                  31.8955385,
                  76.8380405
               ],
               "fees" : 2200,
               "tags" : [
                  "Senior Secondary",
                  "beautiful campus"
               ],
               "rating" : "3.3"
            }
         }
      ]
   }
}

Many parameters can be passed in a search operation using the Uniform Resource Identifier −

S.No   Parameter & Description

1      q − This parameter is used to specify the query string.

2      lenient − Format-based errors can be ignored by just setting this parameter to true. It is false by default.

3      fields − This parameter is used to specify which fields to return in the response.

4      sort − We can get sorted results by using this parameter; the possible values for this parameter are fieldName, fieldName:asc and fieldName:desc.

5      timeout − We can restrict the search time by using this parameter, and the response then only contains the hits within that specified time. By default, there is no timeout.

6      terminate_after − We can restrict the response to a specified number of documents for each shard, upon reaching which the query will terminate early. By default, there is no terminate_after.

7      from − The starting index of the hits to return. Defaults to 0.

8      size − It denotes the number of hits to return. Defaults to 10.
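Several of these parameters can be combined in a single URI search. The sketch below queries the schools index used earlier, sorts by fees in descending order and returns at most five hits starting from the first one; the field names are taken from the sample documents above.

GET /schools/_search?q=state:UP&sort=fees:desc&from=0&size=5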

We can also specify a query using the query DSL in the request body, and there are many examples already given in previous chapters. One such example is given here −

POST /schools/_search
{
   "query":{
      "query_string":{
         "query":"up"
      }
   }
}

On running the above code, we get the following response −

{
   "took" : 11,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 1,
         "relation" : "eq"
      },
      "max_score" : 0.47000363,
      "hits" : [
         {
            "_index" : "schools",
            "_type" : "school",
            "_id" : "4",
            "_score" : 0.47000363,
            "_source" : {
               "name" : "City Best School",
               "description" : "ICSE",
               "street" : "West End",
               "city" : "Meerut",
               "state" : "UP",
               "zip" : "250002",
               "location" : [
                  28.9926174,
                  77.692485
               ],
               "fees" : 3500,
               "tags" : [
                  "fully computerized"
               ],
               "rating" : "4.5"
            }
         }
      ]
   }
}

Elasticsearch - Aggregations

The aggregations framework collects all the data selected by the search query and consists of many building blocks, which help in building complex summaries of the data. The basic structure of an aggregation is shown here −

"aggregations" : {
   "" : {
      "" : {

      }

      [,"meta" : { [] } ]?
      [,"aggregations" : { []+ } ]?
   }
   [,"" : { ... } ]*
}

There are different types of aggregations, each with its own purpose. They are discussed in detail in this chapter.

Metrics Aggregations

These aggregations help in computing metrics from the field values of the aggregated documents, and sometimes some values can be generated from scripts.

Numeric metrics are either single-valued, like the average aggregation, or multi-valued, like stats.

Avg Aggregation

This aggregation is used to get the average of any numeric field present in the aggregated documents. For example,

POST /schools/_search
{
   "aggs":{
      "avg_fees":{"avg":{"field":"fees"}}
   }
}

On running the above code, we get the following result −

{
   "took" : 41,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 2,
         "relation" : "eq"
      },
      "max_score" : 1.0,
      "hits" : [
         {
            "_index" : "schools",
            "_type" : "school",
            "_id" : "5",
            "_score" : 1.0,
            "_source" : {
               "name" : "Central School",
               "description" : "CBSE Affiliation",
               "street" : "Nagan",
               "city" : "paprola",
               "state" : "HP",
               "zip" : "176115",
               "location" : [
                  31.8955385,
                  76.8380405
               ],
            "fees" : 2200,
            "tags" : [
               "Senior Secondary",
               "beautiful campus"
            ],
            "rating" : "3.3"
         }
      },
      {
         "_index" : "schools",
         "_type" : "school",
         "_id" : "4",
         "_score" : 1.0,
         "_source" : {
            "name" : "City Best School",
            "description" : "ICSE",
            "street" : "West End",
            "city" : "Meerut",
            "state" : "UP",
            "zip" : "250002",
            "location" : [
               28.9926174,
               77.692485
            ],
            "fees" : 3500,
            "tags" : [
               "fully computerized"
            ],
            "rating" : "4.5"
         }
      }
   ]
 },
   "aggregations" : {
      "avg_fees" : {
         "value" : 2850.0
      }
   }
}

Cardinality Aggregation

This aggregation gives the count of distinct values of a particular field.

POST /schools/_search?size=0
{
   "aggs":{
      "distinct_name_count":{"cardinality":{"field":"fees"}}
   }
}

On running the above code, we get the following result −

{
   "took" : 2,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 2,
         "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
   },
   "aggregations" : {
      "distinct_name_count" : {
         "value" : 2
      }
   }
}

Note − The value of cardinality is 2 because there are two distinct values in fees.

Extended Stats Aggregation

This aggregation generates all the statistics about a specific numerical field in the aggregated documents.

POST /schools/_search?size=0
{
   "aggs" : {
      "fees_stats" : { "extended_stats" : { "field" : "fees" } }
   }
}

On running the above code, we get the following result −

{
   "took" : 8,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 2,
         "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
   },
   "aggregations" : {
      "fees_stats" : {
         "count" : 2,
         "min" : 2200.0,
         "max" : 3500.0,
         "avg" : 2850.0,
         "sum" : 5700.0,
         "sum_of_squares" : 1.709E7,
         "variance" : 422500.0,
         "std_deviation" : 650.0,
         "std_deviation_bounds" : {
            "upper" : 4150.0,
            "lower" : 1550.0
         }
      }
   }
}

Max Aggregation

This aggregation finds the max value of a specific numeric field in the aggregated documents.

POST /schools/_search?size=0
{
   "aggs" : {
   "max_fees" : { "max" : { "field" : "fees" } }
   }
}

On running the above code, we get the following result −

{
   "took" : 16,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
  "hits" : {
      "total" : {
         "value" : 2,
         "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
   },
   "aggregations" : {
      "max_fees" : {
         "value" : 3500.0
      }
   }
}

Min Aggregation

This aggregation finds the min value of a specific numeric field in the aggregated documents.

POST /schools/_search?size=0
{
   "aggs" : {
      "min_fees" : { "min" : { "field" : "fees" } }
   }
}

On running the above code, we get the following result −

{
   "took" : 2,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 2,
         "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
   },
  "aggregations" : {
      "min_fees" : {
         "value" : 2200.0
      }
   }
}

Sum Aggregation

This aggregation calculates the sum of a specific numeric field in the aggregated documents.

POST /schools/_search?size=0
{
   "aggs" : {
      "total_fees" : { "sum" : { "field" : "fees" } }
   }
}

On running the above code, we get the following result −

{
   "took" : 8,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 2,
         "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
   },
   "aggregations" : {
      "total_fees" : {
         "value" : 5700.0
      }
   }
}

There are some other metrics aggregations which are used in special cases, such as the geo bounds aggregation and the geo centroid aggregation for geo locations. An example follows below.
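For example, a geo bounds aggregation over the location field would look like the sketch below. Note this assumes that location is mapped as a geo_point field, which is not the case for the sample schools index unless such a mapping is created explicitly.

POST /schools/_search?size=0
{
   "aggs" : {
      "school_area" : {
         "geo_bounds" : {
            "field" : "location"
         }
      }
   }
}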

Stats Aggregations

A multi-value metrics aggregation that computes stats over numeric values extracted from the aggregated documents.

POST /schools/_search?size=0
{
   "aggs" : {
      "grades_stats" : { "stats" : { "field" : "fees" } }
   }
}

On running the above code, we get the following result −

{
   "took" : 2,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 2,
         "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
   },
   "aggregations" : {
      "grades_stats" : {
         "count" : 2,
         "min" : 2200.0,
         "max" : 3500.0,
         "avg" : 2850.0,
         "sum" : 5700.0
      }
   }
}

Aggregation Metadata

You can add some data about the aggregation at the time of the request by using the meta tag, and you can get it back in the response.

POST /schools/_search?size=0
{
   "aggs" : {
      "avg_fees" : { "avg" : { "field" : "fees" } ,
         "meta" :{
            "dsc" :"Lowest Fees This Year"
         }
      }
   }
}

On running the above code, we get the following result −

{
   "took" : 0,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 2,
         "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
   },
   "aggregations" : {
      "avg_fees" : {
         "meta" : {
            "dsc" : "Lowest Fees This Year"
         },
         "value" : 2850.0
      }
   }
}

Elasticsearch - Index APIs

These APIs are responsible for managing all aspects of an index, such as settings, aliases, mappings and index templates.

Create Index

This API helps you to create an index. An index can be created automatically when a user passes JSON objects to any index, or it can be created before that. To create an index, you just need to send a PUT request with settings, mappings and aliases, or just a simple request without a body.

PUT colleges

On running the above code, we get the output as shown below −

{
   "acknowledged" : true,
   "shards_acknowledged" : true,
   "index" : "colleges"
}

We can also add some settings to the above command −

PUT colleges
{
  "settings" : {
      "index" : {
         "number_of_shards" : 3,
         "number_of_replicas" : 2
      }
   }
}

On running the above code, we get the output as shown below −

{
   "acknowledged" : true,
   "shards_acknowledged" : true,
   "index" : "colleges"
}

Delete Index

This API helps you to delete any index. You just need to pass a delete request with the name of that particular index.

DELETE /colleges

You can delete all indices by just using _all or *.

Get Index

This API can be called by just sending a get request to one or more indices. It returns the information about the index.

GET colleges

On running the above code, we get the output as shown below −

{
   "colleges" : {
      "aliases" : {
         "alias_1" : { },
         "alias_2" : {
            "filter" : {
               "term" : {
                  "user" : "pkay"
               }
            },
            "index_routing" : "pkay",
            "search_routing" : "pkay"
         }
      },
      "mappings" : { },
      "settings" : {
         "index" : {
            "creation_date" : "1556245406616",
            "number_of_shards" : "1",
            "number_of_replicas" : "1",
            "uuid" : "3ExJbdl2R1qDLssIkwDAug",
            "version" : {
               "created" : "7000099"
            },
            "provided_name" : "colleges"
         }
      }
   }
}

You can get the information of all the indices by using _all or *.

Index Exists

The existence of an index can be determined by just sending a get request to that index. If the HTTP response is 200, it exists; if it is 404, it does not.

HEAD colleges

On running the above code, we get the output as shown below −

200-OK

Index Settings

You can get the index settings by just appending the _settings keyword at the end of the URL.

GET /colleges/_settings

On running the above code, we get the output as shown below −

{
   "colleges" : {
      "settings" : {
         "index" : {
            "creation_date" : "1556245406616",
            "number_of_shards" : "1",
            "number_of_replicas" : "1",
            "uuid" : "3ExJbdl2R1qDLssIkwDAug",
            "version" : {
               "created" : "7000099"
            },
            "provided_name" : "colleges"
         }
      }
   }
}

Index Stats

This API helps you to extract statistics about a particular index. You just need to send a get request with the index URL and the _stats keyword at the end.

GET /_stats

On running the above code, we get the output as shown below −

………………………………………………
},
   "request_cache" : {
      "memory_size_in_bytes" : 849,
      "evictions" : 0,
      "hit_count" : 1171,
      "miss_count" : 4
   },
   "recovery" : {
      "current_as_source" : 0,
      "current_as_target" : 0,
      "throttle_time_in_millis" : 0
   }
} ………………………………………………

Flush

The flush process of an index makes sure that any data that is currently only persisted in the transaction log is also permanently persisted in Lucene. This reduces recovery times, as that data does not need to be reindexed from the transaction logs after the Lucene index is opened.

POST colleges/_flush

On running the above code, we get the output as shown below −

{
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   }
}

Elasticsearch - Cat APIs

Usually the results from various Elasticsearch APIs are displayed in JSON format. But JSON is not always easy to read. So the cat APIs feature available in Elasticsearch helps by giving an easier-to-read and comprehend printing format of the results. There are various parameters used in the cat APIs which serve different purposes, for example − the term v makes the output verbose.

Let us learn about cat APIs in more detail in this chapter.

Verbose

The verbose output gives a nice display of the results of a cat command. In the example given below, we get the details of the various indices present in the cluster.

GET /_cat/indices?v

On running the above code, we get the response as shown below −

health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open schools RkMyEn2SQ4yUgzT6EQYuAA 1 1 2 1 21.6kb 21.6kb
yellow open index_4_analysis zVmZdM1sTV61YJYrNXf1gg 1 1 0 0 283b 283b
yellow open sensor-2018-01-01 KIrrHwABRB-ilGqTu3OaVQ 1 1 1 0 4.2kb 4.2kb
yellow open colleges 3ExJbdl2R1qDLssIkwDAug 1 1 0 0 283b 283b

Headers

The h parameter, also called header, is used to display only those columns mentioned in the command.

GET /_cat/nodes?h=ip,port

On running the above code, we get the response as shown below −

127.0.0.1 9300

Sort

The sort command accepts a query string which can sort the table by a specified column in the query. The default sort is ascending, but this can be changed by adding :desc to a column.

The example below gives the result of templates sorted in descending order of the order field and then by index patterns.

GET _cat/templates?v&s=order:desc,index_patterns

On running the above code, we get the response as shown below −

name index_patterns order version
.triggered_watches [.triggered_watches*] 2147483647
.watch-history-9 [.watcher-history-9*] 2147483647
.watches [.watches*] 2147483647
.kibana_task_manager [.kibana_task_manager] 0 7000099

Count

The count parameter provides the count of the total number of documents in the entire cluster.

GET /_cat/count?v

On running the above code, we get the response as shown below −

epoch timestamp count
1557633536 03:58:56 17809

Elasticsearch - Cluster APIs

The cluster API is used for getting information about a cluster and its nodes and for making changes in them. To call this API, we need to specify the node name, address or _local.

GET /_nodes/_local

On running the above code, we get the response as shown below −

………………………………………………
cluster_name" : "elasticsearch",
   "nodes" : {
      "FKH-5blYTJmff2rJ_lQOCg" : {
         "name" : "ubuntu",
         "transport_address" : "127.0.0.1:9300",
         "host" : "127.0.0.1",
         "ip" : "127.0.0.1",
         "version" : "7.0.0",
         "build_flavor" : "default",
         "build_type" : "tar",
         "build_hash" : "b7e28a7",
         "total_indexing_buffer" : 106502553,
         "roles" : [
            "master",
            "data",
            "ingest"
         ],
         "attributes" : {
………………………………………………

Cluster Health

This API is used to get the status of the health of the cluster by appending the 'health' keyword.

GET /_cluster/health

On running the above code, we get the response as shown below −

{
   "cluster_name" : "elasticsearch",
   "status" : "yellow",
   "timed_out" : false,
   "number_of_nodes" : 1,
   "number_of_data_nodes" : 1,
   "active_primary_shards" : 7,
   "active_shards" : 7,
   "relocating_shards" : 0,
   "initializing_shards" : 0,
   "unassigned_shards" : 4,
   "delayed_unassigned_shards" : 0,
   "number_of_pending_tasks" : 0,
   "number_of_in_flight_fetch" : 0,
   "task_max_waiting_in_queue_millis" : 0,
   "active_shards_percent_as_number" : 63.63636363636363
}

Cluster State

This API is used to get state information about a cluster by appending the 'state' keyword to the URL. The state information contains the version, master node, other nodes, routing table, metadata and blocks.

GET /_cluster/state

On running the above code, we get the response as shown below −

………………………………………………
{
   "cluster_name" : "elasticsearch",
   "cluster_uuid" : "IzKu0OoVTQ6LxqONJnN2eQ",
   "version" : 89,
   "state_uuid" : "y3BlwvspR1eUQBTo0aBjig",
   "master_node" : "FKH-5blYTJmff2rJ_lQOCg",
   "blocks" : { },
   "nodes" : {
      "FKH-5blYTJmff2rJ_lQOCg" : {
      "name" : "ubuntu",
      "ephemeral_id" : "426kTGpITGixhEzaM-5Qyg",
      "transport
   }
………………………………………………

Cluster Stats

This API helps to retrieve statistics about a cluster by using the 'stats' keyword. This API returns the shard number, store size, memory usage, number of nodes, roles, OS and file system.

GET /_cluster/stats

On running the above code, we get the response as shown below −

………………………………………….
"cluster_name" : "elasticsearch",
"cluster_uuid" : "IzKu0OoVTQ6LxqONJnN2eQ",
"timestamp" : 1556435464704,
"status" : "yellow",
"indices" : {
   "count" : 7,
   "shards" : {
      "total" : 7,
      "primaries" : 7,
      "replication" : 0.0,
      "index" : {
         "shards" : {
         "min" : 1,
         "max" : 1,
         "avg" : 1.0
      },
      "primaries" : {
         "min" : 1,
         "max" : 1,
         "avg" : 1.0
      },
      "replication" : {
         "min" : 0.0,
         "max" : 0.0,
         "avg" : 0.0
      }
………………………………………….

Cluster Update Settings

此 API 允许您通过使用“settings”关键字更新群集的设置。有两种类型的设置 - 持久性(在重启后应用)和瞬态(无法在完全群集重启后依然存在)。

This API allows you to update the settings of a cluster by using the ‘settings’ keyword. There are two types of settings − persistent (applied across restarts) and transient (do not survive a full cluster restart).
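
例如,下面是一个使用该 API 的最简示意性示例(其中的具体设置项和取值仅作演示用途)−

For example, below is a minimal sketch of using this API (the particular settings and values shown here are only illustrative) −

PUT /_cluster/settings
{
   "persistent" : {
      "cluster.routing.allocation.enable" : "primaries"
   },
   "transient" : {
      "indices.recovery.max_bytes_per_sec" : "50mb"
   }
}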

Node Stats

此 API 用于检索集群的一个或多个节点的统计信息。节点统计信息几乎与集群相同。

This API is used to retrieve the statistics of one or more nodes of the cluster. Node stats are almost the same as the cluster stats.

GET /_nodes/stats

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "_nodes" : {
      "total" : 1,
      "successful" : 1,
      "failed" : 0
   },
   "cluster_name" : "elasticsearch",
   "nodes" : {
      "FKH-5blYTJmff2rJ_lQOCg" : {
         "timestamp" : 1556437348653,
         "name" : "ubuntu",
         "transport_address" : "127.0.0.1:9300",
         "host" : "127.0.0.1",
         "ip" : "127.0.0.1:9300",
         "roles" : [
            "master",
            "data",
            "ingest"
         ],
         "attributes" : {
            "ml.machine_memory" : "4112797696",
            "xpack.installed" : "true",
            "ml.max_open_jobs" : "20"
         },
………………………………………………………….

Nodes hot_threads

此 API 可帮助您检索集群中每个节点上当前热门线程的相关信息。

This API helps you to retrieve information about the current hot threads on each node in the cluster.

GET /_nodes/hot_threads

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

:::{ubuntu}{FKH-5blYTJmff2rJ_lQOCg}{426kTGpITGixhEzaM5Qyg}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=4112797696,
xpack.installed=true, ml.max_open_jobs=20}
 Hot threads at 2019-04-28T07:43:58.265Z, interval=500ms, busiestThreads=3,
ignoreIdleThreads=true:

Elasticsearch - Query DSL

在 Elasticsearch 中,通过基于 JSON 的查询执行搜索。查询由两个子句组成 −

In Elasticsearch, searching is carried out by using query based on JSON. A query is made up of two clauses −

  1. Leaf Query Clauses − These clauses are match, term or range, which look for a specific value in a specific field.

  2. Compound Query Clauses − These queries are a combination of leaf query clauses and other compound queries to extract the desired information.

Elasticsearch 支持大量查询。查询以查询关键字开头,然后以 JSON 对象的形式在其中具有条件和过滤器。不同类型的查询已在下面进行描述。

Elasticsearch supports a large number of queries. A query starts with a query keyword and then has conditions and filters inside it in the form of a JSON object. The different types of queries have been described below.

Match All Query

这是最基本的查询;它返回所有内容,并且每个对象的得分均为 1.0。

This is the most basic query; it returns all the content, with a score of 1.0 for every object.

POST /schools/_search
{
   "query":{
      "match_all":{}
   }
}

运行以上代码时,我们得到以下结果:-

On running the above code, we get the following result −

{
   "took" : 7,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 2,
         "relation" : "eq"
      },
      "max_score" : 1.0,
      "hits" : [
         {
            "_index" : "schools",
            "_type" : "school",
            "_id" : "5",
            "_score" : 1.0,
            "_source" : {
               "name" : "Central School",
               "description" : "CBSE Affiliation",
               "street" : "Nagan",
               "city" : "paprola",
               "state" : "HP",
               "zip" : "176115",
               "location" : [
                  31.8955385,
                  76.8380405
               ],
               "fees" : 2200,
               "tags" : [
                  "Senior Secondary",
                  "beautiful campus"
               ],
               "rating" : "3.3"
            }
         },
         {
            "_index" : "schools",
            "_type" : "school",
            "_id" : "4",
            "_score" : 1.0,
            "_source" : {
               "name" : "City Best School",
               "description" : "ICSE",
               "street" : "West End",
               "city" : "Meerut",
               "state" : "UP",
               "zip" : "250002",
               "location" : [
                  28.9926174,
                  77.692485
               ],
               "fees" : 3500,
               "tags" : [
                  "fully computerized"
               ],
               "rating" : "4.5"
            }
         }
      ]
   }
}

Full Text Queries

这些查询用于搜索全文文本,如章节或新闻文章。此查询根据与该特定索引或文档关联的分析器工作。在本章节中,我们将讨论不同类型的全文文本查询。

These queries are used to search a full body of text like a chapter or a news article. This query works according to the analyser associated with that particular index or document. In this section, we will discuss the different types of full text queries.

Match query

此查询将文本或短语与一个或多个字段的值进行匹配。

This query matches a text or phrase with the values of one or more fields.

POST /schools*/_search
{
   "query":{
      "match" : {
         "rating":"4.5"
      }
   }
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "took" : 44,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 1,
         "relation" : "eq"
      },
      "max_score" : 0.47000363,
      "hits" : [
         {
            "_index" : "schools",
            "_type" : "school",
            "_id" : "4",
            "_score" : 0.47000363,
            "_source" : {
               "name" : "City Best School",
               "description" : "ICSE",
               "street" : "West End",
               "city" : "Meerut",
               "state" : "UP",
               "zip" : "250002",
               "location" : [
                  28.9926174,
                  77.692485
               ],
               "fees" : 3500,
               "tags" : [
                  "fully computerized"
               ],
               "rating" : "4.5"
            }
         }
      ]
   }
}

Multi Match Query

此查询将文本或短语与多个字段进行匹配。

This query matches a text or phrase with more than one field.

POST /schools*/_search
{
   "query":{
      "multi_match" : {
         "query": "paprola",
         "fields": [ "city", "state" ]
      }
   }
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "took" : 12,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 1,
         "relation" : "eq"
      },
      "max_score" : 0.9808292,
      "hits" : [
         {
            "_index" : "schools",
            "_type" : "school",
            "_id" : "5",
            "_score" : 0.9808292,
            "_source" : {
               "name" : "Central School",
               "description" : "CBSE Affiliation",
               "street" : "Nagan",
               "city" : "paprola",
               "state" : "HP",
               "zip" : "176115",
               "location" : [
                  31.8955385,
                  76.8380405
               ],
               "fees" : 2200,
               "tags" : [
                  "Senior Secondary",
                  "beautiful campus"
               ],
               "rating" : "3.3"
            }
         }
      ]
   }
}

Query String Query

此查询使用查询解析器和 query_string 关键字。

This query uses a query parser and the query_string keyword.

POST /schools*/_search
{
   "query":{
      "query_string":{
         "query":"beautiful"
      }
   }
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "took" : 60,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
      "value" : 1,
      "relation" : "eq"
   },
………………………………….

Term Level Queries

这些查询主要处理结构化数据,如数字、日期和枚举。

These queries mainly deal with structured data like numbers, dates and enums.

POST /schools*/_search
{
   "query":{
      "term":{"zip":"176115"}
   }
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

……………………………..
hits" : [
   {
      "_index" : "schools",
      "_type" : "school",
      "_id" : "5",
      "_score" : 0.9808292,
      "_source" : {
         "name" : "Central School",
         "description" : "CBSE Affiliation",
         "street" : "Nagan",
         "city" : "paprola",
         "state" : "HP",
         "zip" : "176115",
         "location" : [
            31.8955385,
            76.8380405
         ],
      }
   }
]
…………………………………………..

Range Query

此查询用于查找在给定值范围内具有值的那些对象。为此,我们需要使用操作符,如 −

This query is used to find objects whose values lie within a given range. For this, we need to use operators such as −

  1. gte − greater than or equal to

  2. gt − greater than

  3. lte − less than or equal to

  4. lt − less than

例如,请观察下面给出的代码 -

For example, observe the code given below −

POST /schools*/_search
{
   "query":{
      "range":{
         "rating":{
            "gte":3.5
         }
      }
   }
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "took" : 24,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 1,
         "relation" : "eq"
      },
      "max_score" : 1.0,
      "hits" : [
         {
            "_index" : "schools",
            "_type" : "school",
            "_id" : "4",
            "_score" : 1.0,
            "_source" : {
               "name" : "City Best School",
               "description" : "ICSE",
               "street" : "West End",
               "city" : "Meerut",
               "state" : "UP",
               "zip" : "250002",
               "location" : [
                  28.9926174,
                  77.692485
               ],
               "fees" : 3500,
               "tags" : [
                  "fully computerized"
               ],
               "rating" : "4.5"
            }
         }
      ]
   }
}

还存在其他类型的词项级别查询,例如 −

There exist other types of term level queries also such as −

  1. Exists query − If a certain field has non null value.

  2. Missing query − This is completely opposite to exists query, this query searches for objects without specific fields or fields having null value.

  3. Wildcard or regexp query − This query uses regular expressions to find patterns in the objects.
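
下面给出两个基于前面 schools 索引的示意性示例,分别使用 exists 查询和 wildcard 查询(字段名沿用前面的示例数据)−

As a hedged illustration based on the schools index used earlier, an exists query and a wildcard query can be framed as shown below (the field names follow the sample data above) −

POST /schools*/_search
{
   "query":{
      "exists":{ "field":"rating" }
   }
}

POST /schools*/_search
{
   "query":{
      "wildcard":{ "city":"pap*" }
   }
}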

Compound Queries

这些查询是多个不同查询的组合,它们通过布尔运算符(如 and、or、not)相互合并,可以针对不同的索引,也可以带有函数调用等。

These queries are a collection of different queries merged with each other by using Boolean operators like and, or and not, applied across different indices, or combined with function calls, etc.

POST /schools/_search
{
   "query": {
      "bool" : {
         "must" : {
            "term" : { "state" : "UP" }
         },
         "filter": {
            "term" : { "fees" : "2200" }
         },
         "minimum_should_match" : 1,
         "boost" : 1.0
      }
   }
}

在运行以上代码时,我们得到如下所示的响应。由于 schools 索引中没有文档同时满足这两个条件,因此命中数为 0:-

On running the above code, we get the response as shown below. Since no document in the schools index satisfies both clauses at the same time, the hit count is 0 −

{
   "took" : 6,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 0,
         "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
   }
}

Geo Queries

这些查询涉及地理位置和地理点。这些查询有助于找到学校或任何其他靠近某个位置的地理对象。您需要为相关字段使用 geo_point 或 geo_shape 数据类型。

These queries deal with geo locations and geo points. These queries help to find out schools or any other geographical object near to any location. For this, you need to use the geo_point or geo_shape data type for the relevant fields.

PUT /geo_example
{
   "mappings": {
      "properties": {
         "location": {
            "type": "geo_shape"
         }
      }
   }
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{  "acknowledged" : true,
   "shards_acknowledged" : true,
   "index" : "geo_example"
}

现在,我们将数据发布到上面创建的索引中。

Now we post the data in the index created above.

POST /geo_example/_doc?refresh
{
   "name": "Chapter One, London, UK",
   "location": {
      "type": "point",
      "coordinates": [11.660544, 57.800286]
   }
}
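
接下来,可以通过搜索该索引来验证文档是否已被索引(此处假定使用默认的 _search 端点)−

Next, the indexed document can be verified by searching the index (assuming the default _search endpoint), as shown below −

GET /geo_example/_search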

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "took" : 1,
   "timed_out" : false,
   "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
   },
   "hits" : {
      "total" : {
         "value" : 2,
         "relation" : "eq"
      },
      "max_score" : 1.0,
      "hits" : [
         "_index" : "geo_example",
         "_type" : "_doc",
         "_id" : "hASWZ2oBbkdGzVfiXHKD",
         "_score" : 1.0,
         "_source" : {
            "name" : "Chapter One, London, UK",
            "location" : {
               "type" : "point",
               "coordinates" : [
                  11.660544,
                  57.800286
               ]
            }
         }
      }
   }
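
作为一个示意性的地理查询示例(其中的矩形坐标范围仅作演示),可以使用 geo_shape 查询在一个包围矩形内查找上面索引的点 −

As a hedged example of a geo query (the envelope coordinates below are only illustrative), a geo_shape query can be used to find the point indexed above within a bounding rectangle −

GET /geo_example/_search
{
   "query": {
      "geo_shape": {
         "location": {
            "shape": {
               "type": "envelope",
               "coordinates": [ [10.0, 60.0], [14.0, 50.0] ]
            },
            "relation": "within"
         }
      }
   }
}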

Elasticsearch - Mapping

映射是存储在索引中的文档的概要。它定义了数据类型(如 geo_point 或 string)和文档中存在的字段的格式,以及控制动态添加的字段的映射的规则。

Mapping is the outline of the documents stored in an index. It defines the data type like geo_point or string and format of the fields present in the documents and rules to control the mapping of dynamically added fields.

PUT bankaccountdetails
{
   "mappings":{
      "properties":{
         "name": { "type":"text"}, "date":{ "type":"date"},
         "balance":{ "type":"double"}, "liability":{ "type":"double"}
      }
   }
 }

当我们运行上述代码时,会得到如下所示的响应:

When we run the above code, we get the response as shown below −

{
   "acknowledged" : true,
   "shards_acknowledged" : true,
   "index" : "bankaccountdetails"
}

Field Data Types

Elasticsearch 支持为文档中的字段设置不同类型的数据类型。此处详细讨论了用于在 Elasticsearch 中存储字段的数据类型。

Elasticsearch supports a number of different datatypes for the fields in a document. The data types used to store fields in Elasticsearch are discussed in detail here.

Core Data Types

这些数据类型为 text、keyword、date、long、double、boolean 或 ip 等基本数据类型,几乎所有系统都支持。

These are the basic data types such as text, keyword, date, long, double, boolean or ip, which are supported by almost all the systems.

Complex Data Types

这些数据类型组合了核心数据类型。其中包括数组、JSON 对象和嵌套数据类型。嵌套数据类型的示例如下所示:

These data types are a combination of core data types. These include array, JSON object and nested data type. An example of nested data type is shown below −

POST /tabletennis/_doc/1
{
   "group" : "players",
   "user" : [
      {
         "first" : "dave", "last" : "jones"
      },
      {
         "first" : "kevin", "last" : "morris"
      }
   ]
}

当我们运行上述代码时,会得到如下所示的响应:

When we run the above code, we get the response as shown below −

{
   "_index" : "tabletennis",
   "_type" : "_doc",
   "_id" : "1",
   _version" : 2,
   "result" : "updated",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 1,
   "_primary_term" : 1
}

另一个示例代码如下所示:

Another sample code is shown below −

POST /accountdetails/_doc/1
{
   "from_acc":"7056443341", "to_acc":"7032460534",
   "date":"11/1/2016", "amount":10000
}

当我们运行上述代码时,会得到如下所示的响应:

When we run the above code, we get the response as shown below −

{  "_index" : "accountdetails",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 1,
   "_primary_term" : 1
}

我们可以使用以下命令检查上述文档:

We can check the above document by using the following command −

GET /accountdetails/_mappings?include_type_name=false

Removal of Mapping Types

在 Elasticsearch 7.0.0 或更高版本中创建的索引不再接受 _default_ 映射。在 Elasticsearch 6.x 中创建的索引将继续像以前一样在 6.x 中运行。类型(type)在 Elasticsearch 7.0 的 API 中已被弃用。

Indices created in Elasticsearch 7.0.0 or later no longer accept a _default_ mapping. Indices created in 6.x will continue to function as before in Elasticsearch 6.x. Types are deprecated in the APIs of Elasticsearch 7.0.

Elasticsearch - Analysis

在搜索操作过程中处理查询时,分析模块会分析任何索引中的内容。该模块由分析器、分词器、分词过滤器和字符过滤器组成。如果没有定义分析器,则默认情况下,内置的分析器、分词器、分词过滤器和字符过滤器会注册到分析模块。

When a query is processed during a search operation, the content in any index is analyzed by the analysis module. This module consists of analyzers, tokenizers, token filters and character filters. If no analyzer is defined, then by default the built-in analyzers, tokenizers, token filters and character filters get registered with the analysis module.

在以下示例中,我们在未指定其他分析器时使用标准分析器。它将根据语法分析句子,并生成句子中使用的单词。

In the following example, we use a standard analyzer which is used when no other analyzer is specified. It will analyze the sentence based on the grammar and produce words used in the sentence.

POST _analyze
{
   "analyzer": "standard",
   "text": "Today's weather is beautiful"
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "tokens" : [
      {
         "token" : "today's",
         "start_offset" : 0,
         "end_offset" : 7,
         "type" : "",
         "position" : 0
      },
      {
         "token" : "weather",
         "start_offset" : 8,
         "end_offset" : 15,
         "type" : "",
         "position" : 1
      },
      {
         "token" : "is",
         "start_offset" : 16,
         "end_offset" : 18,
         "type" : "",
         "position" : 2
      },
      {
         "token" : "beautiful",
         "start_offset" : 19,
         "end_offset" : 28,
         "type" : "",
         "position" : 3
      }
   ]
}

Configuring the Standard analyzer

我们可以使用各种参数配置标准分析器来满足我们的自定义要求。

We can configure the standard analyzer with various parameters to suit our custom requirements.

在以下示例中,我们配置标准分析器让 max_token_length 为 5。

In the following example, we configure the standard analyzer to have a max_token_length of 5.

为此,我们首先创建一个索引,其分析器带有 max_token_length 参数。

For this, we first create an index with an analyzer having the max_token_length parameter.

PUT index_4_analysis
{
   "settings": {
      "analysis": {
         "analyzer": {
            "my_english_analyzer": {
               "type": "standard",
               "max_token_length": 5,
               "stopwords": "_english_"
            }
         }
      }
   }
}

然后我们对一段文本应用该分析器,如下所示。请注意,由于我们配置了 _english_ 停用词,标记 is 不再出现;同时,由于 max_token_length 被设置为 5,超过 5 个字符的词(如 weather 和 beautiful)会被拆分为多个标记。

Next we apply the analyzer to a text as shown below. Please note how the token is does not appear, because we configured the _english_ stopwords list, which removes it. Also, since max_token_length is set to 5, words longer than 5 characters (such as weather and beautiful) are split into multiple tokens.

POST index_4_analysis/_analyze
{
   "analyzer": "my_english_analyzer",
   "text": "Today's weather is beautiful"
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "tokens" : [
      {
         "token" : "today",
         "start_offset" : 0,
         "end_offset" : 5,
         "type" : "",
         "position" : 0
      },
      {
         "token" : "s",
         "start_offset" : 6,
         "end_offset" : 7,
         "type" : "",
         "position" : 1
      },
      {
         "token" : "weath",
         "start_offset" : 8,
         "end_offset" : 13,
         "type" : "",
         "position" : 2
      },
      {
         "token" : "er",
         "start_offset" : 13,
         "end_offset" : 15,
         "type" : "",
         "position" : 3
      },
      {
         "token" : "beaut",
         "start_offset" : 19,
         "end_offset" : 24,
         "type" : "",
         "position" : 5
      },
      {
         "token" : "iful",
         "start_offset" : 24,
         "end_offset" : 28,
         "type" : "",
         "position" : 6
      }
   ]
}

各种分析器及其描述的列表在以下所示的表中给出 −

The list of various analyzers and their description are given in the table shown below −

S.No

Analyzer & Description

1

Standard analyzer (standard) stopwords and max_token_length setting can be set for this analyzer. By default, stopwords list is empty and max_token_length is 255.

2

Simple analyzer (simple) This analyzer is composed of lowercase tokenizer.

3

Whitespace analyzer (whitespace) This analyzer is composed of whitespace tokenizer.

4

Stop analyzer (stop) stopwords and stopwords_path can be configured. By default stopwords initialized to English stop words and stopwords_path contains path to a text file with stop words.
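
例如,下面的示意性请求使用表中列出的 whitespace 分析器对同一句文本进行分析 −

For instance, the following sketch applies the whitespace analyzer listed in the table above to the same sentence −

POST _analyze
{
   "analyzer": "whitespace",
   "text": "Today's weather is beautiful"
}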

Tokenizers

标记化程序用于从 Elasticsearch 中的文本生成标记。可以通过考虑空格或其他标点符号将文本分解为标记。Elasticsearch 拥有大量内置标记化程序,可用于自定义分析器。

Tokenizers are used for generating tokens from a text in Elasticsearch. Text can be broken down into tokens by taking whitespace or other punctuations into account. Elasticsearch has plenty of built-in tokenizers, which can be used in custom analyzer.

下面展示了一个标记化程序的示例,该标记化程序在每遇到一个非字母字符时将文本分解为词条,但它也会将所有词条小写 −

An example of a tokenizer that breaks text into terms whenever it encounters a character which is not a letter, and also lowercases all terms, is shown below −

POST _analyze
{
   "tokenizer": "lowercase",
   "text": "It Was a Beautiful Weather 5 Days ago."
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "tokens" : [
      {
         "token" : "it",
         "start_offset" : 0,
         "end_offset" : 2,
         "type" : "word",
         "position" : 0
      },
      {
         "token" : "was",
         "start_offset" : 3,
         "end_offset" : 6,
         "type" : "word",
         "position" : 1
      },
      {
         "token" : "a",
         "start_offset" : 7,
         "end_offset" : 8,
         "type" : "word",
         "position" : 2
      },
      {
         "token" : "beautiful",
         "start_offset" : 9,
         "end_offset" : 18,
         "type" : "word",
         "position" : 3
      },
      {
         "token" : "weather",
         "start_offset" : 19,
         "end_offset" : 26,
         "type" : "word",
         "position" : 4
      },
      {
         "token" : "days",
         "start_offset" : 29,
         "end_offset" : 33,
         "type" : "word",
         "position" : 5
      },
      {
         "token" : "ago",
         "start_offset" : 34,
         "end_offset" : 37,
         "type" : "word",
         "position" : 6
      }
   ]
}

下表列出了标记化程序及其说明 −

A list of Tokenizers and their descriptions are shown here in the table given below −

S.No

Tokenizer & Description

1

Standard tokenizer (standard) This is built on grammar based tokenizer and max_token_length can be configured for this tokenizer.

2

Edge NGram tokenizer (edgeNGram) Settings like min_gram, max_gram, token_chars can be set for this tokenizer.

3

Keyword tokenizer (keyword) This generates entire input as an output and buffer_size can be set for this.

4

Letter tokenizer (letter) This captures the whole word until a non-letter is encountered.
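
下面是一个示意性示例,在 _analyze API 中内联定义一个 edge_ngram 分词器(假定 _analyze 接受内联分词器定义;其中 min_gram、max_gram 的取值仅作演示)−

A hedged sketch of the edge_ngram tokenizer is shown below, defined inline in the _analyze API (assuming the API accepts an inline tokenizer definition; the min_gram and max_gram values are only illustrative) −

POST _analyze
{
   "tokenizer": {
      "type": "edge_ngram",
      "min_gram": 2,
      "max_gram": 4,
      "token_chars": [ "letter" ]
   },
   "text": "Elastic"
}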

Elasticsearch - Modules

Elasticsearch 由多个模块组成,这些模块负责其功能。这些模块有以下两种类型的设置:

Elasticsearch is composed of a number of modules, which are responsible for its functionality. These modules have two types of settings as follows −

  1. Static Settings − These settings need to be configured in the config (elasticsearch.yml) file before starting Elasticsearch. You need to update all the concerned nodes in the cluster for the changes made by these settings to take effect.

  2. Dynamic Settings − These settings can be set on live Elasticsearch.

我们将在本章的以下部分中讨论 Elasticsearch 的不同模块。

We will discuss the different modules of Elasticsearch in the following sections of this chapter.

Cluster-Level Routing and Shard Allocation

集群级别设置决定将分片分配给不同的节点以及重新分配分片来重新平衡集群。以下这些设置控制分片分配。

Cluster-level settings decide the allocation of shards to different nodes and the reallocation of shards to rebalance the cluster. The following settings control shard allocation.

Cluster-Level Shard Allocation

Setting

Possible value

Description

cluster.routing.allocation.enable

all

This default value allows shard allocation for all kinds of shards.

primaries

This allows shard allocation only for primary shards.

new_primaries

This allows shard allocation only for primary shards for new indices.

none

This does not allow any shard allocations.

cluster.routing.allocation.node_concurrent_recoveries

Numeric value (by default 2)

This restricts the number of concurrent shard recoveries.

cluster.routing.allocation.node_initial_primaries_recoveries

Numeric value (by default 4)

This restricts the number of parallel initial primary recoveries.

cluster.routing.allocation.same_shard.host

Boolean value (by default false)

This restricts the allocation of more than one replica of the same shard in the same physical node.

indices.recovery.concurrent_streams

Numeric value (by default 3)

This controls the number of open network streams per node at the time of shard recovery from peer shards.

indices.recovery.concurrent_small_file_streams

Numeric value (by default 2)

This controls the number of open streams per node for small files having size less than 5mb at the time of shard recovery.

cluster.routing.rebalance.enable

all

This default value allows balancing for all kinds of shards.

primaries

This allows shard balancing only for primary shards.

replicas

This allows shard balancing only for replica shards.

none

This does not allow any kind of shard balancing.

cluster.routing.allocation.allow_rebalance

always

This default value always allows rebalancing.

indices_primaries_active

This allows rebalancing when all primary shards in cluster are allocated.

indices_all_active

This allows rebalancing when all the primary and replica shards are allocated.

cluster.routing.allocation.cluster_concurrent_rebalance

Numeric value (by default 2)

This restricts the number of concurrent shard balancing in cluster.

cluster.routing.allocation.balance.shard

Float value (by default 0.45f)

This defines the weight factor for shards allocated on every node.

cluster.routing.allocation.balance.index

Float value (by default 0.55f)

This defines the ratio of the number of shards per index allocated on a specific node.

cluster.routing.allocation.balance.threshold

Non negative float value (by default 1.0f)

Disk-based Shard Allocation

Setting

Possible value

Description

cluster.routing.allocation.disk.threshold_enabled

Boolean value (by default true)

This enables and disables disk allocation decider.

cluster.routing.allocation.disk.watermark.low

String value(by default 85%)

This denotes maximum usage of disk; after this point, no other shard can be allocated to that disk.

cluster.routing.allocation.disk.watermark.high

String value (by default 90%)

This denotes the maximum usage at the time of allocation; if this point is reached at the time of allocation, then Elasticsearch will allocate that shard to another disk.

cluster.info.update.interval

String value (by default 30s)

This is the interval between disk usage checks.

cluster.routing.allocation.disk.include_relocations

Boolean value (by default true)

This decides whether to consider the shards currently being allocated, while calculating disk usage.

Discovery

该模块帮助群集发现和维护其中所有节点的状态。当向群集中添加或从群集中删除某个节点时,群集状态将会发生更改。群集名称设置用于在不同群集之间创建逻辑差异。以下是一些帮助你使用云供应商提供的 API 的模块:

This module helps a cluster to discover and maintain the state of all the nodes in it. The state of cluster changes when a node is added or deleted from it. The cluster name setting is used to create logical difference between different clusters. There are some modules which help you to use the APIs provided by cloud vendors and those are as given below −

  1. Azure discovery

  2. EC2 discovery

  3. Google compute engine discovery

  4. Zen discovery

Gateway

该模块在整个群集重启过程中维护群集状态和切片数据。该模块的静态设置如下:

This module maintains the cluster state and the shard data across full cluster restarts. The following are the static settings of this module −

Setting

Possible value

Description

gateway.expected_nodes

numeric value (by default 0)

The number of nodes that are expected to be in the cluster for the recovery of local shards.

gateway.expected_master_nodes

numeric value (by default 0)

The number of master nodes that are expected to be in the cluster before start recovery.

gateway.expected_data_nodes

numeric value (by default 0)

The number of data nodes expected in the cluster before start recovery.

gateway.recover_after_time

String value (by default 5m)

This specifies the time for which the recovery process will wait to start, regardless of the number of nodes joined in the cluster.

gateway.recover_after_nodes, gateway.recover_after_master_nodes, gateway.recover_after_data_nodes

numeric value (by default 0)

The number of nodes (or master nodes or data nodes) that must have joined the cluster before the recovery of local shards starts.

HTTP

此模块管理 HTTP 客户端和 Elasticsearch API 之间的通信。该模块可通过将 http.enabled 的值更改为 false 来禁用。

This module manages the communication between HTTP client and Elasticsearch APIs. This module can be disabled by changing the value of http.enabled to false.

以下为控制此模块的设置(在 elasticsearch.yml 中配置):

The following are the settings (configured in elasticsearch.yml) to control this module −

S.No

Setting & Description

1

http.port This is a port to access Elasticsearch and it ranges from 9200-9300.

2

http.publish_port This port is for http clients and is also useful in case of firewall.

3

http.bind_host This is a host address for http service.

4

http.publish_host This is a host address for http client.

5

http.max_content_length This is the maximum size of content in an http request. Its default value is 100mb.

6

http.max_initial_line_length This is the maximum size of URL and its default value is 4kb.

7

http.max_header_size This is the maximum http header size and its default value is 8kb.

8

http.compression This enables or disables support for compression and its default value is false.

9

http.pipelining This enables or disables HTTP pipelining.

10

http.pipelining.max_events This restricts the number of events to be queued before closing an HTTP request.

Indices

该模块维护设置,这些设置全局针对每个索引进行设置。以下设置主要与内存使用相关:

This module maintains the settings, which are set globally for every index. The following settings are mainly related to memory usage −

Circuit Breaker

此项用于防止操作导致 OutOfMemoryError。这些设置主要限制 JVM 堆的使用。例如,indices.breaker.total.limit 设置的默认值是 JVM 堆的 70%。

This is used to prevent operations from causing an OutOfMemoryError. These settings mainly restrict the JVM heap usage. For example, the indices.breaker.total.limit setting defaults to 70% of the JVM heap.

Fielddata Cache

这主要在对字段进行聚合时使用。建议为其分配足够的内存。字段数据缓存使用的内存量可以通过 indices.fielddata.cache.size 设置来控制。

This is used mainly when aggregating on a field. It is recommended to allocate enough memory for it. The amount of memory used for the field data cache can be controlled using the indices.fielddata.cache.size setting.

Node Query Cache

该内存用于缓存查询结果。该缓存使用最近最少使用(LRU)驱逐策略。indices.queries.cache.size 设置控制该缓存的内存大小。

This memory is used for caching the query results. This cache uses the Least Recently Used (LRU) eviction policy. The indices.queries.cache.size setting controls the memory size of this cache.

Indexing Buffer

该缓冲区存储索引中新创建的文档,并在缓冲区已满时刷新它们。诸如 indices.memory.index_buffer_size 的设置控制为该缓冲区分配的堆大小。

This buffer stores the newly created documents in the index and flushes them when the buffer is full. Settings like indices.memory.index_buffer_size control the amount of heap allocated for this buffer.

Shard Request Cache

该缓存用于存储每个分片的本地搜索结果。可以在创建索引时启用该缓存,也可以在单个请求中通过 URL 参数启用或禁用它。

This cache is used to store the local search results for every shard. The cache can be enabled during the creation of an index, or can be enabled or disabled per request by sending a URL parameter.

Enable cache at index creation − "index.requests.cache.enable": true
Enable or disable cache per request − ?request_cache=true or ?request_cache=false
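
下面是一个示意性示例(索引名 my_cache_index 仅为假设):先在创建索引时启用请求缓存,然后在单次搜索请求中通过 URL 参数使用它 −

Below is a hedged sketch (the index name my_cache_index is hypothetical): the request cache is enabled at index creation and then used per request via the URL parameter −

PUT /my_cache_index
{
   "settings": {
      "index.requests.cache.enable": true
   }
}

GET /my_cache_index/_search?request_cache=true
{
   "size": 0,
   "query": { "match_all": {} }
}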

Indices Recovery

它在恢复过程中控制资源。以下是设置−

It controls the resources during recovery process. The following are the settings −

Setting

Default value

indices.recovery.concurrent_streams

3

indices.recovery.concurrent_small_file_streams

2

indices.recovery.file_chunk_size

512kb

indices.recovery.translog_ops

1000

indices.recovery.translog_size

512kb

indices.recovery.compress

true

indices.recovery.max_bytes_per_sec

40mb

TTL Interval

生存期 (TTL) 间隔定义了一个文档的时间,之后将删除该文档。以下动态设置用于控制这一过程−

The Time to Live (TTL) interval defines the lifetime of a document, after which the document gets deleted. The following are the dynamic settings for controlling this process −

Setting

Default value

indices.ttl.interval

60s

indices.ttl.bulk_size

1000

Node

每个节点都可以选择为数据节点,也可以不为数据节点。您可以通过更改 node.data 设置来更改此属性。将该值设定为 false 来确定节点不是数据节点。

Each node has an option to be a data node or not. You can change this property by changing the node.data setting. Setting the value to false defines that the node is not a data node.

Elasticsearch - Index Modules

这些是为每个索引创建的模块,它们控制索引的设置和行为。例如,一个索引可以使用多少个分片,或者一个主分片可以为该索引拥有多少个副本等。索引设置有两种类型:

These are the modules which are created for every index and control the settings and behaviour of the indices. For example, how many shards an index can use or the number of replicas a primary shard can have for that index etc. There are two types of index settings −

  1. Static − These can be set only at index creation time or on a closed index.

  2. Dynamic − These can be changed on a live index.

Static Index Settings

下表显示了静态索引设置的列表:

The following table shows the list of static index settings −

Setting

Possible value

Description

index.number_of_shards

Defaults to 1 (5 in versions before 7.0.0), Maximum 1024

The number of primary shards that an index should have.

index.shard.check_on_startup

Defaults to false. Can be True

Whether or not shards should be checked for corruption before opening.

index.codec

LZ4 compression.

Type of compression used to store data.

index.routing_partition_size

1

The number of shards a custom routing value can go to.

index.load_fixed_bitset_filters_eagerly

false

Indicates whether cached filters are pre-loaded for nested queries
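
下面的示意性示例在创建索引时设置上述静态设置(索引名 demo_index 仅为假设)−

The following sketch sets some of the above static settings at index creation time (the index name demo_index is hypothetical) −

PUT /demo_index
{
   "settings": {
      "index.number_of_shards": 3,
      "index.codec": "best_compression"
   }
}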

Dynamic Index Settings

下表列出了动态索引的设置列表−

The following table shows the list of dynamic index settings −

Setting

Possible value

Description

index.number_of_replicas

Defaults to 1

The number of replicas each primary shard has.

index.auto_expand_replicas

A dash delimited lower and upper bound (0-5)

Auto-expand the number of replicas based on the number of data nodes in the cluster.

index.search.idle.after

30 seconds

How long a shard can go without receiving a search or get request before it is considered search idle.

index.refresh_interval

1 second

How often to perform a refresh operation, which makes recent changes to the index visible to search.

index.blocks.read_only

true/false

Set to true to make the index and index metadata read only, false to allow writes and metadata changes.

Elasticsearch - IngestNode

有时,我们需要在索引文档之前对其进行转换。例如,我们希望从文档中删除一个字段或重命名一个字段,然后进行索引。这由摄取节点处理。

Sometimes we need to transform a document before we index it. For instance, we want to remove a field from the document or rename a field and then index it. This is handled by Ingest node.

群集中每个节点都有摄取功能,但也可以将其自定义为仅由特定节点处理。

Every node in the cluster has the ability to ingest but it can also be customized to be processed only by specific nodes.

Steps Involved

摄取节点的工作涉及两个步骤 −

There are two steps involved in the working of the ingest node −

  1. Creating a pipeline

  2. Creating a doc

Create a Pipeline

首先创建包含处理器的管道,然后执行管道,如下所示 −

First creating a pipeline which contains the processors and then executing the pipeline, as shown below −

PUT _ingest/pipeline/int-converter
{
   "description": "converts the content of the seq field to an integer",
   "processors" : [
      {
         "convert" : {
            "field" : "seq",
            "type": "integer"
         }
      }
   ]
}

运行以上代码时,我们得到以下结果:-

On running the above code, we get the following result −

{
   "acknowledged" : true
}

Create a Doc

接下来,使用管道转换器创建文档。

Next we create a document using the pipeline converter.

PUT /logs/_doc/1?pipeline=int-converter
{
   "seq":"21",
   "name":"Tutorialspoint",
   "Addrs":"Hyderabad"
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "_index" : "logs",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 0,
   "_primary_term" : 1
}

接下来,我们使用 GET 命令搜索以上创建的文档,如下所示:-

Next we search for the doc created above by using the GET command as shown below −

GET /logs/_doc/1

运行以上代码时,我们得到以下结果:-

On running the above code, we get the following result −

{
   "_index" : "logs",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 1,
   "_seq_no" : 0,
   "_primary_term" : 1,
   "found" : true,
   "_source" : {
      "Addrs" : "Hyderabad",
      "name" : "Tutorialspoint",
      "seq" : 21
   }
}

您可以在上面看到,21 已经变成一个整数。

You can see above that 21 has become an integer.

Without Pipeline

现在,我们不用管道创建文档。

Now we create a document without using the pipeline.

PUT /logs/_doc/2
{
   "seq":"11",
   "name":"Tutorix",
   "Addrs":"Secunderabad"
}
GET /logs/_doc/2

运行以上代码时,我们得到以下结果:-

On running the above code, we get the following result −

{
   "_index" : "logs",
   "_type" : "_doc",
   "_id" : "2",
   "_version" : 1,
   "_seq_no" : 1,
   "_primary_term" : 1,
   "found" : true,
   "_source" : {
      "seq" : "11",
      "name" : "Tutorix",
      "Addrs" : "Secunderabad"
   }
}

您可以在上面看到,11 是一个字符串,没有使用管道。

You can see above that 11 is a string without the pipeline being used.

Elasticsearch - Managing Index Lifecycle

管理索引生命周期涉及基于碎片大小和性能要求等因素执行管理操作。指数生命周期管理 (ILM) API 让你可以自动化管理你的索引的方式。

Managing the index lifecycle involves performing management actions based on factors like shard size and performance requirements. The index lifecycle management (ILM) APIs enable you to automate how you want to manage your indices over time.

本章列出了 ILM API 及其用法。

This chapter gives a list of ILM APIs and their usage.

Policy Management APIs

API Name

Purpose

Example

Create lifecycle policy.

Creates a lifecycle policy. If the specified policy exists, the policy is replaced and the policy version is incremented.

PUT_ilm/policy/policy_id

Get lifecycle policy.

Returns the specified policy definition. Includes the policy version and last modified date. If no policy is specified, returns all defined policies.

GET_ilm/policy/policy_id

Delete lifecycle policy

Deletes the specified lifecycle policy definition. You cannot delete policies that are currently in use. If the policy is being used to manage any indices, the request fails and returns an error.

DELETE_ilm/policy/policy_id
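
例如,下面是创建生命周期策略的一个示意性请求(策略名 policy_1 以及其中的阈值仅作演示)−

For example, a hedged sketch of creating a lifecycle policy is shown below (the policy name policy_1 and the thresholds used are only illustrative) −

PUT _ilm/policy/policy_1
{
   "policy": {
      "phases": {
         "hot": {
            "actions": {
               "rollover": {
                  "max_size": "25GB"
               }
            }
         },
         "delete": {
            "min_age": "30d",
            "actions": {
               "delete": {}
            }
         }
      }
   }
}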

Index Management APIs

API Name

Purpose

Example

Move to lifecycle step API.

Manually moves an index into the specified step and executes that step.

POST_ilm/move/index

Retry policy.

Sets the policy back to the step where the error occurred and executes the step.

POST index/_ilm/retry

Remove policy from index API edit.

Removes the assigned lifecycle policy and stops managing the specified index. If an index pattern is specified, removes the assigned policies from all matching indices.

POST index/_ilm/remove

Operation Management APIs

API Name

Purpose

Example

Get index lifecycle management status API.

Returns the status of the ILM plugin. The operation_mode field in the response shows one of three states: STARTED, STOPPING, or STOPPED.

GET /_ilm/status

Start index lifecycle management API.

Starts the ILM plugin if it is currently stopped. ILM is started automatically when the cluster is formed.

POST /_ilm/start

Stop index lifecycle management API.

Halts all lifecycle management operations and stops the ILM plugin. This is useful when you are performing maintenance on the cluster and need to prevent ILM from performing any actions on your indices.

POST /_ilm/stop

Explain lifecycle API.

Retrieves information about the index’s current lifecycle state, such as the currently executing phase, action, and step. Shows when the index entered each one, the definition of the running phase, and information about any failures.

GET index/_ilm/explain

Elasticsearch - SQL Access

它是一个组件,允许在实时中针对 Elasticsearch 执行类似于 SQL 的查询。你可以将 Elasticsearch SQL 视为一个翻译器,它既了解 SQL 也了解 Elasticsearch,并且可以轻松地按比例利用 Elasticsearch 的功能,实时读取和处理数据。

It is a component that allows SQL-like queries to be executed in real-time against Elasticsearch. You can think of Elasticsearch SQL as a translator, one that understands both SQL and Elasticsearch and makes it easy to read and process data in real-time, at scale by leveraging Elasticsearch capabilities.

Advantages of Elasticsearch SQL

  1. It has native integration − Each and every query is efficiently executed against the relevant nodes according to the underlying storage.

  2. No external parts − No need for additional hardware, processes, runtimes or libraries to query Elasticsearch.

  3. Lightweight and efficient − it embraces and exposes SQL to allow proper full-text search, in real-time.

Example

PUT /schoollist/_bulk?refresh
   {"index":{"_id": "CBSE"}}
   {"name": "GleanDale", "Address": "JR. Court Lane", "start_date": "2011-06-02",
   "student_count": 561}
   {"index":{"_id": "ICSE"}}
   {"name": "Top-Notch", "Address": "Gachibowli Main Road", "start_date": "1989-
   05-26", "student_count": 482}
   {"index":{"_id": "State Board"}}
   {"name": "Sunshine", "Address": "Main Street", "start_date": "1965-06-01",
   "student_count": 604}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

{
   "took" : 277,
   "errors" : false,
   "items" : [
      {
         "index" : {
            "_index" : "schoollist",
            "_type" : "_doc",
            "_id" : "CBSE",
            "_version" : 1,
            "result" : "created",
            "forced_refresh" : true,
            "_shards" : {
               "total" : 2,
               "successful" : 1,
               "failed" : 0
            },
            "_seq_no" : 0,
            "_primary_term" : 1,
            "status" : 201
         }
      },
      {
         "index" : {
            "_index" : "schoollist",
            "_type" : "_doc",
            "_id" : "ICSE",
            "_version" : 1,
            "result" : "created",
            "forced_refresh" : true,
            "_shards" : {
               "total" : 2,
               "successful" : 1,
               "failed" : 0
            },
            "_seq_no" : 1,
            "_primary_term" : 1,
            "status" : 201
         }
      },
      {
         "index" : {
            "_index" : "schoollist",
            "_type" : "_doc",
            "_id" : "State Board",
            "_version" : 1,
            "result" : "created",
            "forced_refresh" : true,
            "_shards" : {
               "total" : 2,
               "successful" : 1,
               "failed" : 0
            },
            "_seq_no" : 2,
            "_primary_term" : 1,
            "status" : 201
         }
      }
   ]
}

SQL Query

以下示例显示了构造 SQL 查询的方式 −

The following example shows how we frame the SQL query −

POST /_sql?format=txt
{
   "query": "SELECT * FROM schoollist WHERE start_date < '2000-01-01'"
}

在运行以上代码时,我们得到响应,如下所示:-

On running the above code, we get the response as shown below −

Address             | name          | start_date             | student_count
--------------------+---------------+------------------------+---------------
Gachibowli Main Road|Top-Notch      |1989-05-26T00:00:00.000Z|482
Main Street         |Sunshine       |1965-06-01T00:00:00.000Z|604

Note − 通过更改上述 SQL 查询,你可以获得不同的结果集。

Note − By changing the SQL query above, you can get different result sets.

Elasticsearch - Monitoring

为监控集群的运行状况,监控功能会从每个节点收集指标,并将其存储在 Elasticsearch 索引中。与 Elasticsearch 中的监控相关的所有设置都必须设置在每个节点的 elasticsearch.yml 文件中,或者(如可能的情况下)设置在动态集群设置中。

To monitor the health of the cluster, the monitoring feature collects metrics from each node and stores them in Elasticsearch Indices. All settings associated with monitoring in Elasticsearch must be set in either the elasticsearch.yml file for each node or, where possible, in the dynamic cluster settings.

为了开始监控,我们需要检查集群设置,这可以使用以下方式完成 −

In order to start monitoring, we need to check the cluster settings, which can be done in the following way −

GET _cluster/settings
{
   "persistent" : { },
   "transient" : { }
}
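
要开启监控数据的收集,可以更新动态集群设置 xpack.monitoring.collection.enabled(以下请求仅作示意)−

To turn on the collection of monitoring data, the dynamic cluster setting xpack.monitoring.collection.enabled can be updated (the request below is only a sketch) −

PUT _cluster/settings
{
   "persistent": {
      "xpack.monitoring.collection.enabled": true
   }
}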

堆栈中的每个组件负责监控自身,然后将这些文档转发给 Elasticsearch 生产集群进行路由和索引编制(存储)。Elasticsearch 中的路由和索引编制过程由所谓的收集器和导出器处理。

Each component in the stack is responsible for monitoring itself and then forwarding those documents to the Elasticsearch production cluster for both routing and indexing (storage). The routing and indexing processes in Elasticsearch are handled by what are called collectors and exporters.

Collectors

收集器每隔一次收集时间间隔运行一次,以从 Elasticsearch 中它选择要监控的公共 API 处获取数据。在完成数据收集后,将数据批量交给导出器以发送到监控集群。

Each collector runs once per collection interval to obtain data from the public APIs in Elasticsearch that it chooses to monitor. When the data collection is finished, the data is handed in bulk to the exporters to be sent to the monitoring cluster.

每种收集的数据类型只有一个收集器。每个收集器可以创建零个或多个监控文档。

There is only one collector per data type gathered. Each collector can create zero or more monitoring documents.

Exporters

导出器从任何弹性堆栈源获取收集的数据,并将其路由到监控集群。可以配置多个导出器,但常规且默认的设置是使用单个导出器。导出器可以在节点和集群级别进行配置。

Exporters take data collected from any Elastic Stack source and route it to the monitoring cluster. It is possible to configure more than one exporter, but the general and default setup is to use a single exporter. Exporters are configurable at both the node and cluster level.

Elasticsearch 中有两种类型的导出器 −

There are two types of exporters in Elasticsearch −

  1. local − This exporter routes data back into the same cluster

  2. http − The preferred exporter, which you can use to route data into any supported Elasticsearch cluster accessible via HTTP.

在导出器可以路由监控数据之前,它们必须设置某些 Elasticsearch 资源。这些资源包括模板和采集管道

Before exporters can route monitoring data, they must set up certain Elasticsearch resources. These resources include templates and ingest pipelines

Elasticsearch - Rollup Data

汇总作业是一项周期性任务,它汇总索引模式指定的索引中的数据,然后将其汇总到新索引中。在以下示例中,我们创建了一个名为传感器的索引,其中包含不同的日期时间戳。然后,我们会创建一个汇总作业,以便使用 cron 作业定期汇总来自这些索引的数据。

A rollup job is a periodic task that summarizes data from indices specified by an index pattern and rolls it into a new index. In the following example, we create an index named sensor with different date time stamps. Then we create a rollup job to rollup the data from these indices periodically using cron job.

PUT /sensor/_doc/1
{
   "timestamp": 1516729294000,
   "temperature": 200,
   "voltage": 5.2,
   "node": "a"
}

运行以上代码时,我们得到以下结果:-

On running the above code, we get the following result −

{
   "_index" : "sensor",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 0,
   "_primary_term" : 1
}

现在,我们添加第二个文档;其他文档也可以按同样的方式添加。

Now, we add a second document; further documents can be added in the same way.

PUT /sensor-2018-01-01/_doc/2
{
   "timestamp": 1413729294000,
   "temperature": 201,
   "voltage": 5.9,
   "node": "a"
}

Create a Rollup Job

PUT _rollup/job/sensor
{
   "index_pattern": "sensor-*",
   "rollup_index": "sensor_rollup",
   "cron": "*/30 * * * * ?",
   "page_size" :1000,
   "groups" : {
      "date_histogram": {
         "field": "timestamp",
         "interval": "60m"
      },
      "terms": {
         "fields": ["node"]
      }
   },
   "metrics": [
      {
         "field": "temperature",
         "metrics": ["min", "max", "sum"]
      },
      {
         "field": "voltage",
         "metrics": ["avg"]
      }
   ]
}

cron 参数控制作业的激活时间和频率。当汇总作业的 cron 计划触发时,它将从上次激活后中断的位置继续汇总。

The cron parameter controls when and how often the job activates. When a rollup job’s cron schedule triggers, it will begin rolling up from where it left off after the last activation.
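
新创建的汇总作业处于停止状态;在它开始按 cron 计划运行之前,需要先启动它,如下所示(假定使用 _start 端点)−

A newly created rollup job is in a stopped state; before it begins running on its cron schedule, it has to be started, as shown below (assuming the _start endpoint) −

POST _rollup/job/sensor/_start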

在该作业运行并处理了一些数据之后,我们可以使用 DSL 查询执行一些搜索。

After the job has run and processed some data, we can use the DSL Query to do some searching.

GET /sensor_rollup/_rollup_search
{
   "size": 0,
   "aggregations": {
      "max_temperature": {
         "max": {
            "field": "temperature"
         }
      }
   }
}

Elasticsearch - Frozen Indices

频繁搜索的索引保存在内存中,因为重建它们需要时间,并且有助于高效搜索。另一方面,可能有一些我们很少访问的索引。这些索引不需要占用内存,并且可以在需要时重新构建。这种索引被称为冻结索引。

The indices that are searched frequently are held in memory because it takes time to rebuild them and help in an efficient search. On the other hand, there may be indices which we rarely access. Those indices need not occupy the memory and can be re-build when they are needed. Such indices are known as frozen indices.

Elasticsearch 在每次搜索冻结索引的每个分片时都会构建分片的瞬态数据结构,并在搜索完成后立即丢弃这些数据结构。由于 Elasticsearch 不在内存中维护这些瞬态数据结构,因此冻结索引消耗的堆比普通索引少得多。这允许比其他可能的情况更高的磁盘到堆的比率。

Elasticsearch builds the transient data structures of each shard of a frozen index each time that shard is searched and discards these data structures as soon as the search is complete. Because Elasticsearch does not maintain these transient data structures in memory, frozen indices consume much less heap than the normal indices. This allows for a much higher disk-to-heap ratio than would otherwise be possible.

Example for Freezing and Unfreezing

以下示例冻结和解冻索引 -

The following example freezes and unfreezes an index −

POST /index_name/_freeze
POST /index_name/_unfreeze

预期对冻结索引的搜索将缓慢执行。冻结索引不适用于高搜索负载。冻结索引的搜索可能需要几秒或几分钟才能完成,即使在索引未冻结时,相同的搜索可以在几毫秒内完成。

Searches on frozen indices are expected to execute slowly. Frozen indices are not intended for high search load. It is possible that a search of a frozen index may take seconds or minutes to complete, even if the same searches completed in milliseconds when the indices were not frozen.

Searching a Frozen Index

每个节点同时加载的冻结索引数量受 search_throttled 线程池中线程数量的限制,默认值为 1。要包含冻结索引,必须使用查询参数执行搜索请求 - ignore_throttled=false。

The number of concurrently loaded frozen indices per node is limited by the number of threads in the search_throttled threadpool, which is 1 by default. To include frozen indices, a search request must be executed with the query parameter − ignore_throttled=false.

GET /index_name/_search?q=user:tpoint&ignore_throttled=false

Monitoring Frozen Indices

冻结索引是使用搜索限制和内存高效分片实现的普通索引。

Frozen indices are ordinary indices that use search throttling and a memory efficient shard implementation.

GET /_cat/indices/index_name?v&h=i,sth

Elasticsearch - Testing

Elasticsearch 提供了一个 jar 文件,可以将其添加到任何 java IDE,并可以用来测试与 Elasticsearch 相关的代码。使用 Elasticsearch 提供的框架可以执行一系列测试。在本章中,我们将详细讨论这些测试 −

Elasticsearch provides a jar file, which can be added to any java IDE and can be used to test the code which is related to Elasticsearch. A range of tests can be performed by using the framework provided by Elasticsearch. In this chapter, we will discuss these tests in detail −

  1. Unit testing

  2. Integration testing

  3. Randomized testing

Prerequisites

要开始测试,你需要将 Elasticsearch testing 依赖项添加到你的程序中。你可以为此目的使用 maven,并在 pom.xml 中添加以下内容。

To start with testing, you need to add the Elasticsearch testing dependency to your program. You can use maven for this purpose and can add the following in pom.xml.

<dependency>
   <groupId>org.elasticsearch</groupId>
   <artifactId>elasticsearch</artifactId>
   <version>2.1.0</version>
</dependency>

EsSetup 已被初始化为启动和停止 Elasticsearch 节点,并创建索引。

EsSetup has been initialized to start and stop Elasticsearch node and also to create indices.

EsSetup esSetup = new EsSetup();

带有 createIndex 的 esSetup.execute() 函数将创建索引,你需要指定设置、类型和数据。

The esSetup.execute() function with createIndex will create the indices; you need to specify the settings, type and data.

Unit Testing

单元测试是使用 JUnit 和 Elasticsearch 测试框架进行的。可以使用 Elasticsearch 类创建节点和索引,并且可以在测试方法中使用它们来执行测试。ESTestCase 和 ESTokenStreamTestCase 类用于此测试。

Unit testing is carried out by using JUnit and the Elasticsearch test framework. Nodes and indices can be created using Elasticsearch classes and used in the test methods to perform the testing. The ESTestCase and ESTokenStreamTestCase classes are used for this testing.

Integration Testing

集成测试在集群中使用多个节点。ESIntegTestCase 类用于此测试。有很多方法可以简化测试用例的准备工作。

Integration testing uses multiple nodes in a cluster. ESIntegTestCase class is used for this testing. There are various methods which make the job of preparing a test case easier.

S.No

Method & Description

1

refresh() All the indices in a cluster are refreshed

2

ensureGreen() Ensures a green health cluster state

3

ensureYellow() Ensures a yellow health cluster state

4

createIndex(name) Create index with the name passed to this method

5

flush() All indices in cluster are flushed

6

flushAndRefresh() flush() and refresh()

7

indexExists(name) Verifies the existence of specified index

8

clusterService() Returns the cluster service java class

9

cluster() Returns the test cluster class

Test Cluster Methods

S.No

Method & Description

1

ensureAtLeastNumNodes(n) Ensures minimum number of nodes up in a cluster is more than or equal to specified number.

2

ensureAtMostNumNodes(n) Ensures maximum number of nodes up in a cluster is less than or equal to specified number.

3

stopRandomNode() To stop a random node in a cluster

4

stopCurrentMasterNode() To stop the master node

5

stopRandomNonMaster() To stop a random node in a cluster, which is not a master node.

6

buildNode() Create a new node

7

startNode(settings) Start a new node

8

nodeSettings() Override this method for changing node settings.

Accessing Clients

使用一个客户端访问群集中的不同节点并执行某些操作。使用 ESIntegTestCase.client() 方法获取随机客户端。Elasticsearch 还提供其他方法访问客户端,而这些方法可以通过 ESIntegTestCase.internalCluster() 方法访问。

A client is used to access different nodes in a cluster and carry out some action. ESIntegTestCase.client() method is used for getting a random client. Elasticsearch offers other methods also to access client and those methods can be accessed using ESIntegTestCase.internalCluster() method.

S.No

Method & Description

1

iterator() This helps you to access all the available clients.

2

masterClient() This returns a client, which is communicating with master node.

3

nonMasterClient() This returns a client, which is not communicating with master node.

4

clientNodeClient() This returns a client currently up on client node.

Randomized Testing

此项测试用于使用所有可能数据测试用户的代码,以便未来使用任何类型的数据时不会出现故障。随机数据是执行此类测试的最佳选项。

This testing is used to test the user’s code with every possible data, so that there will be no failure in future with any type of data. Random data is the best option to carry out this testing.

Generating Random Data

在此项测试中,Random 类由 RandomizedTest 提供的实例实例化,并提供多种方法来获取不同类型的数据。

In this testing, the Random class is instantiated by the instance provided by RandomizedTest and offers many methods for getting different types of data.

Method

Return value

getRandom()

Instance of random class

randomBoolean()

Random boolean

randomByte()

Random byte

randomShort()

Random short

randomInt()

Random integer

randomLong()

Random long

randomFloat()

Random float

randomDouble()

Random double

randomLocale()

Random locale

randomTimeZone()

Random time zone

randomFrom()

Random element from array

Assertions

ElasticsearchAssertions 和 ElasticsearchGeoAssertions 类包含断言,这些断言用于在测试时执行一些常见检查。例如,观察此处给出的代码 −

ElasticsearchAssertions and ElasticsearchGeoAssertions classes contain assertions, which are used for performing some common checks at the time of testing. For example, observe the code given here −

SearchResponse searchResponse = client().prepareSearch().get();
assertHitCount(searchResponse, 6);
assertFirstHit(searchResponse, hasId("6"));
assertSearchHits(searchResponse, "1", "2", "3", "4", "5", "6");

Elasticsearch - Kibana Dashboard

Kibana 仪表盘是可视化和搜索的集合。你可以布置、调整和编辑仪表盘内容,然后保存仪表盘以供共享。在本章中,我们将了解如何创建和编辑仪表盘。

A Kibana dashboard is a collection of visualizations and searches. You can arrange, resize, and edit the dashboard content and then save the dashboard so you can share it. In this chapter, we will see how to create and edit a dashboard.

Dashboard Creation

在 Kibana 首页中,从左侧控制栏中选择仪表盘选项,如下所示。这将提示你创建一个新仪表盘。

From the Kibana Homepage, select the dashboard option from the left control bars as shown below. This will prompt you to create a new dashboard.

dashboard creation

若要向仪表盘添加可视化,我们选择“添加”菜单,然后从提供的预置可视化中选择。我们从列表中选择了以下可视化选项。

To add visualizations to the dashboard, we choose the Add menu and then select from the pre-built visualizations available. We chose the following visualization options from the list.

add new visualization

在选择上述可视化后,我们得到的仪表盘如下所示。我们稍后可以添加并编辑仪表盘,更改元素和添加新元素。

On selecting the above visualizations, we get the dashboard as shown here. We can later edit the dashboard to change the existing elements and add new ones.

edit sales dashboard

Inspecting Elements

我们可以通过选择可视化面板菜单并选择 Inspect 来检查仪表盘元素。这将展示元素背后的数据,这些数据也可以下载。

We can inspect the dashboard elements by choosing the visualization panel menu and selecting Inspect. This brings up the data behind the element, which can also be downloaded.

inspecting elements

Sharing Dashboard

我们可以通过选择共享菜单并选择获取超链接的选项来共享仪表盘,如下所示:

We can share the dashboard by choosing the share menu and selecting the option to get a hyperlink as shown below −

sharing dashboard

Elasticsearch - Filtering by Field

Kibana 主页中可用的发现功能使我们能够从各个角度探索数据集。你可以搜索和筛选选定索引模式的数据。数据通常以一段时间内值的分布形式存在。

The Discover functionality available on the Kibana home page allows us to explore the data sets from various angles. You can search and filter data for the selected index patterns. The data is usually available in the form of a distribution of values over a period of time.

要探索电子商务数据样本,我们会单击 Discover 图标,如下面图片所示。这将会调出数据和图表。

To explore the ecommerce data sample, we click on the Discover icon as shown in the picture below. This will bring up the data along with the chart.

discover

Filtering by Time

要按特定时间间隔筛选数据,我们会使用时间筛选选项,如下所示。默认情况下,筛选器设置为 15 分钟。

To filter data by a specific time interval, we use the time filter option as shown below. By default, the filter is set to 15 minutes.

filtering by time

Filtering by Fields

还可以使用 Add Filter 选项按字段筛选数据集,如下所示。在此,我们添加一个或多个字段并在应用筛选器后获取相应的结果。在我们的示例中我们选择字段 day_of_week ,然后选择该字段的操作符为 is ,值则为 Sunday

The data set can also be filtered by fields using the Add Filter option as shown below. Here we add one or more fields and get the corresponding result after the filters are applied. In our example, we choose the field day_of_week, then the operator for that field as is, and the value as Sunday.

filtering by fields

接下来,我们会单击使用以上筛选条件保存。应用了筛选条件的结果集如下所示。

Next, we click Save with the above filter conditions. The result set with the filter conditions applied is shown below.

edit filter conditions
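Behind the scenes, Kibana turns the time filter and the field filter into an Elasticsearch query. The request below, which can be run from the Dev Tools console, is a minimal sketch of an equivalent search; the index name kibana_sample_data_ecommerce and the order_date time range are assumptions based on the sample data set.

# Equivalent query for: day_of_week is Sunday, within a chosen time range
GET kibana_sample_data_ecommerce/_search
{
   "query": {
      "bool": {
         "filter": [
            { "term": { "day_of_week": "Sunday" } },
            { "range": { "order_date": { "gte": "now-7d/d", "lte": "now" } } }
         ]
      }
   }
}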

Elasticsearch - Data Tables

数据表是一种用于显示已组合聚合的原始数据的可视化类型。数据表可用于显示各种类型的聚合。为了创建数据表,我们应该仔细了解这里讨论的步骤。

The data table is a type of visualization that is used to display the raw data of a composed aggregation. Various types of aggregations can be presented using data tables. In order to create a data table, we should go through the steps that are discussed here in detail.

Visualize

在 Kibana 主屏幕中,我们找到了可视化名称选项,可视化名称选项使我们能够从 Elasticsearch 中存储的指标创建可视化和聚合。下图显示了此选项。

In the Kibana Home screen, we find the option named Visualize, which allows us to create visualizations and aggregations from the indices stored in Elasticsearch. The following image shows the option.

visualize home page

Select Data Table

接下来,我们在各种可用的可视化选项中选择数据表选项。该选项显示在以下图像 −

Next, we select the Data Table option from among the various visualization options available. The option is shown in the following image −

new visualize

Select Metrics

然后,我们选择创建数据表可视化所需指标。此选择决定我们将要使用的聚合类型。为此,我们从电子商务数据集选择以下所示的特定字段。

We then select the metrics needed for creating the data table visualization. This choice decides the type of aggregation we are going to use. We select the specific fields shown below from the ecommerce data set for this.

kibana sample data ecommerce

在对数据表运行上述配置后,我们得到的结果如下图所示 −

On running the above configuration for Data Table, we get the result as shown in the image here −

result of kibana sample data
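A data table is driven by an Elasticsearch aggregation. As an illustration only (the exact fields depend on what was selected above), the following sketch groups the sample ecommerce orders by category and computes the average order value; the field names category.keyword and taxful_total_price are assumptions based on the sample data set.

# Illustrative bucket and metric aggregation behind a data table (assumed fields)
GET kibana_sample_data_ecommerce/_search
{
   "size": 0,
   "aggs": {
      "per_category": {
         "terms": { "field": "category.keyword", "size": 5 },
         "aggs": {
            "average_order_value": { "avg": { "field": "taxful_total_price" } }
         }
      }
   }
}

Each bucket of the terms aggregation becomes a row of the data table, and each metric sub-aggregation becomes a column.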

Elasticsearch - Region Maps

区域地图在地理地图中显示指标。它对于观察不同地理区域中具有不同强度的锚定数据非常有用。较暗的阴影通常表示较高的值,较浅的阴影表示较低的值。

Region maps show metrics on a geographic map. They are useful for looking at data anchored to different geographic regions with varying intensity. The darker shades usually indicate higher values and the lighter shades indicate lower values.

创建此可视化的步骤如下所示 −

The steps to create this visualization are explained in detail as follows −

Visualize

在此步骤中,我们转到 Kibana 主屏幕左侧栏中可用的“可视化”按钮,然后选择添加新可视化的选项。

In this step, we go to the Visualize button available in the left bar of the Kibana Home screen and then choose the option to add a new visualization.

以下屏幕显示了如何选择区域地图选项。

The following screen shows how we choose the Region Map option.

region maps visualize

Choose the Metrics

下一个屏幕提示我们选择将在创建区域地图时使用的指标。在此,我们将平均价格选为指标,将存储桶中将用于创建可视化的字段选为 country_iso_code。

The next screen prompts us for choosing the metrics which will be used in creating the Region Map. Here we choose the Average price as the metric and country_iso_code as the field in the bucket which will be used in creating the visualization.

choose the metrics

以下最终结果显示了我们在应用选择后得到的区域地图。请注意标签中提到的颜色阴影及其值。

The final result below shows the Region Map once we apply the selection. Please note the shades of the colour and their values mentioned in the label.

region maps
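Under the hood, a region map of this kind is fed by a terms aggregation on the country code combined with an average metric. The sketch below assumes the sample data's full field path geoip.country_iso_code and uses taxful_total_price as the price field; adjust these to match the fields actually selected.

# Illustrative aggregation behind the region map (field paths assumed)
GET kibana_sample_data_ecommerce/_search
{
   "size": 0,
   "aggs": {
      "per_country": {
         "terms": { "field": "geoip.country_iso_code", "size": 50 },
         "aggs": {
            "average_price": { "avg": { "field": "taxful_total_price" } }
         }
      }
   }
}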

Elasticsearch - Pie Charts

饼图是最简单、最著名的可视化工具之一。它将数据表示为不同颜色的圆圈切片。可以将标签与百分比数据值一起显示在圆圈内。圆圈也可以变成甜甜圈的形状。

Pie charts are one of the simplest and most famous visualization tools. A pie chart represents the data as slices of a circle, each coloured differently. The labels along with the percentage data values can be presented along with the circle. The circle can also take the shape of a donut.

Visualize

在 Kibana 主屏幕中,我们找到可视化选项名称,该选项允许我们从 Elasticsearch 中存储的索引创建可视化和聚合。我们选择添加一个新的可视化,并选择如下所示的选项,即饼图。

In the Kibana Home screen, we find the option named Visualize, which allows us to create visualizations and aggregations from the indices stored in Elasticsearch. We choose to add a new visualization and select Pie Chart as the option shown below.

pie charts visualize

Choose the Metrics

下一个屏幕提示我们选择用于创建饼图的度量标准。在这里,我们选择基准价格计数作为度量标准,并将存储桶聚合选择为直方图。另外,将最小间隔选择为 20。因此,价格将以 20 为范围的值块显示。

The next screen prompts us to choose the metrics which will be used in creating the pie chart. Here we choose the count of base unit price as the metric and a histogram as the bucket aggregation. Also, the minimum interval is chosen as 20, so the prices will be displayed as blocks of values in ranges of 20.

pie charts metrics

在应用选择后,以下结果显示饼状图。请注意标签中提到的颜色阴影及其值。

The result below shows the pie chart after we apply the selection. Please note the shades of the colour and their values mentioned in the label.

pie charts
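The slices of this pie chart correspond to the buckets of a histogram aggregation with an interval of 20 on the price field. A minimal sketch of an equivalent request is given below; the field path products.base_unit_price is an assumption based on the sample ecommerce mapping.

# Illustrative histogram aggregation behind the pie chart (field path assumed)
GET kibana_sample_data_ecommerce/_search
{
   "size": 0,
   "aggs": {
      "price_ranges": {
         "histogram": { "field": "products.base_unit_price", "interval": 20 }
      }
   }
}

The document count of each bucket determines the size of the corresponding slice.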

Pie Chart Options

移动到饼状图下的选项选项卡上,我们可以看到各种配置选项,以更改饼状图中数据的显示外观和排列。在以下示例中,饼状图显示为甜甜圈状,且标签显示在顶部。

On moving to the Options tab under the pie chart, we can see various configuration options to change the look as well as the arrangement of the data display in the pie chart. In the following example, the pie chart appears as a donut and the labels appear at the top.

pie charts options

Elasticsearch - Area and Bar Charts

面积图是折线图的扩展,其中折线图和坐标轴之间的面积会突出显示为某种颜色。条形图表示成一系列值的组织数据,然后沿坐标轴绘制。它可以由水平条或垂直条组成。

An area chart is an extension of a line chart where the area between the line chart and the axes is highlighted with some colours. A bar chart represents data organized into a range of values and then plotted against the axes. It can consist of either horizontal bars or vertical bars.

在本章中,我们将看到使用 Kibana 创建的所有这三种类型的图形。正如前面章节中所讨论的,我们将继续使用电子商务索引中的数据。

In this chapter, we will see how to create all three of these chart types using Kibana. As discussed in the earlier chapters, we will continue to use the data in the ecommerce index.

Area Chart

在 Kibana 主屏幕中,我们找到了可视化名称选项,可视化名称选项使我们能够从 Elasticsearch 中存储的指标创建可视化和聚合。我们选择添加一个新的可视化,并选择下图所示的图像中的面积图选项。

In the Kibana Home screen, we find the option named Visualize, which allows us to create visualizations and aggregations from the indices stored in Elasticsearch. We choose to add a new visualization and select Area Chart as the option shown in the image given below.

area charts visualize

Choose the Metrics

下一个屏幕会提示我们选择用于创建面积图的指标。此处,我们选择总和作为聚合指标的类型。然后我们选择 total_quantity 字段作为要作为指标使用的字段。在 X 轴上,我们选择了 order_date 字段并以 5 的大小用给定的指标拆分序列。

The next screen prompts us to choose the metrics which will be used in creating the area chart. Here we choose Sum as the type of aggregation metric. Then we choose the total_quantity field as the field to be used as the metric. On the X-axis, we choose the order_date field and split the series with the given metric using a size of 5.

area charts metrics

运行上述配置之后,我们将得到如下所示的面积图作为输出 −

On running the above configuration, we get the following area chart as the output −

area charts output
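This area chart maps to a date_histogram aggregation on order_date with a sum sub-aggregation on total_quantity. The sketch below assumes a daily interval and splits the series with a terms aggregation of size 5 on category.keyword, since the screenshot does not name the split field; newer Elasticsearch releases (7.2 onwards) prefer calendar_interval or fixed_interval in place of interval. The bar charts that follow use essentially the same aggregation, with only the interval and the chart type changed.

# Illustrative aggregation behind the area chart (interval and split field assumed)
GET kibana_sample_data_ecommerce/_search
{
   "size": 0,
   "aggs": {
      "over_time": {
         "date_histogram": { "field": "order_date", "interval": "1d" },
         "aggs": {
            "split_series": {
               "terms": { "field": "category.keyword", "size": 5 },
               "aggs": {
                  "total_quantity_sum": { "sum": { "field": "total_quantity" } }
               }
            }
         }
      }
   }
}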

Horizontal Bar Chart

同样,对于水平条形图,我们从 Kibana 主屏幕选择新的可视化效果,然后选择水平条形图选项。然后,我们选择如下图所示的指标。在此处,我们选择求和作为名为产品数量的字段的聚合。然后,我们选择带有日期直方图的存储桶作为字段订单日期。

Similarly, for the horizontal bar chart, we choose a new visualization from the Kibana Home screen and choose the option for Horizontal Bar. Then we choose the metrics as shown in the image below. Here we choose Sum as the aggregation for the field named product quantity. Then we choose buckets with a date histogram for the field order date.

horizontal bar chart

在运行上述配置时,我们可看到一个如下所示的水平条形图 −

On running the above configuration, we can see a horizontal bar chart as shown below −

configuration horizontal bar chart

Vertical Bar Chart

对于垂直条形图,我们从 Kibana 主屏幕选择新的可视化效果,然后选择垂直条形图选项。然后,我们选择如下图所示的指标。

For the vertical bar chart, we choose a new visualization from the Kibana Home screen and choose the option for Vertical Bar. Then we choose the metrics as shown in the image below.

在此处,我们选择求和作为名为产品数量的字段的聚合。然后,我们选择带有日期直方图的存储桶作为字段订单日期,间隔为每周。

Here we choose Sum as the aggregation for the field named product quantity. Then we choose buckets with date histogram for the field order date with a weekly interval.

vertical bar chart

在运行上述配置时,将生成如下图所示的图表 −

On running the above configuration, a chart will be generated as shown below −

configuration of vertical bar

Elasticsearch - Time Series

时间序列是对特定时间序列中数据序列的表示。例如,从当月第一天到最后一天的每一天的数据。数据点之间的间隔保持恒定。任何包含时间组件的数据集都可以表示为时间序列。

A time series is a representation of a sequence of data points in time order. For example, the data for each day starting from the first day of the month to the last day. The interval between the data points remains constant. Any data set which has a time component in it can be represented as a time series.

在本章中,我们将使用示例电子商务数据集,并绘制每天订单数,以创建一个时间序列。

In this chapter, we will use the sample e-commerce data set and plot the count of the number of orders for each day to create a time series.

time series visualize

Choose Metrics

首先,我们需要选择将用于创建时间序列的索引模式、数据字段和间隔。从样本电子商务数据集中,我们将 order_date 选为字段,将 1d 选为间隔。我们使用 Panel Options 选项卡进行这些选择。我们还将此选项卡中的其他值保留为默认值,以获取时间序列的默认颜色和格式。

First, we choose the index pattern, data field and interval which will be used for creating the time series. From the sample ecommerce data set we choose order_date as the field and 1d as the interval. We use the Panel Options tab to make these choices. We also leave the other values in this tab at their defaults to get the default colour and format for the time series.

panel options

Data 选项卡中,我们选择 count 作为聚合选项,将 group by 选项设为 everything,并为时间序列图表设置标签。

In the Data tab, we choose Count as the aggregation option, set the Group by option to Everything, and put a label for the time series chart.

data tab

Result

此配置的最终结果如下所示。请注意,我们正在为此图形使用 Month to Date 的时间段。不同的时间段将产生不同的结果。

The final result of this configuration appears as follows. Please note that we are using a time period of Month to Date for this graph. Different time periods will give different results.

result time series
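The plotted series is simply the count of documents per day, which corresponds to a date_histogram aggregation on order_date with a 1d interval; the doc_count of each bucket is the value plotted for that day. A minimal equivalent request is sketched below, assuming the default sample index name.

# Illustrative daily count behind the time series
GET kibana_sample_data_ecommerce/_search
{
   "size": 0,
   "aggs": {
      "orders_per_day": {
         "date_histogram": { "field": "order_date", "interval": "1d" }
      }
   }
}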

Elasticsearch - Tag Clouds

标签云以视觉吸引力的形式表示文本,这些文本主要是关键词和元数据。它们以不同的角度排列,并以不同的颜色和字体大小表示。它有助于找出数据中最突出的术语。突出性可以通过一个或多个因素决定,例如术语的频率、标签的唯一性或基于附加到特定术语的某些权重等。以下是我们创建标签云的步骤:

A tag cloud represents text, mostly keywords and metadata, in a visually appealing form. The terms are aligned at different angles and represented in different colours and font sizes. It helps in finding out the most prominent terms in the data. The prominence can be decided by one or more factors like the frequency of the term, the uniqueness of the tag, or some weightage attached to specific terms. Below we see the steps to create a tag cloud.

Visualize

在 Kibana 主屏幕上,我们找到了名为可视化选项,它允许我们从存储在 Elasticsearch 中的索引创建可视化和聚合。我们选择添加一个新的可视化并选择标签云作为如下所示的选项:

In the Kibana Home screen, we find the option named Visualize, which allows us to create visualizations and aggregations from the indices stored in Elasticsearch. We choose to add a new visualization and select Tag Cloud as the option shown below −

tag cloud visualize

Choose the Metrics

下一屏提示我们选择将用于创建标签云的指标。在此,我们将数量选择为聚合指标的类型。然后,我们将 productName 字段选择为用作标签的关键字。

The next screen prompts us to choose the metrics which will be used in creating the tag cloud. Here we choose Count as the type of aggregation metric. Then we choose the productName field as the keyword to be used for the tags.

tag cloud metrics

在此显示的结果显示我们应用选择操作后的饼图。请注意标签中提到的颜色阴影及其值。

The result shown here shows the tag cloud after we apply the selection. Please note the shades of the colours and their values mentioned in the label.

tag cloud result
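A tag cloud is backed by a terms aggregation, whose bucket counts decide the prominence of each tag. The sketch below is only illustrative: the field path products.product_name.keyword assumes a keyword sub-field in the sample mapping, so substitute the keyword field you actually selected in Kibana.

# Illustrative terms aggregation behind the tag cloud (field path assumed)
GET kibana_sample_data_ecommerce/_search
{
   "size": 0,
   "aggs": {
      "top_tags": {
         "terms": { "field": "products.product_name.keyword", "size": 25 }
      }
   }
}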

Tag Cloud Options

移动到标签云下的 options 标签后,我们可以看到多种配置选项来更改标签云中的数据显示外观和排列。在以下示例中,标签云出现时标签分布在水平方向和垂直方向上。

On moving to the options tab under Tag Cloud we can see various configuration options to change the look as well as the arrangement of data display in the Tag Cloud. In the below example the Tag Cloud appears with tags spread across both horizontal and vertical directions.

tag cloud options

Elasticsearch - Heat Maps

热图是一种可视化类型,其中不同色调的颜色代表图表中的不同区域。值可能是连续变化的,因此颜色的细微差别会随着值而变化。它们对于表示连续变化的数据以及离散数据都非常有用。

A heat map is a type of visualization in which different shades of colour represent different areas in the graph. The values may be continuously varying and hence the shades of a colour vary along with the values. Heat maps are very useful for representing both continuously varying data and discrete data.

在本章中,我们将使用名为 sample_data_flights 的数据集来构建热图图表。在其中,我们考虑名为航班始发国和目的国的变量并进行计数。

In this chapter, we will use the data set named sample_data_flights to build a heat map chart. In it, we consider the variables origin country and destination country of flights and take a count.

在 Kibana 主屏幕上,我们找到了名为可视化选项,它允许我们从存储在 Elasticsearch 中的索引创建可视化和聚合。我们选择添加一个新的可视化并选择热图作为如下所示的选项:

In the Kibana Home screen, we find the option named Visualize, which allows us to create visualizations and aggregations from the indices stored in Elasticsearch. We choose to add a new visualization and select Heat Map as the option shown below −

heat map visualize

Choose the Metrics

在下一个屏幕中,系统提示我们选择将在创建热图图表中使用的指标。在这里,我们选择计数作为聚合指标的类型。然后,对于 Y 轴中的桶,我们选择按 OriginCountry 字段进行聚合。对于 X 轴,我们选择相同的聚合,但 DestCountry 作为要使用的字段。在两种情况下,我们都将桶的大小选择为 5。

The next screen prompts us to choose the metrics which will be used in creating the heat map chart. Here we choose Count as the type of aggregation metric. Then, for the buckets on the Y-axis, we choose Terms as the aggregation for the field OriginCountry. For the X-axis, we choose the same aggregation but with DestCountry as the field to be used. In both cases, we choose a bucket size of 5.

heat map metrics

在运行上面显示的配置后,我们将生成以下热图图表。

On running the above shown configuration, we get the heat map chart generated as follows.

heat map configuration

Note - 你必须允许日期范围为今年,以便图表收集一年的数据来生成有效的热图图表。

Note − You have to set the date range to This Year so that the graph gathers a year's worth of data to produce an effective heat map chart.
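The heat map configuration above amounts to a terms aggregation on OriginCountry with a nested terms aggregation on DestCountry, where the document count of each origin and destination pair drives the colour intensity of the corresponding cell. A minimal sketch of an equivalent request is given below; kibana_sample_data_flights is assumed to be the default name of the sample flight data index.

# Illustrative origin and destination counts behind the heat map (index name assumed)
GET kibana_sample_data_flights/_search
{
   "size": 0,
   "aggs": {
      "origin": {
         "terms": { "field": "OriginCountry", "size": 5 },
         "aggs": {
            "destination": {
               "terms": { "field": "DestCountry", "size": 5 }
            }
         }
      }
   }
}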

Elasticsearch - Canvas

Canvas 应用程序是 Kibana 的一部分,它允许我们创建动态、多页面和像素完美的显示数据。它创建信息图而不仅仅是图表和指标的能力使其独一无二且有吸引力。在本章中,我们将看到 canvas 的各种特性以及如何使用 canvas 工作区。

The Canvas application is a part of Kibana which allows us to create dynamic, multi-page and pixel-perfect data displays. Its ability to create infographics, and not just charts and metrics, is what makes it unique and appealing. In this chapter, we will see the various features of Canvas and how to use the Canvas work pads.

Opening a Canvas

转到 Kibana 主页并选择如下面的图表所示的选项。它会打开你拥有的 canvas 工作区列表。我们选择电子商务收入跟踪进行我们的研究。

Go to the Kibana homepage and select the option as shown in the diagram below. It opens up the list of Canvas work pads you have. We choose the eCommerce Revenue Tracking work pad for our study.

opening a Canvas

Cloning A Workpad

我们将 [eCommerce] Revenue Tracking 工作区克隆以用于我们的研究。要克隆它,我们突出显示包含此工作区名称的行,然后使用如下面图表所示的克隆按钮 -

We clone the [eCommerce] Revenue Tracking workpad to be used in our study. To clone it, we highlight the row with the name of this workpad and then use the clone button as shown in the diagram below −

cloning a workpad

克隆完成后,我们将得到一个名为 [eCommerce] Revenue Tracking – Copy 的新工作区,打开后将显示以下信息图。

As a result of the above clone, we get a new work pad named [eCommerce] Revenue Tracking – Copy which, on opening, shows the infographics below.

它通过精美的图片和图表描述了按类别划分的总销售额和收入。

It describes the total sales and revenue by category, along with nice pictures and charts.

total sales and revenue

Modifying the Workpad

我们可以使用右侧选项卡中提供的选项来更改工作簿中的样式和数字。这里我们希望通过选择不同的颜色来更改工作簿的背景颜色,如下图所示。颜色选择立即生效,我们得到的结果如下所示−

We can change the style and figures in the workpad by using the options available in the right-hand side tab. Here we aim to change the background colour of the workpad by choosing a different colour, as shown in the diagram below. The colour selection comes into effect immediately, and we get the result as shown below −

modifying the workpad

Elasticsearch - Logs UI

Kibana 还可以帮助可视化来自不同来源的日志数据。日志是用于分析基础结构健康、性能需求和安全漏洞分析等的重要分析来源。Kibana 可以连接到各种日志,如 Web 服务器日志、Elasticsearch 日志和 Cloudwatch 日志等。

Kibana can also help in visualizing log data from various sources. Logs are important sources of analysis for infrastructure health, performance needs and security breach analysis. Kibana can connect to various logs such as web server logs, Elasticsearch logs and CloudWatch logs.

Logstash Logs

在 Kibana 中,我们可以连接到 Logstash 日志以进行可视化。首先,我们从 Kibana 主屏幕中选择“日志”按钮,如下所示−

In Kibana, we can connect to Logstash logs for visualization. First, we choose the Logs button from the Kibana home screen as shown below −

logstash logs

然后我们选择“更改源配置”选项,它为我们带来了选择 Logstash 作为源的选项。以下屏幕还显示了作为日志源的其他选项类型。

Then we choose the option Change Source Configuration, which gives us the option to choose Logstash as a source. The screen below also shows the other types of options we have as a log source.

change source configuration

您可以为实时日志跟踪传输数据,也可以暂停传输以专注于历史日志数据。当您正在传输日志时,最近的日志会出现在控制台底部。

You can stream data for live log tailing or pause streaming to focus on historical log data. When you are streaming logs, the most recent log appears at the bottom of the console.

如需进一步参考,您可以参阅我们的 Logstash 教程。

For further reference, you can refer to our Logstash tutorial.