Elasticsearch Concise Tutorial

Elasticsearch - Modules

Elasticsearch is composed of a number of modules, each responsible for part of its functionality. These modules have two types of settings:

  1. Static Settings: These settings must be configured in the config file (elasticsearch.yml) before Elasticsearch is started. To apply a change, you need to update every affected node in the cluster.

  2. Dynamic Settings: These settings can be changed on a running Elasticsearch cluster.
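
As a rough illustration of the difference: a dynamic setting can be sent to a running cluster through the cluster update settings API (`PUT /_cluster/settings`), whereas a static one has to be written to elasticsearch.yml. The sketch below only builds the JSON request body. The helper name `build_cluster_settings_body` is made up for this example, and actually sending the request is left to whichever HTTP client you use.

```python
import json


def build_cluster_settings_body(settings, persistent=False):
    """Build the JSON body for PUT /_cluster/settings.

    `settings` maps dynamic setting names to values. Transient settings
    are lost on a full cluster restart; persistent ones survive it.
    """
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: dict(settings)}, sort_keys=True)


# Example: temporarily allow allocation of primary shards only.
body = build_cluster_settings_body(
    {"cluster.routing.allocation.enable": "primaries"}
)
print(body)
```

Passing `persistent=True` would place the same setting under the `persistent` key instead.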

The following sections of this chapter discuss the different modules of Elasticsearch.

Cluster-Level Routing and Shard Allocation

Cluster-level settings decide how shards are allocated to different nodes and how they are reallocated to rebalance the cluster. The following settings control shard allocation.

Cluster-Level Shard Allocation

| Setting | Possible value | Description |
|---|---|---|
| cluster.routing.allocation.enable | all | This default value allows shard allocation for all kinds of shards. |
| | primaries | This allows shard allocation only for primary shards. |
| | new_primaries | This allows shard allocation only for primary shards of new indices. |
| | none | This does not allow any shard allocation. |
| cluster.routing.allocation.node_concurrent_recoveries | Numeric value (by default 2) | This restricts the number of concurrent shard recoveries on a node. |
| cluster.routing.allocation.node_initial_primaries_recoveries | Numeric value (by default 4) | This restricts the number of parallel initial primary recoveries. |
| cluster.routing.allocation.same_shard.host | Boolean value (by default false) | When true, this prevents more than one copy of the same shard from being allocated to the same physical host. |
| indices.recovery.concurrent_streams | Numeric value (by default 3) | This controls the number of open network streams per node during shard recovery from peer shards. |
| indices.recovery.concurrent_small_file_streams | Numeric value (by default 2) | This controls the number of open streams per node for small files (less than 5mb) during shard recovery. |
| cluster.routing.rebalance.enable | all | This default value allows balancing for all kinds of shards. |
| | primaries | This allows shard balancing only for primary shards. |
| | replicas | This allows shard balancing only for replica shards. |
| | none | This does not allow any kind of shard balancing. |
| cluster.routing.allocation.allow_rebalance | always | This default value always allows rebalancing. |
| | indices_primaries_active | This allows rebalancing when all primary shards in the cluster are allocated. |
| | indices_all_active | This allows rebalancing when all primary and replica shards are allocated. |
| cluster.routing.allocation.cluster_concurrent_rebalance | Numeric value (by default 2) | This restricts the number of concurrent shard rebalancing operations in the cluster. |
| cluster.routing.allocation.balance.shard | Float value (by default 0.45f) | This defines the weight factor for the total number of shards allocated on every node. |
| cluster.routing.allocation.balance.index | Float value (by default 0.55f) | This defines the weight factor for the number of shards per index allocated on a specific node. |
| cluster.routing.allocation.balance.threshold | Non-negative float value (by default 1.0f) | This is the minimum optimization value of operations that should be performed; raising it makes the cluster less aggressive about rebalancing. |

Disk-based Shard Allocation

| Setting | Possible value | Description |
|---|---|---|
| cluster.routing.allocation.disk.threshold_enabled | Boolean value (by default true) | This enables or disables the disk allocation decider. |
| cluster.routing.allocation.disk.watermark.low | String value (by default 85%) | This is the low watermark for disk usage; once a node's disk usage exceeds it, no further shards are allocated to that node. |
| cluster.routing.allocation.disk.watermark.high | String value (by default 90%) | This is the high watermark for disk usage; once a node's disk usage exceeds it, Elasticsearch tries to relocate shards away from that node. |
| cluster.info.update.interval | String value (by default 30s) | This is the interval between disk usage checks. |
| cluster.routing.allocation.disk.include_relocations | Boolean value (by default true) | This decides whether shards that are currently being relocated are taken into account when calculating disk usage. |
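
To make the two watermarks concrete, here is a minimal sketch of how a node could be classified by its disk usage. It is a simplified model and not the actual Elasticsearch allocation decider; the function names `parse_percent` and `disk_decision` are invented for this example, and only percentage values are handled.

```python
def parse_percent(value):
    """Convert a watermark such as '85%' into a float (85.0)."""
    return float(value.strip().rstrip("%"))


def disk_decision(used_percent, low="85%", high="90%"):
    """Classify a node by disk usage against the two watermarks.

    Simplified model: the real decider takes more factors into account.
    """
    if used_percent >= parse_percent(high):
        return "relocate shards away"
    if used_percent >= parse_percent(low):
        return "no new shards"
    return "ok"


for usage in (70.0, 87.5, 93.0):
    print(usage, "->", disk_decision(usage))
```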

Discovery

This module helps a cluster discover and maintain the state of all the nodes in it. The cluster state changes whenever a node is added to or removed from the cluster. The cluster name setting is used to keep different clusters logically separate. The following modules support discovery, several of them through the APIs provided by cloud vendors:

  1. Azure discovery

  2. EC2 discovery

  3. Google compute engine discovery

  4. Zen discovery

Gateway

This module maintains the cluster state and the shard data across full cluster restarts. The following are the static settings of this module:

| Setting | Possible value | Description |
|---|---|---|
| gateway.expected_nodes | Numeric value (by default 0) | The number of nodes expected to be in the cluster before recovery of local shards starts. |
| gateway.expected_master_nodes | Numeric value (by default 0) | The number of master nodes expected to be in the cluster before recovery starts. |
| gateway.expected_data_nodes | Numeric value (by default 0) | The number of data nodes expected to be in the cluster before recovery starts. |
| gateway.recover_after_time | String value (by default 5m) | The time the recovery process waits before starting, regardless of the number of nodes that have joined the cluster. |

Related settings: gateway.recover_after_nodes, gateway.recover_after_master_nodes and gateway.recover_after_data_nodes set the number of nodes, master nodes and data nodes that must have joined before recovery may start.
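
The interplay between the expected-node count, the recover-after count and the waiting time can be sketched roughly as follows. This is a simplified model under stated assumptions, not Elasticsearch's actual gateway code: it considers only total node counts (no master/data distinction), and the function name `should_start_recovery` and its parameters are hypothetical.

```python
def should_start_recovery(joined_nodes, expected_nodes, recover_after_nodes,
                          waited_seconds, recover_after_time_seconds=300):
    """Decide whether local shard recovery may start (simplified model)."""
    # Too few nodes have joined: keep waiting, however long it takes.
    if joined_nodes < recover_after_nodes:
        return False
    # All expected nodes are present: start immediately.
    if expected_nodes > 0 and joined_nodes >= expected_nodes:
        return True
    # Otherwise start once the recover_after_time window has elapsed.
    return waited_seconds >= recover_after_time_seconds


# 3 of 5 expected nodes joined, minimum is 3, waited 2 minutes of 5.
print(should_start_recovery(3, 5, 3, waited_seconds=120))
```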

HTTP

This module manages the communication between HTTP clients and the Elasticsearch APIs. It can be disabled by setting http.enabled to false.

The following settings (configured in elasticsearch.yml) control this module:

| S.No | Setting | Description |
|---|---|---|
| 1 | http.port | The port used to access Elasticsearch; it ranges from 9200 to 9300. |
| 2 | http.publish_port | The port that HTTP clients use to reach the node; useful when the node is behind a firewall. |
| 3 | http.bind_host | The host address the HTTP service binds to. |
| 4 | http.publish_host | The host address published for HTTP clients. |
| 5 | http.max_content_length | The maximum size of the content of an HTTP request. Its default value is 100mb. |
| 6 | http.max_initial_line_length | The maximum size of the URL. Its default value is 4kb. |
| 7 | http.max_header_size | The maximum HTTP header size. Its default value is 8kb. |
| 8 | http.compression | Enables or disables support for compression. Its default value is false. |
| 9 | http.pipelining | Enables or disables HTTP pipelining. |
| 10 | http.pipelining.max_events | Restricts the number of events that can be queued before an HTTP connection is closed. |
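
When http.port is given as a range, a node binds to the first available port in that range. The following sketch models only that selection step; `first_free_port` and the `ports_in_use` set are hypothetical stand-ins for this example, and no sockets are actually opened.

```python
def first_free_port(port_range, ports_in_use):
    """Return the first port in 'start-end' that is not in use, else None."""
    start, end = (int(part) for part in port_range.split("-"))
    for port in range(start, end + 1):
        if port not in ports_in_use:
            return port
    return None


# Two nodes on this host already occupy 9200 and 9201.
print(first_free_port("9200-9300", {9200, 9201}))  # prints 9202
```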

Indices

This module maintains the settings that apply globally to every index. The following settings are mainly related to memory usage:

Circuit Breaker

This is used to prevent operations from causing an OutOfMemoryError. The settings mainly limit how much of the JVM heap operations may use. For example, the indices.breaker.total.limit setting defaults to 70% of the JVM heap.
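
The idea behind a percentage-based breaker limit can be sketched as below. This is an illustration of the arithmetic only, not Elasticsearch's breaker implementation; `breaker_limit_bytes` and `would_trip` are names invented for this example, and the 4 GiB heap is an assumed value.

```python
def breaker_limit_bytes(heap_bytes, limit="70%"):
    """Translate a percentage limit into an absolute number of bytes."""
    return int(heap_bytes * float(limit.rstrip("%")) / 100)


def would_trip(current_bytes, requested_bytes, heap_bytes, limit="70%"):
    """Return True if the extra memory would push usage past the limit."""
    return current_bytes + requested_bytes > breaker_limit_bytes(heap_bytes, limit)


GIB = 1024 ** 3
heap = 4 * GIB  # assumed heap size; 70% of it is about 2.8 GiB

print(would_trip(2 * GIB, 1 * GIB, heap))    # 3 GiB total exceeds the limit
print(would_trip(2 * GIB, GIB // 2, heap))   # 2.5 GiB total stays below it
```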

Fielddata Cache

This is used mainly when aggregating on a field. It is recommended to allocate enough memory for it. The amount of memory used for the field data cache can be controlled with the indices.fielddata.cache.size setting.

Node Query Cache

This memory is used for caching query results. The cache uses a Least Recently Used (LRU) eviction policy. The indices.queries.cache.size setting controls the memory size of this cache.
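
For readers unfamiliar with LRU eviction, here is a minimal sketch of the policy. It is not how Elasticsearch implements its query cache (which is sized by memory rather than by entry count); the `LRUCache` class and the query/result strings are purely illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny LRU cache bounded by number of entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


cache = LRUCache(capacity=2)
cache.put("q1", "r1")
cache.put("q2", "r2")
cache.get("q1")        # q1 becomes the most recently used entry
cache.put("q3", "r3")  # cache is full, so q2 is evicted
print(list(cache._items))  # ['q1', 'q3']
```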

Indexing Buffer

This buffer stores newly created documents in the index and flushes them when the buffer is full. Settings such as indices.memory.index_buffer_size control the amount of heap allocated for this buffer.
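
The buffer-then-flush behaviour can be pictured with a small sketch. It is a toy model, assuming documents are plain strings and that "full" means a byte threshold has been reached; the `IndexingBuffer` class and its `on_flush` callback are hypothetical and unrelated to Elasticsearch's internal classes.

```python
class IndexingBuffer:
    """Collects documents and flushes them in one batch when full."""

    def __init__(self, max_bytes, on_flush):
        self.max_bytes = max_bytes
        self.on_flush = on_flush  # callback receiving the list of buffered docs
        self._docs = []
        self._used = 0

    def add(self, doc):
        self._docs.append(doc)
        self._used += len(doc.encode("utf-8"))
        if self._used >= self.max_bytes:
            self.flush()

    def flush(self):
        if self._docs:
            self.on_flush(self._docs)
        self._docs = []
        self._used = 0


batches = []
buffer = IndexingBuffer(max_bytes=10, on_flush=batches.append)
for doc in ("aaaa", "bbbb", "cccc", "dd"):
    buffer.add(doc)

print(batches)       # one flushed batch containing the first three documents
print(buffer._docs)  # "dd" is still waiting in the buffer
```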

Shard Request Cache

This cache stores the local search results for every shard. It can be enabled when the index is created, and it can be disabled for an individual request with a URL parameter.

Disable cache - ?request_cache=false
Enable cache - "index.requests.cache.enable": true

Indices Recovery

This controls the resources used during the recovery process. The following are the settings:

| Setting | Default value |
|---|---|
| indices.recovery.concurrent_streams | 3 |
| indices.recovery.concurrent_small_file_streams | 2 |
| indices.recovery.file_chunk_size | 512kb |
| indices.recovery.translog_ops | 1000 |
| indices.recovery.translog_size | 512kb |
| indices.recovery.compress | true |
| indices.recovery.max_bytes_per_sec | 40mb |

TTL Interval

The Time to Live (TTL) interval defines how long a document lives before it is deleted. The following dynamic settings control this process:

| Setting | Default value |
|---|---|
| indices.ttl.interval | 60s |
| indices.ttl.bulk_size | 1000 |

Node

Each node can either be a data node or not. You can change this property through the node.data setting. Setting the value to false means the node is not a data node.