Elasticsearch: A Concise Tutorial

Elasticsearch - Basic Concepts

Elasticsearch is a search server based on Apache Lucene. It was developed by Shay Banon and first released in 2010. It is now maintained by Elasticsearch BV. At the time of writing, its latest version is 7.0.0.

Elasticsearch is a real-time, distributed, open source full-text search and analytics engine. It is accessed through a RESTful web service interface and stores data as schema-less JSON (JavaScript Object Notation) documents. It is written in Java, so it runs on many different platforms. It lets users explore very large amounts of data at very high speed.
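
To make the REST-plus-JSON access model concrete, here is a minimal sketch that builds, but does not send, an HTTP request for storing one document. The host, port, index name "posts", and the document fields are assumptions chosen for illustration; only the general shape (an HTTP verb, a URL naming the index and document ID, and a JSON body) comes from the description above.

```python
import json
from urllib.request import Request

# Assumed values for illustration only: a local node on the default HTTP
# port and a hypothetical index named "posts".
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that would store one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request("posts", 1, {"user": "alice", "message": "hello"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it, which requires a running Elasticsearch node at the assumed address.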

General Features

The general features of Elasticsearch are as follows:

Elasticsearch scales up to petabytes of structured and unstructured data.
Elasticsearch can be used as a replacement for document stores such as MongoDB and RavenDB.
Elasticsearch uses denormalization to improve search performance.
Elasticsearch is a popular enterprise search engine and is used by many large organizations such as Wikipedia, The Guardian, Stack Overflow, and GitHub.
Elasticsearch is open source and, as of the version described here, available under the Apache License, version 2.0.
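
The denormalization point can be illustrated with a small sketch. The data, field names, and the denormalize helper are hypothetical. The idea is only that related data is copied into each document, so a search can be answered from one document without a join.

```python
# Hypothetical normalized data, as it might sit in two relational tables.
users = {1: {"name": "alice", "city": "Paris"}}
posts = [
    {"id": 10, "user_id": 1, "text": "first post"},
    {"id": 11, "user_id": 1, "text": "second post"},
]


def denormalize(post, users):
    """Return a self-contained document with the author's fields copied in."""
    author = users[post["user_id"]]
    return {"id": post["id"], "text": post["text"], "author": dict(author)}


documents = [denormalize(p, users) for p in posts]
print(documents[0])
```

The trade-off is duplicated data: the author's fields appear in every post, so changing them means updating several documents.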

Key Concepts

The key concepts of Elasticsearch are as follows:

Node

A node is a single running instance of Elasticsearch. A single physical or virtual server can host multiple nodes, depending on the capacity of its resources such as RAM, storage, and processing power.

Cluster

A cluster is a collection of one or more nodes. It provides collective indexing and search capabilities across all its nodes for the entire data set.

Index

An index is a collection of documents of different types, together with their properties. An index also uses the concept of shards to improve performance. For example, one index might hold the set of documents containing the data of a social networking application.

Document

A document is a collection of fields, structured in a specific way and defined in JSON format. Every document belongs to a type and resides inside an index. Every document is associated with a unique identifier called the UID.
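
As a rough picture of documents living inside an index under unique identifiers, the sketch below models an index as a plain dictionary. The add_document helper, the index name "tweets", and the fields are all hypothetical. This is a toy model, not how Elasticsearch stores data internally.

```python
import json


def add_document(index, uid, fields):
    """Store a document under its unique ID, keeping only JSON-safe content."""
    # Round-tripping through json checks that the fields are valid JSON.
    index[uid] = json.loads(json.dumps(fields))
    return uid


tweets = {}  # a toy in-memory "index" (hypothetical name)
add_document(tweets, "1", {"user": "alice", "likes": 3, "tags": ["intro"]})
add_document(tweets, "2", {"user": "bob", "likes": 0, "tags": []})
print(json.dumps(tweets["1"], sort_keys=True))
```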

Shard

Indexes are horizontally subdivided into shards. This means each shard contains all the properties of a document but holds fewer JSON objects than the whole index. The horizontal separation makes each shard an independent unit that can be stored on any node. A primary shard is an original horizontal part of an index, and primary shards are then replicated into replica shards.
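
Horizontal subdivision can be sketched as hashing each document's ID and taking the remainder modulo the number of primary shards. This is a simplification: CRC32 is used here only as a stand-in hash, the shard count of 3 is an assumed value, and Elasticsearch's actual routing function differs in detail.

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # assumed value for the example


def shard_for(doc_id, num_shards=NUM_PRIMARY_SHARDS):
    """Pick a shard by hashing the document ID (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Each shard holds whole documents, but only a subset of the index.
shards = {n: [] for n in range(NUM_PRIMARY_SHARDS)}
for doc_id in ["1", "2", "3", "4", "5", "6"]:
    shards[shard_for(doc_id)].append(doc_id)

print(shards)
```

Because the shard is derived from the ID alone, the same document always maps to the same shard as long as the number of primary shards stays fixed.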

Replicas

Elasticsearch allows users to create replicas of their indexes and shards. Replication not only increases the availability of data in case of failure, but also improves search performance by running search operations in parallel across these replicas.
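
The relationship between primary and replica shards comes down to simple arithmetic, sketched below. The helper names and the example numbers are hypothetical. The settings keys number_of_shards and number_of_replicas are the index settings that control these counts, and the replica count applies per primary shard.

```python
def index_settings(primary_shards, replicas_per_primary):
    """Build an index settings body with the two shard-related settings."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    }


def total_shard_copies(primary_shards, replicas_per_primary):
    """Each primary shard exists once, plus one copy per configured replica."""
    return primary_shards * (1 + replicas_per_primary)


settings = index_settings(3, 2)
print(settings)
print(total_shard_copies(3, 2))  # 3 primaries + 6 replicas = 9
```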

Advantages

Elasticsearch is developed in Java, which makes it compatible with almost every platform.
Elasticsearch is near real time: an added document becomes searchable after about one second.
Elasticsearch is distributed, which makes it easy to scale and to integrate into any large organization.
Creating full backups is easy using the gateway concept present in Elasticsearch.
Handling multi-tenancy is very easy in Elasticsearch compared to Apache Solr.
Elasticsearch returns JSON objects as responses, which makes it possible to call the Elasticsearch server from a large number of programming languages.
Elasticsearch supports almost every document type except those that do not support text rendering.

Disadvantages

Elasticsearch does not support multiple formats for request and response data (only JSON is possible), unlike Apache Solr, which supports CSV, XML, and JSON.
Elasticsearch occasionally suffers from split-brain situations.
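
A split-brain situation arises when a network partition divides a cluster into groups that each elect their own master. A common general safeguard is to require a majority of master-eligible nodes before electing one. The sketch below illustrates only that majority rule; the function names are hypothetical, and this is not Elasticsearch's actual election code.

```python
def majority(master_eligible_nodes):
    """Smallest group size that is more than half of the eligible nodes."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(partition_size, master_eligible_nodes):
    """A partition may elect a master only if it holds a majority."""
    return partition_size >= majority(master_eligible_nodes)


# Three eligible nodes split 2/1: only the larger side may elect a master.
print(can_elect_master(2, 3), can_elect_master(1, 3))

# Four eligible nodes split 2/2: neither side reaches the majority of 3.
print(can_elect_master(2, 4), can_elect_master(2, 4))
```

Since two disjoint groups cannot both hold a majority, at most one side of a partition can elect a master under this rule.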

Comparison between Elasticsearch and RDBMS

In Elasticsearch, an index is similar to a table in an RDBMS (Relational Database Management System). Every table is a collection of rows, just as every index is a collection of documents in Elasticsearch.

The following table gives a direct comparison between these terms:

Elasticsearch    RDBMS
Cluster          Database
Shard            Shard
Index            Table
Field            Column
Document         Row
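
The correspondence in the table can be sketched by converting relational rows into documents: each column name becomes a field name and each row becomes one document. The column names, sample rows, and the rows_to_documents helper are hypothetical.

```python
def rows_to_documents(columns, rows):
    """Turn each row (a tuple of values) into a document keyed by column name."""
    return [dict(zip(columns, row)) for row in rows]


# A hypothetical table with two rows.
columns = ["id", "name", "city"]
rows = [(1, "alice", "Paris"), (2, "bob", "Berlin")]

# The resulting list of documents plays the role of an index.
documents = rows_to_documents(columns, rows)
for document in documents:
    print(document)
```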