Miscellaneous Elasticsearch Operation Support

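  • Index settings definition (via the @Setting annotation)

  • Index mapping definition (via the @Mapping annotation)

  • Filter builder

  • Scroll API for retrieving large result sets

  • Custom sort options

  • Runtime fields (from Elasticsearch 7.12 on)

  • Point in time API

  • Search template support

  • Nested sort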

This chapter covers additional support for Elasticsearch operations that cannot be accessed directly via the repository interface. It is recommended to add those operations as a custom implementation, as described in repositories/custom-implementations.adoc.

Index settings

When creating Elasticsearch indices with Spring Data Elasticsearch, different index settings can be defined by using the @Setting annotation. The following arguments are available:

  • useServerConfiguration does not send any settings parameters, so the Elasticsearch server configuration determines them.

  • settingPath refers to a JSON file defining the settings; the file must be resolvable in the classpath.

  • shards is the number of shards to use, defaults to 1

  • replicas is the number of replicas, defaults to 1

  • refreshInterval, defaults to "1s"

  • indexStoreType, defaults to "fs"
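As an illustration of settingPath, a settings file could look like the following sketch. The values simply restate the annotation defaults listed above, and any file name used to reference it (for example a hypothetical /settings/entity-settings.json) is an assumption; the keys are standard Elasticsearch index settings.

```json
{
  "index": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "refresh_interval": "1s",
    "store": {
      "type": "fs"
    }
  }
}
```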

It is also possible to define index sorting (check the linked Elasticsearch documentation for the possible field types and values):

@Document(indexName = "entities")
@Setting(
  sortFields = { "secondField", "firstField" },                                  1
  sortModes = { Setting.SortMode.max, Setting.SortMode.min },                    2
  sortOrders = { Setting.SortOrder.desc, Setting.SortOrder.asc },
  sortMissingValues = { Setting.SortMissing._last, Setting.SortMissing._first })
class Entity {
    @Nullable
    @Id private String id;

    @Nullable
    @Field(name = "first_field", type = FieldType.Keyword)
    private String firstField;

    @Nullable @Field(name = "second_field", type = FieldType.Keyword)
    private String secondField;

    // getter and setter...
}
1 when defining sort fields, use the name of the Java property (firstField), not the name that might be defined for Elasticsearch (first_field)
2 sortModes, sortOrders and sortMissingValues are optional, but if they are set, the number of entries must match the number of sortFields elements

Index Mapping

When Spring Data Elasticsearch creates the index mapping with the IndexOperations.createMapping() methods, it uses the annotations described in Mapping Annotation Overview, especially the @Field annotation. In addition, the @Mapping annotation can be added to a class. This annotation has the following properties:

  • mappingPath a classpath resource in JSON format; if this is not empty it is used as the mapping, no other mapping processing is done.

  • enabled when set to false, this flag is written to the mapping and no further processing is done.

  • dateDetection and numericDetection set the corresponding properties in the mapping when not set to DEFAULT.

  • dynamicDateFormats when this String array is not empty, it defines the date formats used for automatic date detection.

  • runtimeFieldsPath a classpath resource in JSON format containing the definition of runtime fields which is written to the index mappings, for example:

{
  "day_of_week": {
    "type": "keyword",
    "script": {
      "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
    }
  }
}

Filter Builder

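The filter builder can improve query speed. The following example adds a filter to a match-all query so that only the document with the given id is returned: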

private ElasticsearchOperations operations;

IndexCoordinates index = IndexCoordinates.of("sample-index");

Query query = NativeQuery.builder()
	.withQuery(q -> q
		.matchAll(ma -> ma))
	.withFilter( q -> q
		.bool(b -> b
			.must(m -> m
				.term(t -> t
					.field("id")
					.value(documentId))
			)))
	.build();

SearchHits<SampleEntity> sampleEntities = operations.search(query, SampleEntity.class, index);

Using Scroll For Big Result Set

Elasticsearch has a scroll API for retrieving large result sets in chunks. Spring Data Elasticsearch uses it internally to implement the <T> SearchHitsIterator<T> SearchOperations.searchForStream(Query query, Class<T> clazz, IndexCoordinates index) method.

IndexCoordinates index = IndexCoordinates.of("sample-index");

Query searchQuery = NativeQuery.builder()
    .withQuery(q -> q
        .matchAll(ma -> ma))
    .withFields("message")
    .withPageable(PageRequest.of(0, 10))
    .build();

SearchHitsIterator<SampleEntity> stream = elasticsearchOperations.searchForStream(searchQuery, SampleEntity.class,
    index);

List<SampleEntity> sampleEntities = new ArrayList<>();
while (stream.hasNext()) {
  sampleEntities.add(stream.next());
}

stream.close();

The SearchOperations API has no methods to access the scroll id. If access to it is necessary, the following methods of AbstractElasticsearchTemplate can be used (this is the base implementation for the different ElasticsearchOperations implementations):

@Autowired ElasticsearchOperations operations;

AbstractElasticsearchTemplate template = (AbstractElasticsearchTemplate)operations;

IndexCoordinates index = IndexCoordinates.of("sample-index");

Query query = NativeQuery.builder()
    .withQuery(q -> q
        .matchAll(ma -> ma))
    .withFields("message")
    .withPageable(PageRequest.of(0, 10))
    .build();

SearchScrollHits<SampleEntity> scroll = template.searchScrollStart(1000, query, SampleEntity.class, index);

String scrollId = scroll.getScrollId();
List<SampleEntity> sampleEntities = new ArrayList<>();
while (scroll.hasSearchHits()) {
  sampleEntities.addAll(scroll.getSearchHits());
  scrollId = scroll.getScrollId();
  scroll = template.searchScrollContinue(scrollId, 1000, SampleEntity.class);
}
template.searchScrollClear(scrollId);
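The control flow of the loop above (start, continue while hits come back, clear with the last id seen) can be tried out without a cluster. The following is a minimal sketch using a hypothetical in-memory ScrollModel class; it is a stand-in written for illustration, not part of Spring Data Elasticsearch, and the way it derives scroll ids is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for a scrolling search; hypothetical, not the Spring Data Elasticsearch API. */
final class ScrollModel {

    private final List<Integer> data;
    private final int chunkSize;
    private int position = 0;
    private boolean cleared = false;

    ScrollModel(List<Integer> data, int chunkSize) {
        this.data = data;
        this.chunkSize = chunkSize;
    }

    /** Returns the next chunk of hits; an empty list means the result set is exhausted. */
    List<Integer> nextChunk() {
        int end = Math.min(position + chunkSize, data.size());
        List<Integer> hits = new ArrayList<>(data.subList(position, end));
        position = end;
        return hits;
    }

    /** Id identifying the current scroll position (it changes after every chunk in this model). */
    String scrollId() {
        return "scroll-" + position;
    }

    /** Releases the scroll context, mirroring the clear call that ends the loop. */
    void clear(String scrollId) {
        cleared = true;
    }

    boolean isCleared() {
        return cleared;
    }

    /** Start, continue while hits come back, then clear with the last id seen. */
    static List<Integer> readAll(ScrollModel scroll) {
        List<Integer> all = new ArrayList<>();
        List<Integer> hits = scroll.nextChunk();
        String scrollId = scroll.scrollId();
        while (!hits.isEmpty()) {
            all.addAll(hits);
            scrollId = scroll.scrollId();
            hits = scroll.nextChunk();
        }
        scroll.clear(scrollId);
        return all;
    }

    public static void main(String[] args) {
        // Seven hypothetical hits read in chunks of three.
        ScrollModel scroll = new ScrollModel(List.of(1, 2, 3, 4, 5, 6, 7), 3);
        System.out.println(readAll(scroll));
    }
}
```

The final empty chunk is what ends the loop, which is why the scroll id is remembered before each continuation and cleared only once at the end.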

To use the scroll API with repository methods, the return type must be defined as Stream in the Elasticsearch repository. The implementation of the method will then use the scroll methods from the ElasticsearchTemplate.

interface SampleEntityRepository extends Repository<SampleEntity, String> {

    Stream<SampleEntity> findBy();

}

Sort options

In addition to the default sort options described in Paging and Sorting, Spring Data Elasticsearch provides the class org.springframework.data.elasticsearch.core.query.Order, which derives from org.springframework.data.domain.Sort.Order. It offers additional parameters that can be sent to Elasticsearch when specifying the sorting of the result (see https://www.elastic.co/guide/en/elasticsearch/reference/7.15/sort-search-results.html).

There is also the org.springframework.data.elasticsearch.core.query.GeoDistanceOrder class, which can be used to order the result of a search operation by geographical distance.

If the class to be retrieved has a GeoPoint property named location, the following Sort sorts the results by distance to the given point:

Sort.by(new GeoDistanceOrder("location", new GeoPoint(48.137154, 11.5761247)))
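To make the effect of such an ordering concrete, the following plain-Java sketch sorts a few points by their great-circle distance to the reference point used above. It assumes ascending order (nearest first) and uses the haversine formula with a mean earth radius of 6371 km; both are assumptions of this sketch, as are the sample locations, and Elasticsearch's own distance computation may differ in detail.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java illustration of ordering by distance; not how Elasticsearch computes it internally. */
final class GeoDistanceSketch {

    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in kilometres between two lat/lon points (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns a copy of the {lat, lon} pairs sorted by ascending distance to the reference point. */
    static List<double[]> sortByDistance(List<double[]> points, double refLat, double refLon) {
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble(p -> distanceKm(refLat, refLon, p[0], p[1])));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical sample locations: Berlin, Augsburg, Vienna.
        List<double[]> points = List.of(
                new double[] {52.52, 13.405},
                new double[] {48.3705, 10.8978},
                new double[] {48.2082, 16.3738});
        for (double[] p : sortByDistance(points, 48.137154, 11.5761247)) {
            System.out.printf("%.4f, %.4f -> %.1f km%n", p[0], p[1],
                    distanceKm(48.137154, 11.5761247, p[0], p[1]));
        }
    }
}
```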

Runtime Fields

From version 7.12 on, Elasticsearch has the feature of runtime fields (https://www.elastic.co/guide/en/elasticsearch/reference/7.12/runtime.html). Spring Data Elasticsearch supports this in two ways:

Runtime field definitions in the index mappings

The first way to define runtime fields is by adding the definitions to the index mappings (see https://www.elastic.co/guide/en/elasticsearch/reference/7.12/runtime-mapping-fields.html). To use this approach in Spring Data Elasticsearch, the user must provide a JSON file that contains the corresponding definition, for example:

Example 1. runtime-fields.json
{
  "day_of_week": {
    "type": "keyword",
    "script": {
      "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
    }
  }
}

The path to this JSON file, which must be present on the classpath, must then be set in the @Mapping annotation of the entity:

@Document(indexName = "runtime-fields")
@Mapping(runtimeFieldsPath = "/runtime-fields.json")
public class RuntimeFieldEntity {
	// properties, getter, setter,...
}

Runtime fields definitions set on a Query

The second way to define runtime fields is by adding the definitions to a search query (see https://www.elastic.co/guide/en/elasticsearch/reference/7.12/runtime-search-request.html). The following code example shows how to do this with Spring Data Elasticsearch. The entity used is a simple object that has a price property:

@Document(indexName = "some_index_name")
public class SomethingToBuy {

	private @Id @Nullable String id;
	@Nullable @Field(type = FieldType.Text) private String description;
	@Nullable @Field(type = FieldType.Double) private Double price;

	// getter and setter
}

The following query uses a runtime field that calculates a priceWithTax value by adding 19% to the price, and uses this value in the search query to find all entities where priceWithTax is greater than or equal to a given value:

RuntimeField runtimeField = new RuntimeField("priceWithTax", "double", "emit(doc['price'].value * 1.19)");
Query query = new CriteriaQuery(new Criteria("priceWithTax").greaterThanEqual(16.5));
query.addRuntimeField(runtimeField);

SearchHits<SomethingToBuy> searchHits = operations.search(query, SomethingToBuy.class);
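As a worked example of the arithmetic, the following plain-Java sketch applies the same expression as the runtime field script (price * 1.19) to a few hypothetical prices and keeps those that reach the threshold of 16.5 used in the query. It only mirrors the calculation; it does not call Elasticsearch, and the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Mirrors the runtime field calculation in plain Java; hypothetical helper, not library code. */
final class PriceWithTaxSketch {

    /** Same expression as the runtime field script: price * 1.19. */
    static double priceWithTax(double price) {
        return price * 1.19;
    }

    /** Keeps the prices whose taxed value is greater than or equal to the threshold. */
    static List<Double> matching(List<Double> prices, double threshold) {
        return prices.stream()
                .filter(p -> priceWithTax(p) >= threshold)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical prices: 10.00 gives about 11.90, 13.50 about 16.07, 15.00 about 17.85.
        System.out.println(matching(List.of(10.0, 13.5, 15.0), 16.5));
    }
}
```

With these sample values only the entity priced at 15.0 would match, because 13.5 * 1.19 stays just below 16.5.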

This works with every implementation of the Query interface.

Point In Time (PIT) API

ElasticsearchOperations supports the point in time API of Elasticsearch (see https://www.elastic.co/guide/en/elasticsearch/reference/8.3/point-in-time-api.html). The following code snippet shows how to use this feature with a fictional Person class:

ElasticsearchOperations operations; // autowired
Duration tenSeconds = Duration.ofSeconds(10);

String pit = operations.openPointInTime(IndexCoordinates.of("person"), tenSeconds); 1

// create query for the pit
Query query1 = new CriteriaQueryBuilder(Criteria.where("lastName").is("Smith"))
    .withPointInTime(new Query.PointInTime(pit, tenSeconds))                        2
    .build();
SearchHits<Person> searchHits1 = operations.search(query1, Person.class);
// do something with the data

// create 2nd query for the pit, use the id returned in the previous result
Query query2 = new CriteriaQueryBuilder(Criteria.where("lastName").is("Miller"))
    .withPointInTime(
        new Query.PointInTime(searchHits1.getPointInTimeId(), tenSeconds))          3
    .build();
SearchHits<Person> searchHits2 = operations.search(query2, Person.class);
// do something with the data

operations.closePointInTime(searchHits2.getPointInTimeId());                        4
1 create a point in time for an index (can be multiple names) and a keep-alive duration and retrieve its id
2 pass that id into the query to search together with the next keep-alive value
3 for the next query, use the id returned from the previous search
4 when done, close the point in time using the last returned id

Search Template support

The search template API is supported. To use it, a stored script must first be created. The ElasticsearchOperations interface extends ScriptOperations, which provides the necessary functions. The example used here assumes a Person entity with a property named firstName. A search template script can be saved like this:

import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.script.Script;

operations.putScript(                            1
  Script.builder()
    .withId("person-firstname")                  2
    .withLanguage("mustache")                    3
    .withSource("""                              4
      {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "firstName": "{{firstName}}"   5
                }
              }
            ]
          }
        },
        "from": "{{from}}",                      6
        "size": "{{size}}"                       7
      }
      """)
    .build()
);
1 Use the putScript() method to store a search template script
2 The name / id of the script
3 Scripts that are used in search templates must be in the mustache language.
4 The script source
5 The search parameter in the script
6 Paging request offset
7 Paging request size

To use a search template in a search query, Spring Data Elasticsearch provides the SearchTemplateQuery, an implementation of the org.springframework.data.elasticsearch.core.query.Query interface.

The following code adds a call using a search template query to a custom repository implementation (see repositories/custom-implementations.adoc) as an example of how this can be integrated into a repository call.

We first define the custom repository fragment interface:

interface PersonCustomRepository {
	SearchPage<Person> findByFirstNameWithSearchTemplate(String firstName, Pageable pageable);
}

The implementation of this repository fragment looks like this:

public class PersonCustomRepositoryImpl implements PersonCustomRepository {

  private final ElasticsearchOperations operations;

  public PersonCustomRepositoryImpl(ElasticsearchOperations operations) {
    this.operations = operations;
  }

  @Override
  public SearchPage<Person> findByFirstNameWithSearchTemplate(String firstName, Pageable pageable) {

    var query = SearchTemplateQuery.builder()                               1
      .withId("person-firstname")                                           2
      .withParams(
        Map.of(                                                             3
          "firstName", firstName,
          "from", pageable.getOffset(),
          "size", pageable.getPageSize()
          )
      )
      .build();

    SearchHits<Person> searchHits = operations.search(query, Person.class); 4

    return SearchHitSupport.searchPageFor(searchHits, pageable);
  }
}
1 Create a SearchTemplateQuery
2 Provide the id of the search template
3 The parameters are passed in a Map<String,Object>
4 Do the search in the same way as with the other query types.
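To see what the three parameters amount to, the following plain-Java sketch substitutes them into a shortened copy of the template and derives the from value from a zero-based page number and a page size. The substitution is a naive string replacement written for illustration only; Elasticsearch renders stored templates with its own Mustache engine, and the class, method names, and sample values here are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Naive placeholder substitution for illustration; not the rendering Elasticsearch performs. */
final class TemplateParamsSketch {

    /** Offset of the first hit on a zero-based page, as passed in the "from" parameter. */
    static long offset(int pageNumber, int pageSize) {
        return (long) pageNumber * pageSize;
    }

    /** Replaces each {{name}} placeholder with the string form of its parameter value. */
    static String render(String template, Map<String, Object> params) {
        String result = template;
        for (Map.Entry<String, Object> entry : params.entrySet()) {
            result = result.replace("{{" + entry.getKey() + "}}", String.valueOf(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened version of the stored template source shown earlier.
        String template = "{\"query\":{\"match\":{\"firstName\":\"{{firstName}}\"}},"
                + "\"from\":\"{{from}}\",\"size\":\"{{size}}\"}";
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("firstName", "John");   // hypothetical search value
        params.put("from", offset(2, 10)); // third page of ten hits
        params.put("size", 10);
        System.out.println(render(template, params));
    }
}
```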

Nested sort

Spring Data Elasticsearch supports sorting within nested objects (https://www.elastic.co/guide/en/elasticsearch/reference/8.9/sort-search-results.html#nested-sorting).

The following example, taken from the org.springframework.data.elasticsearch.core.query.sort.NestedSortIntegrationTests class, shows how to define the nested sort.

var filter = StringQuery.builder("""
	{ "term": {"movies.actors.sex": "m"} }
	""").build();
var order = new org.springframework.data.elasticsearch.core.query.Order(Sort.Direction.DESC,
	"movies.actors.yearOfBirth")
	.withNested(
		Nested.builder("movies")
			.withNested(
				Nested.builder("movies.actors")
					.withFilter(filter)
					.build())
			.build());

var query = Query.findAll().addSort(Sort.by(order));

About the filter query: It is not possible to use a CriteriaQuery here, as this query would be converted into an Elasticsearch nested query, which does not work in the filter context. So only StringQuery or NativeQuery can be used here. When using one of these, as with the term query above, the Elasticsearch field names must be used, so take care when these are redefined with the @Field(name="…​") definition.

For the definition of the order path and the nested paths, the Java entity property names should be used.
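For orientation, a nested sort of this shape corresponds roughly to the following sort clause in the Elasticsearch query DSL. This is a sketch based on the Elasticsearch nested sorting documentation, assuming that none of the property names are remapped with @Field(name="…​"); the exact request that Spring Data Elasticsearch generates may contain additional defaults.

```json
{
  "sort": [
    {
      "movies.actors.yearOfBirth": {
        "order": "desc",
        "nested": {
          "path": "movies",
          "nested": {
            "path": "movies.actors",
            "filter": {
              "term": { "movies.actors.sex": "m" }
            }
          }
        }
      }
    }
  ]
}
```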