DocumentDB Concise Tutorial

DocumentDB - Indexing Records

By default, DocumentDB automatically indexes every property of a document as soon as the document is added to the database. However, you can take control and fine-tune the indexing policy, which reduces storage and processing overhead when specific documents or properties never need to be indexed.

The default indexing policy, which tells DocumentDB to index every property automatically, suits many common scenarios. You can also implement a custom policy that gives fine-grained control over exactly what gets indexed and what doesn’t, along with other indexing behavior.

DocumentDB supports the following types of indexing:

  1. Hash

  2. Range

Hash

A hash index enables efficient equality queries, that is, searches for documents where a given property equals an exact value, as opposed to matching a range of values such as less than, greater than, or between.

You can still run range queries against a property that has only a hash index, but DocumentDB cannot use the hash index to find matching documents. Instead, it must scan every document sequentially to determine whether the range query selects it.

You cannot sort documents with an ORDER BY clause on a property that has only a hash index.
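The following is a minimal sketch of how a hash index might be declared and how a range filter against it could be run, assuming the Microsoft.Azure.DocumentDB .NET SDK used in the rest of this tutorial. The collection name hashdemo, the class name, and the choice of the lastName path are hypothetical and serve only as illustration. The sketch also assumes that the query must opt in to a scan through FeedOptions.EnableScanInQuery, because the hash index cannot serve the range filter.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static class HashIndexSketch {
   // Collection definition in which strings under /lastName get a hash index.
   // "hashdemo" is an illustrative name, not part of the tutorial's database.
   public static DocumentCollection BuildDefinition() {
      var definition = new DocumentCollection { Id = "hashdemo" };

      definition.IndexingPolicy.IncludedPaths.Add(new IncludedPath {
         Path = "/lastName/?",
         Indexes = new Collection<Index> { new HashIndex(DataType.String) }
      });

      // Every other path keeps the default indexes.
      definition.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/*" });

      return definition;
   }

   // Range filter on the hash-indexed property. The index cannot serve it,
   // so the query explicitly allows a scan of the documents.
   public static List<dynamic> LastNamesFromM(DocumentClient client) {
      var options = new FeedOptions { EnableScanInQuery = true };

      return client.CreateDocumentQuery(
         "dbs/mydb/colls/hashdemo",
         "SELECT * FROM c WHERE c.lastName >= 'M'",
         options).ToList();
   }
}
```

An equality filter such as c.lastName = 'Upston' on the same collection would not need the scan option, since that is the case a hash index is built for.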

Range

With a range index defined for a property, DocumentDB can efficiently query for documents against a range of values. It also lets you sort the query results on that property using ORDER BY.

DocumentDB allows you to define both a hash and a range index on any or all properties, which enables efficient equality and range queries, as well as ORDER BY.
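As a sketch of a range-index policy, again assuming the Microsoft.Azure.DocumentDB .NET SDK, the definition below asks for range indexes on strings and numbers for every path, which is what an ORDER BY on a string property such as lastName relies on. The collection name rangedemo and the class name are hypothetical.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static class RangeIndexSketch {
   // Collection definition with range indexes on all string and number values.
   public static DocumentCollection BuildDefinition() {
      var definition = new DocumentCollection { Id = "rangedemo" };

      definition.IndexingPolicy.IncludedPaths.Add(new IncludedPath {
         Path = "/*",
         Indexes = new Collection<Index> {
            // A precision of -1 requests the maximum precision.
            new RangeIndex(DataType.String) { Precision = -1 },
            new RangeIndex(DataType.Number) { Precision = -1 }
         }
      });

      return definition;
   }

   // Sorting on lastName depends on the range index over strings.
   public static List<dynamic> SortedByLastName(DocumentClient client) {
      return client.CreateDocumentQuery(
         "dbs/mydb/colls/rangedemo",
         "SELECT * FROM c ORDER BY c.lastName").ToList();
   }
}
```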

Indexing Policy

Every collection has an indexing policy that dictates which types of indexes are used for numbers and strings in every property of every document.

  1. You can also control whether or not documents get indexed automatically as they are added to the collection.

  2. Automatic indexing is enabled by default, but you can override that behavior when adding a document, telling DocumentDB not to index that particular document.

  3. You can disable automatic indexing so that by default, documents are not indexed when added to the collection. Similarly, you can override this at the document level and instruct DocumentDB to index a particular document when adding it to the collection. This is known as manual indexing.

Include / Exclude Indexing

An indexing policy can also define which path or paths are included in or excluded from the index. This is useful if you know that there are certain parts of a document that you never query against and certain parts that you do.

In these cases, you can reduce indexing overhead by telling DocumentDB to index just those particular portions of each document added to the collection.
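A policy with included and excluded paths might look like the sketch below, which assumes the Microsoft.Azure.DocumentDB .NET SDK. The collection name pathdemo and the metadata subtree are hypothetical. They stand in for whatever part of your documents is never used as a query condition.

```csharp
using Microsoft.Azure.Documents;

static class PathPolicySketch {
   // Index everything except the (hypothetical) metadata subtree.
   public static DocumentCollection BuildDefinition() {
      var definition = new DocumentCollection { Id = "pathdemo" };

      // Include the whole document tree by default...
      definition.IndexingPolicy.IncludedPaths.Add(
         new IncludedPath { Path = "/*" });

      // ...then carve out the portion that is never queried.
      definition.IndexingPolicy.ExcludedPaths.Add(
         new ExcludedPath { Path = "/metadata/*" });

      return definition;
   }
}
```

The resulting definition would be passed to CreateDocumentCollectionAsync in the same way as the collection definitions in the examples that follow.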

Automatic Indexing

Let’s take a look at a simple example of automatic indexing.

Step 1 − First we create a collection called autoindexing. Because we do not explicitly supply a policy, the collection uses the default indexing policy, which means that automatic indexing is enabled on it.

Here we use ID-based routing for the database self-link, so we don’t need to know the database’s resource ID or query for it before creating the collection. We can simply use the database ID, which is mydb.

Step 2 − Now let’s create two documents, both with the last name Upston.

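// Requires: using System; using System.Linq; using System.Threading.Tasks;
//           using Microsoft.Azure.Documents; using Microsoft.Azure.Documents.Client;
private async static Task AutomaticIndexing(DocumentClient client) {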
   Console.WriteLine();
   Console.WriteLine("**** Override Automatic Indexing ****");

   // Create collection with automatic indexing

   var collectionDefinition = new DocumentCollection {
      Id = "autoindexing"
   };

   var collection = await client.CreateDocumentCollectionAsync("dbs/mydb",
      collectionDefinition);

   // Add a document (indexed)
   dynamic indexedDocumentDefinition = new {
      id = "MARK",
      firstName = "Mark",
      lastName = "Upston",
      addressLine = "123 Main Street",
      city = "Brooklyn",
      state = "New York",
      zip = "11229",
   };

   Document indexedDocument = await client
      .CreateDocumentAsync("dbs/mydb/colls/autoindexing", indexedDocumentDefinition);

   // Add another document (request no indexing)
   dynamic unindexedDocumentDefinition = new {
      id = "JANE",
      firstName = "Jane",
      lastName = "Upston",
      addressLine = "123 Main Street",
      city = "Brooklyn",
      state = "New York",
      zip = "11229",
   };

   Document unindexedDocument = await client
      .CreateDocumentAsync("dbs/mydb/colls/autoindexing", unindexedDocumentDefinition,
      new RequestOptions { IndexingDirective = IndexingDirective.Exclude });

   // Unindexed document won't get returned when querying on non-ID (or self-link) property

   var upstonDocs = client.CreateDocumentQuery("dbs/mydb/colls/autoindexing",
      "SELECT * FROM c WHERE c.lastName = 'Upston'").ToList();

   Console.WriteLine("Documents WHERE lastName = 'Upston': {0}", upstonDocs.Count);

   // Unindexed document will get returned when using no WHERE clause

   var allDocs = client.CreateDocumentQuery("dbs/mydb/colls/autoindexing",
      "SELECT * FROM c").ToList();
   Console.WriteLine("All documents: {0}", allDocs.Count);

   // Unindexed document will get returned when querying by ID (or self-link) property

   Document janeDoc = client.CreateDocumentQuery("dbs/mydb/colls/autoindexing",
      "SELECT * FROM c WHERE c.id = 'JANE'").AsEnumerable().FirstOrDefault();
   Console.WriteLine("Unindexed document self-link: {0}", janeDoc.SelfLink);

   // Delete the collection

   await client.DeleteDocumentCollectionAsync("dbs/mydb/colls/autoindexing");
}

The first document, for Mark Upston, is added to the collection and immediately indexed automatically according to the default indexing policy.

But when the second document, for Jane Upston, is added, we pass request options with IndexingDirective.Exclude, which explicitly instructs DocumentDB not to index this document despite the collection’s indexing policy.

At the end, we run several different types of queries against the two documents.

Step 3 − Let’s call the AutomaticIndexing task from CreateDocumentClient.

private static async Task CreateDocumentClient() {
   // Create a new instance of the DocumentClient
   using (var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey)) {
      await AutomaticIndexing(client);
   }
}

When the above code is compiled and executed, you will receive the following output.

**** Override Automatic Indexing ****
Documents WHERE lastName = 'Upston': 1
All documents: 2
Unindexed document self-link: dbs/kV5oAA==/colls/kV5oAOEkfQA=/docs/kV5oAOEkfQACAAAAAAAAAA==/

As you can see, both documents have the last name Upston, but the query returns only the one for Mark because the one for Jane isn’t indexed. If we query again without a WHERE clause to retrieve all the documents in the collection, the result set contains both documents, because unindexed documents are always returned by queries that have no WHERE clause.

We can also retrieve unindexed documents by their ID or self-link. So when we query for Jane’s document by its ID, JANE, DocumentDB returns the document even though it isn’t indexed in the collection.
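Besides querying on c.id, an unindexed document can also be fetched directly by its link. The sketch below assumes the Microsoft.Azure.DocumentDB .NET SDK and an ID-based document link built from the names used in this example. The class and method names are hypothetical, and the call would have to run before the collection is deleted.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static class PointReadSketch {
   // Reads Jane's document directly by its ID-based link.
   // A direct read addresses the document itself rather than filtering on
   // a property, so the lack of index entries does not prevent it.
   public static async Task<Document> ReadJaneAsync(DocumentClient client) {
      ResourceResponse<Document> response =
         await client.ReadDocumentAsync("dbs/mydb/colls/autoindexing/docs/JANE");

      return response.Resource;
   }
}
```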

Manual Indexing

Let’s take a look at a simple example of manual indexing, which overrides automatic indexing.

Step 1 − First we create a collection called manualindexing and override the default policy by explicitly disabling automatic indexing. This means that, unless we request otherwise, new documents added to this collection are not indexed.

private async static Task ManualIndexing(DocumentClient client) {
   Console.WriteLine();
   Console.WriteLine("**** Manual Indexing ****");
   // Create collection with manual indexing

   var collectionDefinition = new DocumentCollection {
      Id = "manualindexing",
      IndexingPolicy = new IndexingPolicy {
         Automatic = false,
      },
   };

   var collection = await client.CreateDocumentCollectionAsync("dbs/mydb",
      collectionDefinition);

   // Add a document (unindexed)
   dynamic unindexedDocumentDefinition = new {
      id = "MARK",
      firstName = "Mark",
      lastName = "Doe",
      addressLine = "123 Main Street",
      city = "Brooklyn",
      state = "New York",
      zip = "11229",
   };

   Document unindexedDocument = await client
      .CreateDocumentAsync("dbs/mydb/colls/manualindexing", unindexedDocumentDefinition);

   // Add another document (request indexing)
   dynamic indexedDocumentDefinition = new {
      id = "JANE",
      firstName = "Jane",
      lastName = "Doe",
      addressLine = "123 Main Street",
      city = "Brooklyn",
      state = "New York",
      zip = "11229",
   };

   Document indexedDocument = await client.CreateDocumentAsync
      ("dbs/mydb/colls/manualindexing", indexedDocumentDefinition, new RequestOptions {
      IndexingDirective = IndexingDirective.Include });

   //Unindexed document won't get returned when querying on non-ID (or selflink) property

   var doeDocs = client.CreateDocumentQuery("dbs/mydb/colls/manualindexing",
      "SELECT * FROM c WHERE c.lastName = 'Doe'").ToList();
   Console.WriteLine("Documents WHERE lastName = 'Doe': {0}", doeDocs.Count);

   // Unindexed document will get returned when using no WHERE clause

   var allDocs = client.CreateDocumentQuery("dbs/mydb/colls/manualindexing",
      "SELECT * FROM c").ToList();
   Console.WriteLine("All documents: {0}", allDocs.Count);

   // Unindexed document will get returned when querying by ID (or self-link) property

   Document markDoc = client
      .CreateDocumentQuery("dbs/mydb/colls/manualindexing",
      "SELECT * FROM c WHERE c.id = 'MARK'")
      .AsEnumerable().FirstOrDefault();
   Console.WriteLine("Unindexed document self-link: {0}", markDoc.SelfLink);
   await client.DeleteDocumentCollectionAsync("dbs/mydb/colls/manualindexing");
}

Step 2 − Now we again create two documents, for Mark and Jane as before. This time we don’t supply any special request options for Mark’s document, so, because of the collection’s indexing policy, this document is not indexed.

Step 3 − Now when we add the second document, for Jane, we use RequestOptions with IndexingDirective.Include to tell DocumentDB that it should index this document, which overrides the collection’s indexing policy that says it shouldn’t.

At the end, we run several different types of queries against the two documents.

Step 4 − Let’s call the ManualIndexing task from CreateDocumentClient.

private static async Task CreateDocumentClient() {
   // Create a new instance of the DocumentClient
   using (var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey)) {
      await ManualIndexing(client);
   }
}

When the above code is compiled and executed, you will receive the following output.

**** Manual Indexing ****
Documents WHERE lastName = 'Doe': 1
All documents: 2
Unindexed document self-link: dbs/kV5oAA==/colls/kV5oANHJPgE=/docs/kV5oANHJPgEBAAAAAAAAAA==/

Again, the query returns only one of the two documents, but this time it returns the one for Jane Doe, which we explicitly requested to be indexed. As before, querying without a WHERE clause retrieves all the documents in the collection, including the unindexed document for Mark. We can also query for the unindexed document by its ID, and DocumentDB returns it even though it’s not indexed.