DynamoDB Concise Tutorial

DynamoDB - Scan

A Scan operation reads every item in a table or secondary index. By default, it returns all data attributes of every item in the table or index. Use the ProjectionExpression parameter to return only some of the attributes.

Every scan returns a result set, even when it finds no matches; in that case the set is empty. A single Scan request retrieves at most 1 MB of data, and you can optionally filter that data.

Note − The scan parameters and filtering described here also apply to queries.

Types of Scan Operations

Filtering − Scan operations support finer-grained filtering through filter expressions, which DynamoDB applies after a scan or query has read the items but before it returns the results. Filter expressions use comparison operators, and their syntax resembles that of condition expressions. In a Query, however, a filter expression cannot reference the partition key or sort key; those attributes belong in the key condition expression instead.

Note − The 1 MB limit applies before any filtering takes place.
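To make the order of operations concrete, here is a toy in-memory model in Java. It is not the AWS SDK, and the names `FilteredScanPage`, `Item`, `Page`, and `scanOnePage` are hypothetical. It assumes every item has a known size and treats the 1 MB limit as a hard stop before the item that would cross it, which simplifies how DynamoDB actually sizes a page. The point it shows is that items are read (and counted toward the limit) first, and the filter only decides which of the already-read items are returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy in-memory model of a single Scan page (NOT the AWS SDK; all names are hypothetical).
// Items are read until the page would exceed 1 MB, and only then is the filter applied.
class FilteredScanPage {
    static final int PAGE_LIMIT_BYTES = 1024 * 1024; // 1 MB read limit per request

    record Item(String id, int price, int sizeBytes) {}

    // scannedCount = items read (like ScannedCount); items = what survives the filter (like Count).
    record Page(int scannedCount, List<Item> items) {
        int count() { return items.size(); }
    }

    static Page scanOnePage(List<Item> table, Predicate<Item> filter) {
        int bytesRead = 0;
        int scanned = 0;
        List<Item> returned = new ArrayList<>();
        for (Item item : table) {
            // Simplification: stop before the item that would cross the 1 MB limit.
            if (bytesRead + item.sizeBytes() > PAGE_LIMIT_BYTES) {
                break;
            }
            bytesRead += item.sizeBytes();
            scanned++;
            // The filter sees the item only after it has already been read.
            if (filter.test(item)) {
                returned.add(item);
            }
        }
        return new Page(scanned, returned);
    }
}
```

With 3,000 items of 1,000 bytes each, one page reads 1,048 items regardless of the filter; a filter such as `price < 100` only shrinks what comes back, which is also why ScannedCount and Count differ when a filter is used.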

Throughput Specifications − Scans consume read throughput, but consumption depends on the size of the items read rather than on the data returned. Consumption is the same whether you request every attribute or only a few, and using a filter expression does not change it either.

Pagination − DynamoDB paginates results, dividing them into pages. The 1 MB limit applies to each request, so when there is more data than that, further scans are needed to gather the rest. The LastEvaluatedKey value in a response lets you perform each subsequent scan: pass it as the ExclusiveStartKey of the next request. When a response contains no LastEvaluatedKey, the operation has read every page of data. A LastEvaluatedKey that is present, however, does not guarantee that more data remains (the next page may be empty); only its absence reliably signals the end.
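The loop below sketches that protocol as an SDK-free Java model. With the AWS SDK for Java v1 low-level client, the same loop would pass `result.getLastEvaluatedKey()` to `ScanRequest.withExclusiveStartKey(...)`; here a generic `fetchPage` function stands in for the service, and `PaginatedScan`, `Page`, and `fakeFetch` are hypothetical names. The fake page source deliberately returns a key whenever a page is full, so the example also shows the empty final page that a non-null key can lead to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model of the LastEvaluatedKey / ExclusiveStartKey protocol (NOT the AWS SDK).
class PaginatedScan {
    // One response: the items plus LastEvaluatedKey (null when the response omits it).
    record Page<K, T>(List<T> items, K lastEvaluatedKey) {}

    // Feed each LastEvaluatedKey back in as the next ExclusiveStartKey until a
    // response arrives without one. A non-null key may still lead to an empty page.
    static <K, T> List<T> scanAll(Function<K, Page<K, T>> fetchPage) {
        List<T> all = new ArrayList<>();
        K exclusiveStartKey = null; // first request starts at the beginning
        do {
            Page<K, T> page = fetchPage.apply(exclusiveStartKey);
            all.addAll(page.items());
            exclusiveStartKey = page.lastEvaluatedKey();
        } while (exclusiveStartKey != null);
        return all;
    }

    // Hypothetical fake page source: returns up to pageSize items after startKey and
    // reports a LastEvaluatedKey whenever the page is full, even if no data remains.
    static Page<Integer, String> fakeFetch(List<String> table, int pageSize, Integer startKey) {
        int from = (startKey == null) ? 0 : startKey + 1;
        int to = Math.min(from + pageSize, table.size());
        Integer lastKey = (to - from == pageSize) ? Integer.valueOf(to - 1) : null;
        return new Page<>(new ArrayList<>(table.subList(from, to)), lastKey);
    }
}
```

For a 10-item table with 5 items per page, `scanAll` makes three requests: two full pages, then an empty page with no LastEvaluatedKey.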

The Limit Parameter − The Limit parameter caps the number of items DynamoDB evaluates in a single request before returning data. If you set Limit to x and use no filter, DynamoDB returns the first x items. With a filter expression, DynamoDB evaluates x items and then applies the filter, so it may return fewer than x matching items.

When Limit stops a scan before it has read all the data, the response also includes a LastEvaluatedKey; use it to continue and complete the scan.
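The following Java sketch models how Limit interacts with a filter. It is a simplification, not the AWS SDK: the table is an `int[]` of prices, the "key" is just an array index, and `LimitModel`, `Result`, and `scan` are hypothetical names. Limit caps how many items are evaluated, the filter runs afterwards, and the returned key lets the next call resume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model of Limit (NOT the AWS SDK): Limit caps how many items are evaluated,
// and the filter runs afterwards. The "key" here is simply an array index.
class LimitModel {
    record Result(int scannedCount, List<Integer> items, Integer lastEvaluatedKey) {}

    static Result scan(int[] table, int startIndex, int limit, IntPredicate filter) {
        int end = Math.min(startIndex + limit, table.length); // evaluate at most 'limit' items
        List<Integer> matched = new ArrayList<>();
        for (int i = startIndex; i < end; i++) {
            if (filter.test(table[i])) {
                matched.add(table[i]);
            }
        }
        // Simplification: report a LastEvaluatedKey only if Limit stopped before the end.
        Integer lastKey = (end < table.length) ? Integer.valueOf(end - 1) : null;
        return new Result(end - startIndex, matched, lastKey);
    }
}
```

Scanning the prices `{150, 20, 300, 80, 500, 10, 90}` with Limit 4 and the filter `p < 100` evaluates four items but returns only `20` and `80`; resuming after the returned key yields `10` and `90` and no further key.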

Result Count − Responses to queries and scans also include ScannedCount and Count: ScannedCount is the number of items evaluated, and Count is the number of items returned after filtering. Without a filter, the two values are identical. When a scan exceeds 1 MB, both counts cover only the portion processed in that request.

Consistency − Query and scan results are eventually consistent reads by default; set the ConsistentRead parameter to true to request strongly consistent reads instead (global secondary indexes support only eventually consistent reads).

Note − The consistency setting affects consumption: strongly consistent reads use twice as many capacity units as eventually consistent reads.
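As a back-of-the-envelope check on the throughput and consistency rules, the Java sketch below estimates read capacity units for one scan page. It assumes DynamoDB's documented read-capacity model: one read capacity unit covers 4 KB read with strong consistency, eventually consistent reads cost half that, and the sizes of all items read are summed and rounded up to the next 4 KB. `ScanCost` and `readCapacityUnits` are hypothetical names, and the result is an estimate, not a billing calculation.

```java
// Back-of-the-envelope read capacity estimate for one Scan page (hypothetical helper).
// Assumption: the sizes of all items READ are summed and rounded up to the next 4 KB;
// strongly consistent reads cost 1 RCU per 4 KB, eventually consistent reads half that.
class ScanCost {
    static final int RCU_CHUNK_BYTES = 4 * 1024;

    static double readCapacityUnits(int[] itemSizesRead, boolean consistentRead) {
        long totalBytes = 0;
        for (int size : itemSizesRead) {
            totalBytes += size; // items read, not items returned: filters and projections don't matter
        }
        long chunks = (totalBytes + RCU_CHUNK_BYTES - 1) / RCU_CHUNK_BYTES; // round up to 4 KB
        return consistentRead ? chunks : chunks / 2.0;
    }
}
```

Reading 100 items of 1,000 bytes each (100,000 bytes, or 25 chunks of 4 KB) comes to 25 units with `consistentRead` set to true and 12.5 units without it, however many of those items a filter discards.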

Performance − Queries perform better than scans because a scan crawls the entire table or secondary index, which produces slow responses and heavy throughput consumption. Scans work best on small tables and for searches that need little filtering. You can still design lean scans by following a few best practices, such as avoiding sudden spikes in read activity and using parallel scans.

A query finds the items whose keys satisfy a given condition, and its performance depends on the amount of data it retrieves rather than on the number of keys. The operation's parameters and the number of matching items also affect performance.
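The difference in work can be illustrated with a toy Java model. This is not DynamoDB internals: a `NavigableMap` stands in for items ordered by key, as items within one partition are ordered by sort key, and `QueryVsScan`, `Outcome`, `query`, and `scan` are hypothetical names. The "query" jumps straight to the key range, while the "scan" has to examine every item and filter; both find the same matches.

```java
import java.util.Map;
import java.util.NavigableMap;

// Toy model (NOT DynamoDB internals): items ordered by key, as items within one
// partition are ordered by sort key. Both methods find the same matches; the work differs.
class QueryVsScan {
    record Outcome(int examined, int matched) {}

    // "Query": go directly to the key range and read only what lies inside it.
    static Outcome query(NavigableMap<Integer, String> table, int fromKey, int toKey) {
        NavigableMap<Integer, String> range = table.subMap(fromKey, true, toKey, true);
        return new Outcome(range.size(), range.size());
    }

    // "Scan": read every item, then keep the ones inside the range.
    static Outcome scan(NavigableMap<Integer, String> table, int fromKey, int toKey) {
        int examined = 0;
        int matched = 0;
        for (Map.Entry<Integer, String> e : table.entrySet()) {
            examined++;
            if (e.getKey() >= fromKey && e.getKey() <= toKey) {
                matched++;
            }
        }
        return new Outcome(examined, matched);
    }
}
```

For 10,000 items and the key range 100 to 199, both approaches match 100 items, but the query examines 100 while the scan examines all 10,000.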

Parallel Scan

By default, a scan processes data sequentially, returning it in 1 MB portions, after which the application must request the next portion. For large tables and indexes, this results in long scans.

This also means a scan may not make full use of the available throughput. DynamoDB distributes table data across multiple partitions, but a sequential scan reads from only one partition at a time, so its throughput is limited to what a single partition can provide.

One solution is to divide the table or index logically into segments and have multiple “workers” scan those segments in parallel (concurrently). Each worker supplies two parameters: Segment, which identifies the segment that worker scans, and TotalSegments, which specifies the total number of segments.
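Here is a toy Java model of that division of labour. It is not the AWS SDK; with the v1 SDK, each worker's request would carry `ScanRequest.withSegment(...)` and `withTotalSegments(...)`. How DynamoDB assigns items to segments is internal to the service, so this sketch assumes a simple hash of the key, and `ParallelScanModel`, `inSegment`, `scanSegment`, and `parallelScan` are hypothetical names. Each worker thread handles one segment, and merging the workers' results covers every item exactly once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of a parallel scan (NOT the AWS SDK). Worker number 'segment' runs with
// Segment = segment and TotalSegments = totalSegments. How DynamoDB assigns items to
// segments is internal; this model assumes a simple hash of the key.
class ParallelScanModel {
    static boolean inSegment(String key, int segment, int totalSegments) {
        return Math.floorMod(key.hashCode(), totalSegments) == segment;
    }

    // One worker's job. Here it filters a shared list; the real service
    // would return only this segment's items.
    static List<String> scanSegment(List<String> table, int segment, int totalSegments) {
        List<String> out = new ArrayList<>();
        for (String key : table) {
            if (inSegment(key, segment, totalSegments)) {
                out.add(key);
            }
        }
        return out;
    }

    static List<String> parallelScan(List<String> table, int totalSegments) {
        ExecutorService pool = Executors.newFixedThreadPool(totalSegments); // one thread per worker
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int segment = 0; segment < totalSegments; segment++) {
                final int seg = segment;
                futures.add(pool.submit(() -> scanSegment(table, seg, totalSegments)));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get()); // wait for each worker and merge its segment
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every key maps to exactly one segment, the merged result contains each item once. In a real parallel scan, more workers means more concurrent reads against the table's throughput.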

Worker Number

Experiment with the number of workers (the TotalSegments parameter) to find the value that gives the best application performance.

Note − A parallel scan with many workers can consume all of the available throughput. Manage this with the Limit parameter, which you can use to stop any single worker from consuming all of it.

The following is an example of a scan with a filter expression.

Note − The following program assumes a previously created data source. Before running it, obtain the supporting libraries and create the necessary data sources (tables with the required characteristics, or other referenced sources).

This example also uses the Eclipse IDE, an AWS credentials file, and the AWS Toolkit within an Eclipse AWS Java Project.

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.ScanOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;

public class ScanOpSample {
   static DynamoDB dynamoDB = new DynamoDB(
      new AmazonDynamoDBClient(new ProfileCredentialsProvider()));
   static String tableName = "ProductList";

   public static void main(String[] args) throws Exception {
      findProductsUnderOneHun();                       //finds products under 100 dollars
   }
   private static void findProductsUnderOneHun() {
      Table table = dynamoDB.getTable(tableName);
      Map<String, Object> expressionAttributeValues = new HashMap<String, Object>();
      expressionAttributeValues.put(":pr", 100);

      ItemCollection<ScanOutcome> items = table.scan (
         "Price < :pr",                                  //FilterExpression
         "ID, Nomenclature, ProductCategory, Price",     //ProjectionExpression
         null,                                           //No ExpressionAttributeNames
         expressionAttributeValues);

      System.out.println("Scanned " + tableName + " to find items under $100.");
      Iterator<Item> iterator = items.iterator();

      while (iterator.hasNext()) {
         System.out.println(iterator.next().toJSONPretty());
      }
   }
}