JDBC Batch Operations

如果你对同一准备语句批量执行多个调用,大多数 JDBC 驱动程序都会提供更好的性能。通过将更新分组到批处理中,你可以限制往返于数据库的次数。

Most JDBC drivers provide improved performance if you batch multiple calls to the same prepared statement. By grouping updates into batches, you limit the number of round trips to the database.

Basic Batch Operations with JdbcTemplate

你可以通过实现特殊接口 BatchPreparedStatementSetter 的两种方法并将其作为 batchUpdate 方法调用的第二个参数,来完成 JdbcTemplate 批处理。你可以使用 getBatchSize 方法来提供当前批处理的大小。你可以使用 setValues 方法来设置准备语句的参数值。这个方法被调用了 getBatchSize 调用中你指定过的次数。以下示例基于列表中的条目更新 t_actor 表,并使用整个列表作为批处理:

You accomplish JdbcTemplate batch processing by implementing two methods of a special interface, BatchPreparedStatementSetter, and passing that implementation in as the second parameter in your batchUpdate method call. You can use the getBatchSize method to provide the size of the current batch. You can use the setValues method to set the values for the parameters of the prepared statement. This method is called the number of times that you specified in the getBatchSize call. The following example updates the t_actor table based on entries in a list, and the entire list is used as the batch:

  • Java

  • Kotlin

public class JdbcActorDao implements ActorDao {

	private JdbcTemplate jdbcTemplate;

	public void setDataSource(DataSource dataSource) {
		this.jdbcTemplate = new JdbcTemplate(dataSource);
	}

	public int[] batchUpdate(final List<Actor> actors) {
		return this.jdbcTemplate.batchUpdate(
				"update t_actor set first_name = ?, last_name = ? where id = ?",
				new BatchPreparedStatementSetter() {
					public void setValues(PreparedStatement ps, int i) throws SQLException {
						Actor actor = actors.get(i);
						ps.setString(1, actor.getFirstName());
						ps.setString(2, actor.getLastName());
						ps.setLong(3, actor.getId().longValue());
					}
					public int getBatchSize() {
						return actors.size();
					}
				});
	}

	// ... additional methods
}
class JdbcActorDao(dataSource: DataSource) : ActorDao {

	private val jdbcTemplate = JdbcTemplate(dataSource)

	fun batchUpdate(actors: List<Actor>): IntArray {
		return jdbcTemplate.batchUpdate(
				"update t_actor set first_name = ?, last_name = ? where id = ?",
				object: BatchPreparedStatementSetter {
					override fun setValues(ps: PreparedStatement, i: Int) {
						ps.setString(1, actors[i].firstName)
						ps.setString(2, actors[i].lastName)
						ps.setLong(3, actors[i].id)
					}

					override fun getBatchSize() = actors.size
				})
	}

	// ... additional methods
}

如果你处理一个更新流或从一个文件读取,你可能有一个首选的批处理大小,但最后一个批处理可能没有那么多的条目。在这种情况下,你可以使用 InterruptibleBatchPreparedStatementSetter 接口,它允许你在输入源耗尽后中断批处理。isBatchExhausted 方法让你发出批处理结束的信号。

If you process a stream of updates or reading from a file, you might have a preferred batch size, but the last batch might not have that number of entries. In this case, you can use the InterruptibleBatchPreparedStatementSetter interface, which lets you interrupt a batch once the input source is exhausted. The isBatchExhausted method lets you signal the end of the batch.

Batch Operations with a List of Objects

JdbcTemplateNamedParameterJdbcTemplate 都提供了一种提供批处理更新的替代方法。你不需要实现一个特殊的批处理接口,而是提供所有参数值作为调用中的一个列表。框架循环遍历这些值并使用一个内部准备语句设置器。API 会根据你是否使用命名参数而有所不同。对于命名参数,你提供一个 SqlParameterSource 数组,批处理的每个成员一个条目。你可以使用 SqlParameterSourceUtils.createBatch 方便方法来创建此数组,它可以传递一个 Bean 式对象的数组(带有与参数相对应的 getter 方法)、String 键控的 Map 实例(包含相应参数作为值)、或两者结合。

Both the JdbcTemplate and the NamedParameterJdbcTemplate provides an alternate way of providing the batch update. Instead of implementing a special batch interface, you provide all parameter values in the call as a list. The framework loops over these values and uses an internal prepared statement setter. The API varies, depending on whether you use named parameters. For the named parameters, you provide an array of SqlParameterSource, one entry for each member of the batch. You can use the SqlParameterSourceUtils.createBatch convenience methods to create this array, passing in an array of bean-style objects (with getter methods corresponding to parameters), String-keyed Map instances (containing the corresponding parameters as values), or a mix of both.

以下示例演示使用命名参数的批处理更新:

The following example shows a batch update using named parameters:

  • Java

  • Kotlin

public class JdbcActorDao implements ActorDao {

	private NamedParameterTemplate namedParameterJdbcTemplate;

	public void setDataSource(DataSource dataSource) {
		this.namedParameterJdbcTemplate = new NamedParameterJdbcTemplate(dataSource);
	}

	public int[] batchUpdate(List<Actor> actors) {
		return this.namedParameterJdbcTemplate.batchUpdate(
				"update t_actor set first_name = :firstName, last_name = :lastName where id = :id",
				SqlParameterSourceUtils.createBatch(actors));
	}

	// ... additional methods
}
class JdbcActorDao(dataSource: DataSource) : ActorDao {

	private val namedParameterJdbcTemplate = NamedParameterJdbcTemplate(dataSource)

	fun batchUpdate(actors: List<Actor>): IntArray {
		return this.namedParameterJdbcTemplate.batchUpdate(
				"update t_actor set first_name = :firstName, last_name = :lastName where id = :id",
				SqlParameterSourceUtils.createBatch(actors));
	}

		// ... additional methods
}

对于使用经典 ? 占位符的 SQL 语句,你传递一个包含更新值的 Object 数组的列表。这个 Object 数组必须为 SQL 语句中的每个占位符有一个条目,并且它们必须按在 SQL 语句中定义的顺序排列。

For an SQL statement that uses the classic ? placeholders, you pass in a list containing an object array with the update values. This object array must have one entry for each placeholder in the SQL statement, and they must be in the same order as they are defined in the SQL statement.

以下示例与前一个示例相同,只不过它使用经典的 JDBC ? 占位符:

The following example is the same as the preceding example, except that it uses classic JDBC ? placeholders:

  • Java

  • Kotlin

public class JdbcActorDao implements ActorDao {

	private JdbcTemplate jdbcTemplate;

	public void setDataSource(DataSource dataSource) {
		this.jdbcTemplate = new JdbcTemplate(dataSource);
	}

	public int[] batchUpdate(final List<Actor> actors) {
		List<Object[]> batch = new ArrayList<>();
		for (Actor actor : actors) {
			Object[] values = new Object[] {
					actor.getFirstName(), actor.getLastName(), actor.getId()};
			batch.add(values);
		}
		return this.jdbcTemplate.batchUpdate(
				"update t_actor set first_name = ?, last_name = ? where id = ?",
				batch);
	}

	// ... additional methods
}
class JdbcActorDao(dataSource: DataSource) : ActorDao {

	private val jdbcTemplate = JdbcTemplate(dataSource)

	fun batchUpdate(actors: List<Actor>): IntArray {
		val batch = mutableListOf<Array<Any>>()
		for (actor in actors) {
			batch.add(arrayOf(actor.firstName, actor.lastName, actor.id))
		}
		return jdbcTemplate.batchUpdate(
				"update t_actor set first_name = ?, last_name = ? where id = ?", batch)
	}

	// ... additional methods
}

我们之前描述的所有批处理更新方法都返回一个包含每个批处理条目的受影响行数的 int 数组。此计数由 JDBC 驱动程序报告。如果计数不可用,JDBC 驱动程序返回 -2

All of the batch update methods that we described earlier return an int array containing the number of affected rows for each batch entry. This count is reported by the JDBC driver. If the count is not available, the JDBC driver returns a value of -2.

在这样的情况下,在底层`PreparedStatement`上自动设置值时,每个值的相应JDBC类型需要从给定的Java类型中推演出。虽然这通常能很好地发挥作用,但可能会出现问题(例如使用Map所包含的`null`值),在这种情况下,Spring默认调用`ParameterMetaData.getParameterType`,而这可能会给你的JDBC驱动增加工作量。如果你遇到性能问题(据Oracle 12c、JBoss和PostgreSQL报告),则应当使用最近的驱动程序版本并考虑将`spring.jdbc.getParameterType.ignore`属性设置为`true`(作为JVM系统属性或通过xref:appendix.adoc#appendix-spring-properties[`SpringProperties`机制)。

In such a scenario, with automatic setting of values on an underlying PreparedStatement, the corresponding JDBC type for each value needs to be derived from the given Java type. While this usually works well, there is a potential for issues (for example, with Map-contained null values). Spring, by default, calls ParameterMetaData.getParameterType in such a case, which can be expensive with your JDBC driver. You should use a recent driver version and consider setting the spring.jdbc.getParameterType.ignore property to true (as a JVM system property or via the SpringProperties mechanism) if you encounter a performance issue (as reported on Oracle 12c, JBoss, and PostgreSQL).

或者,您可以考虑显式地指定 JDBC 类型,通过 BatchPreparedStatementSetter(如前面所示)、通过给基于 List<Object[]> 的调用提供的明确类型数组、通过 registerSqlType 在自定义 MapSqlParameterSource 实例上调用或通过 BeanPropertySqlParameterSource 来衍生自 Java 声明的属性类型,即使为空值也是如此,从而获得 SQL 类型。

Alternatively, you might consider specifying the corresponding JDBC types explicitly, either through a BatchPreparedStatementSetter (as shown earlier), through an explicit type array given to a List<Object[]> based call, through registerSqlType calls on a custom MapSqlParameterSource instance, or through a BeanPropertySqlParameterSource that derives the SQL type from the Java-declared property type even for a null value.

Batch Operations with Multiple Batches

前面批处理更新的示例处理的批处理很大,你希望将它们分成几个较小的批处理。你可以通过使用前面提到的方法多次调用 batchUpdate 方法来实现此操作,但现在有一个更方便的方法。除了 SQL 语句之外,此方法还接收一个包含参数的对象 Collection、每个批处理要执行的更新次数,以及一个 ParameterizedPreparedStatementSetter 来设置准备 statement 的参数值。框架循环遍历提供的值,并将更新调用分解为指定大小的批处理。

The preceding example of a batch update deals with batches that are so large that you want to break them up into several smaller batches. You can do this with the methods mentioned earlier by making multiple calls to the batchUpdate method, but there is now a more convenient method. This method takes, in addition to the SQL statement, a Collection of objects that contain the parameters, the number of updates to make for each batch, and a ParameterizedPreparedStatementSetter to set the values for the parameters of the prepared statement. The framework loops over the provided values and breaks the update calls into batches of the size specified.

以下示例展示了使用 100 的批量大小的批量更新:

The following example shows a batch update that uses a batch size of 100:

  • Java

  • Kotlin

public class JdbcActorDao implements ActorDao {

	private JdbcTemplate jdbcTemplate;

	public void setDataSource(DataSource dataSource) {
		this.jdbcTemplate = new JdbcTemplate(dataSource);
	}

	public int[][] batchUpdate(final Collection<Actor> actors) {
		int[][] updateCounts = jdbcTemplate.batchUpdate(
				"update t_actor set first_name = ?, last_name = ? where id = ?",
				actors,
				100,
				(PreparedStatement ps, Actor actor) -> {
					ps.setString(1, actor.getFirstName());
					ps.setString(2, actor.getLastName());
					ps.setLong(3, actor.getId().longValue());
				});
		return updateCounts;
	}

	// ... additional methods
}
class JdbcActorDao(dataSource: DataSource) : ActorDao {

	private val jdbcTemplate = JdbcTemplate(dataSource)

	fun batchUpdate(actors: List<Actor>): Array<IntArray> {
		return jdbcTemplate.batchUpdate(
					"update t_actor set first_name = ?, last_name = ? where id = ?",
					actors, 100) { ps, argument ->
			ps.setString(1, argument.firstName)
			ps.setString(2, argument.lastName)
			ps.setLong(3, argument.id)
		}
	}

	// ... additional methods
}

该调用对此批量更新方法返回的是一个 int 数组,它包括一个针对每次更新返回受影响行数的数组的数组条目。顶层数组长度指示运行的批量的数量,而第二层数组长度指示该批量的更新数量。如果提供的更新对象的总数量根据批量大小,每个批量的更新数量应是所有批量的批量大小(只有最后一个可能少于)。每次更新语句的更新计数是由 JDBC 驱动程序报告的那个。如果计数不可用,JDBC 驱动程序会返回 -2 值。

The batch update method for this call returns an array of int arrays that contains an array entry for each batch with an array of the number of affected rows for each update. The top-level array’s length indicates the number of batches run, and the second level array’s length indicates the number of updates in that batch. The number of updates in each batch should be the batch size provided for all batches (except that the last one that might be less), depending on the total number of update objects provided. The update count for each update statement is the one reported by the JDBC driver. If the count is not available, the JDBC driver returns a value of -2.