Frequently Asked Questions
Is it possible to execute jobs in multiple threads or multiple processes?
There are three ways to approach this - but we recommend exercising caution in the analysis of such requirements (is it really necessary?).
- Add a `TaskExecutor` to the step. The `StepBuilder`s provided for configuring Steps have a "taskExecutor" property you can set. This works as long as the step is intrinsically restartable (effectively idempotent). The parallel job sample shows how it might work in practice - this uses a "process indicator" pattern to mark input records as complete, inside the business transaction.
- Use the `PartitionStep` to split your step execution explicitly amongst several Step instances. Spring Batch has a local multi-threaded implementation of the main strategy for this (`PartitionHandler`), which makes it a great choice for IO intensive jobs. Remember to use `scope="step"` for the stateful components in a step executing in this fashion, so that separate instances are created per step execution, and there is no cross talk between threads.
- Use the Remote Chunking approach as implemented in the `spring-batch-integration` module. This requires some durable middleware (e.g. JMS) for reliable communication between the driving step and the remote workers. The basic idea is to use a special `ItemWriter` on the driving process, and a listener pattern on the worker processes (via a `ChunkProcessor`).
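The second option relies on giving each partition its own instance of any stateful component. The following is a minimal sketch of that idea in plain Java, not Spring Batch configuration: `CountingProcessor` and `LocalPartitionRunner` are hypothetical names, and a JDK thread pool stands in for the local multi-threaded `PartitionHandler`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stateful component. One instance per partition plays the
// role that scope="step" plays for stateful beans in Spring Batch.
class CountingProcessor {
    private int processed; // not thread safe, so it must never be shared

    int process(List<Integer> items) {
        for (Integer ignored : items) {
            processed++;
        }
        return processed;
    }
}

class LocalPartitionRunner {
    // Submits every partition to a fixed thread pool and sums the item counts.
    static int run(List<List<Integer>> partitions, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                // A fresh processor per partition avoids cross talk between threads.
                CountingProcessor processor = new CountingProcessor();
                results.add(pool.submit(() -> processor.process(partition)));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // waits for that partition to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<Integer>> partitions =
                List.of(List.of(1, 2, 3), List.of(4, 5), List.of(6));
        System.out.println(run(partitions, 2)); // prints 6
    }
}
```

Sharing a single `CountingProcessor` across the submitted tasks would make the count unreliable, which is the kind of cross talk the `scope="step"` advice is meant to prevent.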
How can I make an item reader thread safe?
You can synchronize the `read()` method (e.g. by wrapping it in a delegator that does the synchronization).
Remember that you will lose restartability, so best practice is to mark the step as not restartable and to be safe (and efficient) you can also set `saveState=false` on the reader.
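A synchronizing delegator might look roughly like the sketch below. To keep it self-contained, `SimpleItemReader` is a hypothetical stand-in for the Spring Batch `ItemReader` contract (return the next item, or `null` at the end of the input), and `ListReader` is an illustrative reader that is not thread safe.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the ItemReader contract:
// read() returns the next item, or null when the input is exhausted.
interface SimpleItemReader<T> {
    T read() throws Exception;
}

// Not thread safe on its own: the underlying Iterator is unsynchronized.
class ListReader<T> implements SimpleItemReader<T> {
    private final Iterator<T> items;

    ListReader(List<T> source) {
        this.items = source.iterator();
    }

    @Override
    public T read() {
        return items.hasNext() ? items.next() : null;
    }
}

// Delegator that serializes all access to the wrapped reader.
class SynchronizedReader<T> implements SimpleItemReader<T> {
    private final SimpleItemReader<T> delegate;

    SynchronizedReader(SimpleItemReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T read() throws Exception {
        return delegate.read();
    }
}

class SynchronizedReaderDemo {
    public static void main(String[] args) throws Exception {
        SimpleItemReader<String> reader =
                new SynchronizedReader<>(new ListReader<>(List.of("a", "b", "c")));
        String item;
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
    }
}
```

The wrapper guarantees that each item is handed to exactly one thread, but it says nothing about the order in which threads finish their items, which is why restart state saved by the reader can no longer be trusted.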
What is the Spring Batch philosophy on the use of flexible strategies and default implementations? Can you add a public getter for this or that property?
There are many extension points in Spring Batch for the framework developer (as opposed to the implementor of business logic).
We expect clients to create their own more specific strategies that can be plugged in to control things like commit intervals (`CompletionPolicy`), rules about how to deal with exceptions (`ExceptionHandler`), and many others.
In general we try to dissuade users from extending framework classes. The Java language doesn’t give us as much flexibility to mark classes and interfaces as internal.
Generally you can expect anything at the top level of the source tree in packages `org.springframework.batch.*` to be public, but not necessarily sub-classable.
Extending our concrete implementations of most strategies is discouraged in favour of a composition or forking approach.
If your code can use only the interfaces from Spring Batch, that gives you the greatest possible portability.
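As an illustration of the composition approach, the sketch below wraps a strategy instead of subclassing it. `FailureHandler` is a hypothetical, simplified stand-in for a strategy interface such as `ExceptionHandler` (it is not the real signature), and `RethrowingHandler` stands in for a concrete framework implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a framework strategy interface.
interface FailureHandler {
    void handle(Throwable failure) throws Throwable;
}

// Stand-in for a concrete framework implementation we do not want to subclass.
class RethrowingHandler implements FailureHandler {
    @Override
    public void handle(Throwable failure) throws Throwable {
        throw failure;
    }
}

// Composition: adds behaviour around any FailureHandler without extending it.
class RecordingHandler implements FailureHandler {
    private final FailureHandler delegate;
    private final List<String> messages = new ArrayList<>();

    RecordingHandler(FailureHandler delegate) {
        this.delegate = delegate;
    }

    @Override
    public void handle(Throwable failure) throws Throwable {
        messages.add(failure.getMessage()); // our added behaviour
        delegate.handle(failure);           // delegate's behaviour, unchanged
    }

    List<String> messages() {
        return messages;
    }
}

class CompositionDemo {
    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler(new RethrowingHandler());
        try {
            handler.handle(new IllegalStateException("bad record"));
        } catch (Throwable rethrown) {
            System.out.println("rethrown: " + rethrown.getMessage());
        }
        System.out.println("recorded: " + handler.messages());
    }
}
```

Because `RecordingHandler` depends only on the interface, it keeps working if the wrapped implementation is swapped or changes internally.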
How does Spring Batch differ from Quartz? Is there a place for them both in a solution?
Spring Batch and Quartz have different goals. Spring Batch provides functionality for processing large volumes of data and Quartz provides functionality for scheduling tasks.
So Quartz can complement Spring Batch; the two are not mutually exclusive technologies. A common combination would be to use Quartz as a trigger for a Spring Batch job using a Cron expression and the Spring Core convenience `SchedulerFactoryBean`.
How do I schedule a job with Spring Batch?
Use a scheduling tool. There are plenty of them out there. Examples: Quartz, Control-M, Autosys.
Quartz doesn’t have all the features of Control-M or Autosys - it is supposed to be lightweight.
If you want something even more lightweight you can just use the OS (`cron`, `at`, etc.).
Simple sequential dependencies can be implemented using the job-steps model of Spring Batch, and the non-sequential features in Spring Batch. We think this is quite common. In fact, it makes it easier to correct a common misuse of schedulers: having hundreds of jobs configured, many of which are not independent but each depend on just one other job.
How does Spring Batch allow a project to optimize for performance and scalability (through parallel processing or other means)?
We see this as one of the roles of the `Job` or `Step`. A specific implementation of the Step deals with the concern of breaking apart the business logic and sharing it efficiently between parallel processes or processors (see `PartitionStep`). There are a number of technologies that could play a role here.
The essence is just a set of concurrent remote calls to distributed agents that can handle some business processing.
Since the business processing is already typically modularised - e.g. input an item, process it - Spring Batch can strategise the distribution in a number of ways.
One implementation that we have had some experience with is a set of remote web services handling the business processing.
We send a specific range of primary keys for the inputs to each of a number of remote calls.
The same basic strategy would work with any of the Spring Remoting protocols (plain RMI, HttpInvoker, JMS, Hessian etc.) with little more than a couple of lines changed in the execution layer configuration.
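Handing a range of primary keys to each remote call can be sketched as follows. This is only an illustration of the splitting arithmetic, assuming numeric, roughly contiguous keys. `KeyRange` and `KeyRangePartitioner` are hypothetical names, not Spring Batch classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type: an inclusive range of numeric primary keys.
final class KeyRange {
    final long min;
    final long max;

    KeyRange(long min, long max) {
        this.min = min;
        this.max = max;
    }

    @Override
    public String toString() {
        return "[" + min + ".." + max + "]";
    }
}

class KeyRangePartitioner {
    // Splits [minKey, maxKey] into at most gridSize contiguous, non-overlapping ranges.
    static List<KeyRange> split(long minKey, long maxKey, int gridSize) {
        if (gridSize < 1 || maxKey < minKey) {
            throw new IllegalArgumentException("need gridSize >= 1 and minKey <= maxKey");
        }
        long total = maxKey - minKey + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        List<KeyRange> ranges = new ArrayList<>();
        for (long start = minKey; start <= maxKey; start += size) {
            ranges.add(new KeyRange(start, Math.min(start + size - 1, maxKey)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each range would be sent as the input to one remote call.
        System.out.println(split(1, 10, 3)); // prints [[1..4], [5..8], [9..10]]
    }
}
```

Equal-width ranges only balance the load when keys are evenly distributed. With large gaps in the key space, some workers would receive far fewer rows than others.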
How can messaging be used to scale batch architectures?
There is a good deal of practical evidence from existing projects that a pipeline approach to batch processing is highly beneficial, leading to resilience and high throughput. We are often faced with mission-critical applications where audit trails are essential, and guaranteed processing is demanded, but where there are extremely tight limits on performance under load, or where high throughput gives a competitive advantage.
Matt Welsh’s work shows that a Staged Event Driven Architecture (SEDA) has enormous benefits over more rigid processing architectures,
and message-oriented middleware (JMS, AQ, MQ, Tibco etc.) gives us a lot of resilience out of the box. There are particular benefits in
a system where there is feedback between downstream and upstream stages, so the number of consumers can be adjusted to account for the amount of demand.
So how does this fit into Spring Batch? The spring-batch-integration project has this pattern implemented in Spring Integration,
and can be used to scale up the remote processing of any step with many items to process.
See in particular the "chunk" package, and the `ItemWriter` and `ChunkHandler` implementations in there.
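The hand-off between a driving step and its workers can be pictured with the sketch below. It is not the `spring-batch-integration` API: `ChunkPipeline` is a hypothetical class, and an in-memory `BlockingQueue` stands in for the messaging middleware, so it offers none of the durability a real broker would provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class ChunkPipeline {
    // End-of-input marker, recognised by identity rather than by content.
    private static final List<Integer> POISON = new ArrayList<>();

    // Sends items in chunks to worker threads and returns how many items were handled.
    static int process(List<Integer> items, int chunkSize, int workers)
            throws InterruptedException {
        // Stands in for the middleware between the driving step and the workers.
        BlockingQueue<List<Integer>> queue = new LinkedBlockingQueue<>();
        AtomicInteger handled = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            Thread worker = new Thread(() -> {
                try {
                    // Worker side: listen for chunks until the marker arrives.
                    for (List<Integer> chunk = queue.take(); chunk != POISON; chunk = queue.take()) {
                        handled.addAndGet(chunk.size()); // placeholder for business processing
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            threads.add(worker);
        }

        // Driving side: the "writer" sends each chunk instead of writing it itself.
        for (int i = 0; i < items.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, items.size());
            queue.put(new ArrayList<>(items.subList(i, end)));
        }
        for (int w = 0; w < workers; w++) {
            queue.put(POISON); // one marker per worker so that every worker stops
        }
        for (Thread worker : threads) {
            worker.join();
        }
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            items.add(i);
        }
        System.out.println(process(items, 10, 3)); // prints 25
    }
}
```

Changing the `workers` argument is the in-memory analogue of adjusting the number of consumers to match demand.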