The Domain Language of Batch

To any experienced batch architect, the overall concepts of batch processing used in Spring Batch should be familiar and comfortable. There are “Jobs” and “Steps” and developer-supplied processing units called ItemReader and ItemWriter. However, because of the Spring patterns, operations, templates, callbacks, and idioms, there are opportunities for the following:

  • Significant improvement in adherence to a clear separation of concerns.

  • Clearly delineated architectural layers and services provided as interfaces.

  • Simple and default implementations that allow for quick adoption and ease of use out of the box.

  • Significantly enhanced extensibility.

The following diagram is a simplified version of the batch reference architecture that has been used for decades. It provides an overview of the components that make up the domain language of batch processing. This architecture framework is a blueprint that has been proven through decades of implementations on the last several generations of platforms (COBOL on mainframes, C on Unix, and now Java anywhere). JCL and COBOL developers are likely to be as comfortable with the concepts as C, C#, and Java developers. Spring Batch provides a physical implementation of the layers, components, and technical services commonly found in the robust, maintainable systems that are used to address the creation of simple to complex batch applications, with the infrastructure and extensions to address very complex processing needs.

Figure: Batch Stereotypes (spring-batch-reference-model.png)

The preceding diagram highlights the key concepts that make up the domain language of Spring Batch. A Job has one or more steps, each of which has exactly one ItemReader, an optional ItemProcessor, and one ItemWriter. A job needs to be launched (with JobLauncher), and metadata about the currently running process needs to be stored (in JobRepository).

Job

This section describes stereotypes relating to the concept of a batch job. A Job is an entity that encapsulates an entire batch process. As is common with other Spring projects, a Job is wired together with either an XML configuration file or Java-based configuration. This configuration may be referred to as the “job configuration”. However, Job is only the top of an overall hierarchy, as shown in the following diagram:

job hierarchy
Figure 1. Job Hierarchy

In Spring Batch, a Job is simply a container for Step instances. It combines multiple steps that logically belong together in a flow and allows for configuration of properties global to all steps, such as restartability. The job configuration contains:

  • The name of the job.

  • Definition and ordering of Step instances.

  • Whether or not the job is restartable.

Java

For those who use Java configuration, Spring Batch provides a default implementation of the Job interface in the form of the SimpleJob class, which creates some standard functionality on top of Job. When using Java-based configuration, a collection of builders is made available for the instantiation of a Job, as the following example shows:

@Bean
public Job footballJob(JobRepository jobRepository) {
    return new JobBuilder("footballJob", jobRepository)
                     .start(playerLoad())
                     .next(gameLoad())
                     .next(playerSummarization())
                     .build();
}

XML

For those who use XML configuration, Spring Batch provides a default implementation of the Job interface in the form of the SimpleJob class, which creates some standard functionality on top of Job. However, the batch namespace abstracts away the need to instantiate it directly. Instead, you can use the <job> element, as the following example shows:

<job id="footballJob">
    <step id="playerload" next="gameLoad"/>
    <step id="gameLoad" next="playerSummarization"/>
    <step id="playerSummarization"/>
</job>

JobInstance

A JobInstance refers to the concept of a logical job run. Consider a batch job that should be run once at the end of the day, such as the EndOfDay Job from the preceding diagram. There is one EndOfDay job, but each individual run of the Job must be tracked separately. In the case of this job, there is one logical JobInstance per day. For example, there is a January 1st run, a January 2nd run, and so on. If the January 1st run fails the first time and is run again the next day, it is still the January 1st run. (Usually, this corresponds with the data it is processing as well, meaning the January 1st run processes data for January 1st.) Therefore, each JobInstance can have multiple executions (JobExecution is discussed in more detail later in this chapter), and a given JobInstance (which corresponds to a particular Job and identifying JobParameters) can have only one execution running at a time.

The definition of a JobInstance has absolutely no bearing on the data to be loaded. It is entirely up to the ItemReader implementation to determine how data is loaded. For example, in the EndOfDay scenario, there may be a column on the data that indicates the effective date or schedule date to which the data belongs. So, the January 1st run would load only data from the 1st, and the January 2nd run would use only data from the 2nd. Because this determination is likely to be a business decision, it is left up to the ItemReader to decide. However, using the same JobInstance determines whether or not the “state” (that is, the ExecutionContext, which is discussed later in this chapter) from previous executions is used. Using a new JobInstance means “start from the beginning,” and using an existing instance generally means “start from where you left off”.

JobParameters

Having discussed JobInstance and how it differs from Job, the natural question to ask is: “How is one JobInstance distinguished from another?” The answer is: JobParameters. A JobParameters object holds a set of parameters used to start a batch job. They can be used for identification or even as reference data during the run, as the following image shows:

job stereotypes parameters
Figure 2. Job Parameters

In the preceding example, where there are two instances, one for January 1st and another for January 2nd, there is really only one Job, but it was launched with two JobParameters objects: one carrying a job parameter of 01-01-2017 and another carrying a parameter of 01-02-2017. Thus, the contract can be defined as: JobInstance = Job + identifying JobParameters. This allows a developer to effectively control how a JobInstance is defined, since they control what parameters are passed in.

Not all job parameters are required to contribute to the identification of a JobInstance. By default, they do so. However, the framework also allows the submission of a Job with parameters that do not contribute to the identity of a JobInstance.
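
The identity contract above can be illustrated with a small plain-Java model. This is only a sketch, not the Spring Batch API: the `Param` record, the `instanceKey` helper, and the `requestedBy` parameter are hypothetical names introduced here, and the real framework computes and stores instance identity in its own way.

```java
import java.util.Map;
import java.util.TreeMap;

/** Simplified model of "JobInstance = Job + identifying JobParameters". Not the Spring Batch API. */
final class JobInstanceIdentitySketch {

    /** A parameter value plus a flag saying whether it contributes to instance identity. */
    record Param(String value, boolean identifying) {}

    /** Builds a key from the job name and the identifying parameters only. */
    static String instanceKey(String jobName, Map<String, Param> params) {
        // A TreeMap gives a stable ordering, so the key does not depend on insertion order.
        Map<String, String> identifying = new TreeMap<>();
        params.forEach((name, param) -> {
            if (param.identifying()) {
                identifying.put(name, param.value());
            }
        });
        return jobName + identifying;
    }

    public static void main(String[] args) {
        Map<String, Param> firstRun = Map.of(
                "schedule.date", new Param("2017-01-01", true),
                "requestedBy", new Param("alice", false));   // hypothetical non-identifying parameter
        Map<String, Param> rerun = Map.of(
                "schedule.date", new Param("2017-01-01", true),
                "requestedBy", new Param("bob", false));
        Map<String, Param> nextDay = Map.of(
                "schedule.date", new Param("2017-01-02", true));

        String key = instanceKey("EndOfDayJob", firstRun);
        System.out.println(key.equals(instanceKey("EndOfDayJob", rerun)));   // true: same instance
        System.out.println(key.equals(instanceKey("EndOfDayJob", nextDay))); // false: new instance
    }
}
```

In this model, changing only a non-identifying parameter leaves the key unchanged, which corresponds to rerunning the same JobInstance, while a different identifying date produces a new one.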

JobExecution

A JobExecution refers to the technical concept of a single attempt to run a Job. An execution may end in failure or success, but the JobInstance corresponding to a given execution is not considered to be complete unless the execution completes successfully. Using the EndOfDay Job described previously as an example, consider a JobInstance for 01-01-2017 that failed the first time it was run. If it is run again with the same identifying job parameters as the first run (01-01-2017), a new JobExecution is created. However, there is still only one JobInstance.

A Job defines what a job is and how it is to be executed, and a JobInstance is a purely organizational object to group executions together, primarily to enable correct restart semantics. A JobExecution, however, is the primary storage mechanism for what actually happened during a run and contains many more properties that must be controlled and persisted, as the following table shows:

Table 1. JobExecution Properties

| Property | Definition |
| --- | --- |
| status | A BatchStatus object that indicates the status of the execution. While running, it is BatchStatus#STARTED. If it fails, it is BatchStatus#FAILED. If it finishes successfully, it is BatchStatus#COMPLETED. |
| startTime | A java.time.LocalDateTime representing the current system time when the execution was started. This field is empty if the job has yet to start. |
| endTime | A java.time.LocalDateTime representing the current system time when the execution finished, regardless of whether or not it was successful. The field is empty if the job has yet to finish. |
| exitStatus | The ExitStatus, indicating the result of the run. It is most important, because it contains an exit code that is returned to the caller. See chapter 5 for more details. The field is empty if the job has yet to finish. |
| createTime | A java.time.LocalDateTime representing the current system time when the JobExecution was first persisted. The job may not have been started yet (and thus has no start time), but it always has a createTime, which is required by the framework for managing job-level ExecutionContexts. |
| lastUpdated | A java.time.LocalDateTime representing the last time a JobExecution was persisted. This field is empty if the job has yet to start. |
| executionContext | The “property bag” containing any user data that needs to be persisted between executions. |
| failureExceptions | The list of exceptions encountered during the execution of a Job. These can be useful if more than one exception is encountered during the failure of a Job. |

These properties are important because they are persisted and can be used to completely determine the status of an execution. For example, if the EndOfDay job for 01-01 is executed at 9:00 PM and fails at 9:30, the following entries are made in the batch metadata tables:

Table 2. BATCH_JOB_INSTANCE

| JOB_INST_ID | JOB_NAME |
| --- | --- |
| 1 | EndOfDayJob |

Table 3. BATCH_JOB_EXECUTION_PARAMS

| JOB_EXECUTION_ID | TYPE_CD | KEY_NAME | DATE_VAL | IDENTIFYING |
| --- | --- | --- | --- | --- |
| 1 | DATE | schedule.Date | 2017-01-01 | TRUE |

Table 4. BATCH_JOB_EXECUTION

| JOB_EXEC_ID | JOB_INST_ID | START_TIME | END_TIME | STATUS |
| --- | --- | --- | --- | --- |
| 1 | 1 | 2017-01-01 21:00 | 2017-01-01 21:30 | FAILED |

Column names may have been abbreviated or removed for the sake of clarity and formatting.

Now that the job has failed, assume that it took the entire night for the problem to be determined, so that the “batch window” is now closed. Further assuming that the window starts at 9:00 PM, the job is kicked off again for 01-01 the next evening, starting where it left off and completing successfully at 9:30. Because it is now the next day, the 01-02 job must be run as well, and it is kicked off just afterwards at 9:31 and completes in its normal time of about one hour, at around 10:30. There is no requirement that one JobInstance be kicked off after another, unless there is potential for the two jobs to attempt to access the same data, causing issues with locking at the database level. It is entirely up to the scheduler to determine when a Job should be run. Since they are separate JobInstances, Spring Batch makes no attempt to stop them from being run concurrently. (Attempting to run the same JobInstance while another execution of it is already running results in a JobExecutionAlreadyRunningException being thrown.) There should now be an extra entry in the JobInstance table and two extra entries in each of the JobParameters and JobExecution tables, as shown in the following tables:

Table 5. BATCH_JOB_INSTANCE

| JOB_INST_ID | JOB_NAME |
| --- | --- |
| 1 | EndOfDayJob |
| 2 | EndOfDayJob |

Table 6. BATCH_JOB_EXECUTION_PARAMS

| JOB_EXECUTION_ID | TYPE_CD | KEY_NAME | DATE_VAL | IDENTIFYING |
| --- | --- | --- | --- | --- |
| 1 | DATE | schedule.Date | 2017-01-01 00:00:00 | TRUE |
| 2 | DATE | schedule.Date | 2017-01-01 00:00:00 | TRUE |
| 3 | DATE | schedule.Date | 2017-01-02 00:00:00 | TRUE |

Table 7. BATCH_JOB_EXECUTION

| JOB_EXEC_ID | JOB_INST_ID | START_TIME | END_TIME | STATUS |
| --- | --- | --- | --- | --- |
| 1 | 1 | 2017-01-01 21:00 | 2017-01-01 21:30 | FAILED |
| 2 | 1 | 2017-01-02 21:00 | 2017-01-02 21:30 | COMPLETED |
| 3 | 2 | 2017-01-02 21:31 | 2017-01-02 22:29 | COMPLETED |

Column names may have been abbreviated or removed for the sake of clarity and formatting.
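
The restart semantics shown in these tables can be sketched as a small in-memory model. This is an illustration under stated assumptions, not how JobRepository is implemented: the class, its methods, and the string instance keys are hypothetical, and `IllegalStateException` merely stands in for the framework exceptions named in the comments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of how executions are grouped under a job instance. Not the Spring Batch API. */
final class RestartSemanticsSketch {

    enum Status { STARTED, FAILED, COMPLETED }

    // Execution history per instance key (job name plus identifying parameters).
    private final Map<String, List<Status>> executions = new HashMap<>();

    /** Registers a new execution for the instance, or rejects the launch. Returns its index. */
    int launch(String instanceKey) {
        List<Status> history = executions.computeIfAbsent(instanceKey, key -> new ArrayList<>());
        if (history.contains(Status.STARTED)) {
            // Plays the role of JobExecutionAlreadyRunningException.
            throw new IllegalStateException("already running: " + instanceKey);
        }
        if (history.contains(Status.COMPLETED)) {
            // Plays the role of JobInstanceAlreadyCompleteException.
            throw new IllegalStateException("already complete: " + instanceKey);
        }
        history.add(Status.STARTED);
        return history.size() - 1;
    }

    /** Records the final status of a previously launched execution. */
    void finish(String instanceKey, int execution, Status status) {
        executions.get(instanceKey).set(execution, status);
    }

    List<Status> history(String instanceKey) {
        return List.copyOf(executions.getOrDefault(instanceKey, List.of()));
    }

    public static void main(String[] args) {
        RestartSemanticsSketch repository = new RestartSemanticsSketch();
        String jan1 = "EndOfDayJob/2017-01-01";
        String jan2 = "EndOfDayJob/2017-01-02";

        int first = repository.launch(jan1);
        repository.finish(jan1, first, Status.FAILED);     // first attempt fails
        int second = repository.launch(jan1);              // restart: same instance, new execution
        int other = repository.launch(jan2);               // a different instance may run concurrently
        repository.finish(jan1, second, Status.COMPLETED);
        repository.finish(jan2, other, Status.COMPLETED);

        System.out.println(repository.history(jan1)); // [FAILED, COMPLETED]
        System.out.println(repository.history(jan2)); // [COMPLETED]
    }
}
```

The two histories mirror the execution table: the 01-01 instance owns two executions, and the 01-02 instance owns one.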

Step

A Step is a domain object that encapsulates an independent, sequential phase of a batch job. Therefore, every Job is composed entirely of one or more steps. A Step contains all of the information necessary to define and control the actual batch processing. This is a necessarily vague description because the contents of any given Step are at the discretion of the developer writing a Job. A Step can be as simple or complex as the developer desires. A simple Step might load data from a file into the database, requiring little or no code (depending upon the implementations used). A more complex Step may have complicated business rules that are applied as part of the processing. As with a Job, a Step has an individual StepExecution that correlates with a unique JobExecution, as the following image shows:

jobHeirarchyWithSteps
Figure 3. Job Hierarchy With Steps

StepExecution

A StepExecution represents a single attempt to execute a Step. A new StepExecution is created each time a Step is run, similar to JobExecution. However, if a step fails to execute because the step before it fails, no execution is persisted for it. A StepExecution is created only when its Step is actually started.
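
The rule that an execution record exists only for steps that actually start can be sketched in plain Java. The `NamedStep` and `StepRecord` types are hypothetical stand-ins introduced for this illustration; the step names are borrowed from the footballJob example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Simplified model: an execution record is created only for steps that actually start. */
final class StepExecutionSketch {

    /** A step name plus a body that reports success (true) or failure (false). */
    record NamedStep(String name, BooleanSupplier body) {}

    /** What gets recorded for one attempt at running a step. */
    record StepRecord(String stepName, String status) {}

    /** Runs steps in order and stops at the first failure; later steps leave no record. */
    static List<StepRecord> run(List<NamedStep> steps) {
        List<StepRecord> records = new ArrayList<>();
        for (NamedStep step : steps) {
            boolean succeeded = step.body().getAsBoolean();   // the step has started, so it is recorded
            records.add(new StepRecord(step.name(), succeeded ? "COMPLETED" : "FAILED"));
            if (!succeeded) {
                break;                                         // remaining steps never start
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<StepRecord> records = run(List.of(
                new NamedStep("playerLoad", () -> true),
                new NamedStep("gameLoad", () -> false),
                new NamedStep("playerSummarization", () -> true)));
        System.out.println(records.size()); // 2
    }
}
```

Because gameLoad fails, playerSummarization never starts and therefore has no record in this model.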

Step executions are represented by objects of the StepExecution class. Each execution contains a reference to its corresponding step and JobExecution and transaction-related data, such as commit and rollback counts and start and end times. Additionally, each step execution contains an ExecutionContext, which contains any data a developer needs to have persisted across batch runs, such as statistics or state information needed to restart. The following table lists the properties for StepExecution:

Table 8. StepExecution Properties

| Property | Definition |
| --- | --- |
| status | A BatchStatus object that indicates the status of the execution. While running, the status is BatchStatus.STARTED. If it fails, the status is BatchStatus.FAILED. If it finishes successfully, the status is BatchStatus.COMPLETED. |
| startTime | A java.time.LocalDateTime representing the current system time when the execution was started. This field is empty if the step has yet to start. |
| endTime | A java.time.LocalDateTime representing the current system time when the execution finished, regardless of whether or not it was successful. This field is empty if the step has yet to exit. |
| exitStatus | The ExitStatus indicating the result of the execution. It is most important, because it contains an exit code that is returned to the caller. See chapter 5 for more details. This field is empty if the step has yet to exit. |
| executionContext | The “property bag” containing any user data that needs to be persisted between executions. |
| readCount | The number of items that have been successfully read. |
| writeCount | The number of items that have been successfully written. |
| commitCount | The number of transactions that have been committed for this execution. |
| rollbackCount | The number of times the business transaction controlled by the Step has been rolled back. |
| readSkipCount | The number of times read has failed, resulting in a skipped item. |
| processSkipCount | The number of times process has failed, resulting in a skipped item. |
| filterCount | The number of items that have been “filtered” by the ItemProcessor. |
| writeSkipCount | The number of times write has failed, resulting in a skipped item. |

ExecutionContext

An ExecutionContext represents a collection of key/value pairs that are persisted and controlled by the framework to give developers a place to store persistent state that is scoped to a StepExecution object or a JobExecution object. (For those familiar with Quartz, it is very similar to JobDataMap.) The best usage example is to facilitate restart. Using flat file input as an example, while processing individual lines, the framework periodically persists the ExecutionContext at commit points. Doing so lets the ItemReader store its state in case a fatal error occurs during the run or even if the power goes out. All that is needed is to put the current number of lines read into the context, as the following example shows, and the framework does the rest:

executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());

Using the EndOfDay example from the Job stereotypes section as an example, assume there is one step, loadData, that loads a file into the database. After the first failed run, the metadata tables would look like the following example:

Table 9. BATCH_JOB_INSTANCE

| JOB_INST_ID | JOB_NAME |
| --- | --- |
| 1 | EndOfDayJob |

Table 10. BATCH_JOB_EXECUTION_PARAMS

| JOB_EXECUTION_ID | TYPE_CD | KEY_NAME | DATE_VAL |
| --- | --- | --- | --- |
| 1 | DATE | schedule.Date | 2017-01-01 |

Table 11. BATCH_JOB_EXECUTION

| JOB_EXEC_ID | JOB_INST_ID | START_TIME | END_TIME | STATUS |
| --- | --- | --- | --- | --- |
| 1 | 1 | 2017-01-01 21:00 | 2017-01-01 21:30 | FAILED |

Table 12. BATCH_STEP_EXECUTION

| STEP_EXEC_ID | JOB_EXEC_ID | STEP_NAME | START_TIME | END_TIME | STATUS |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | loadData | 2017-01-01 21:00 | 2017-01-01 21:30 | FAILED |

Table 13. BATCH_STEP_EXECUTION_CONTEXT

| STEP_EXEC_ID | SHORT_CONTEXT |
| --- | --- |
| 1 | {piece.count=40321} |

In the preceding case, the Step ran for 30 minutes and processed 40,321 “pieces”, which would represent lines in a file in this scenario. This value is updated just before each commit by the framework and can contain multiple rows corresponding to entries within the ExecutionContext. Being notified before a commit requires one of the various StepListener implementations (or an ItemStream), which are discussed in more detail later in this guide. As with the previous example, it is assumed that the Job is restarted the next day. When it is restarted, the values from the ExecutionContext of the last run are reconstituted from the database. When the ItemReader is opened, it can check to see if it has any stored state in the context and initialize itself from there, as the following example shows:

if (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
    log.debug("Initializing for restart. Restart data is: " + executionContext);

    long lineCount = executionContext.getLong(getKey(LINES_READ_COUNT));

    LineReader reader = getReader();

    Object record = "";
    while (reader.getPosition() < lineCount && record != null) {
        record = readLine();
    }
}

In this case, after the preceding code runs, the current line is 40,322, letting the Step start again from where it left off. You can also use the ExecutionContext for statistics that need to be persisted about the run itself. For example, if a flat file contains orders for processing that exist across multiple lines, it may be necessary to store how many orders have been processed (which is much different from the number of lines read), so that an email can be sent at the end of the Step with the total number of orders processed in the body. The framework handles storing this for the developer and scopes it correctly to an individual JobInstance.

It can be very difficult to know whether an existing ExecutionContext should be used or not. For example, using the EndOfDay example from above, when the 01-01 run starts again for the second time, the framework recognizes that it is the same JobInstance and, on an individual Step basis, pulls the ExecutionContext out of the database and hands it (as part of the StepExecution) to the Step itself. Conversely, for the 01-02 run, the framework recognizes that it is a different instance, so an empty context must be handed to the Step. There are many of these types of determinations that the framework makes for the developer, to ensure the state is given to them at the correct time.

It is also important to note that exactly one ExecutionContext exists per StepExecution at any given time. Clients of the ExecutionContext should be careful, because this creates a shared keyspace. As a result, care should be taken when putting values in to ensure no data is overwritten. However, the Step stores absolutely no data in the context, so there is no way to adversely affect the framework.
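
The save-and-resume cycle described above can be modelled in a few lines of plain Java. This is a minimal sketch under stated assumptions: a `Map` stands in for the ExecutionContext, the `open`, `update`, and `read` methods are hypothetical names for the points where the framework would call the reader, and the key name is made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of a restartable reader. The Map stands in for an ExecutionContext. */
final class RestartableReaderSketch {

    static final String LINES_READ_COUNT = "lines.read.count"; // hypothetical key name

    private final List<String> lines;
    private int position;

    RestartableReaderSketch(List<String> lines) {
        this.lines = lines;
    }

    /** On start or restart, resumes from the stored position if the context holds one. */
    void open(Map<String, Integer> context) {
        position = context.getOrDefault(LINES_READ_COUNT, 0);
    }

    /** Returns the next line, or null when the input is exhausted. */
    String read() {
        return position < lines.size() ? lines.get(position++) : null;
    }

    /** Called at each commit point so that the position survives a failure. */
    void update(Map<String, Integer> context) {
        context.put(LINES_READ_COUNT, position);
    }

    public static void main(String[] args) {
        List<String> file = List.of("line1", "line2", "line3", "line4");
        Map<String, Integer> context = new HashMap<>();

        RestartableReaderSketch firstRun = new RestartableReaderSketch(file);
        firstRun.open(context);
        firstRun.read();
        firstRun.read();
        firstRun.update(context);   // commit point after two lines
        firstRun.read();            // read but never committed: assume the run fails here

        RestartableReaderSketch restart = new RestartableReaderSketch(file);
        restart.open(context);      // same instance, so the stored context is handed back
        System.out.println(restart.read()); // line3
    }
}
```

The restarted reader resumes at the last committed position rather than the last line read, so the uncommitted third line is read again. A reader opened with an empty context, as for a new JobInstance, starts from the first line.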

Note that there is at least one ExecutionContext per JobExecution and one for every StepExecution. For example, consider the following code snippet:

ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
//ecStep does not equal ecJob

As noted in the comment, ecStep does not equal ecJob. They are two different ExecutionContexts. The one scoped to the Step is saved at every commit point in the Step, whereas the one scoped to the Job is saved in between every Step execution.

In the ExecutionContext, all non-transient entries must be Serializable. Proper serialization of the execution context underpins the restart capability of steps and jobs. Should you use keys or values that are not natively serializable, you are required to employ a tailored serialization approach. Failing to serialize the execution context may jeopardize the state persistence process, making failed jobs impossible to recover properly.
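
The effect of a non-serializable entry can be demonstrated with plain Java serialization. This sketch is only a stand-in: the serializer that a given Spring Batch version applies to the ExecutionContext may differ, and the `HashMap` here merely plays the role of the context. The key names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

/** Round-trips a context-like map with plain Java serialization (a stand-in, not the framework's serializer). */
final class ContextSerializationSketch {

    static byte[] serialize(HashMap<String, Object> context) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(context);   // fails if any entry is not Serializable
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, Object> deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, Object>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Object> context = new HashMap<>();
        context.put("lines.read.count", 40321L);
        System.out.println(deserialize(serialize(context)).get("lines.read.count")); // 40321

        context.put("cursor", new Object());   // java.lang.Object is not Serializable
        try {
            serialize(context);
        } catch (NotSerializableException e) {
            System.out.println("state cannot be persisted: " + e.getMessage());
        }
    }
}
```

Storing simple values such as counters, offsets, or identifiers, rather than live resources, keeps the context trivially serializable.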

JobRepository

JobRepository is the persistence mechanism for all of the stereotypes mentioned earlier. It provides CRUD operations for JobLauncher, Job, and Step implementations. When a Job is first launched, a JobExecution is obtained from the repository. Also, during the course of execution, StepExecution and JobExecution implementations are persisted by passing them to the repository.

Java

When using Java configuration, the @EnableBatchProcessing annotation provides a JobRepository as one of the components that is automatically configured.

XML

The Spring Batch XML namespace provides support for configuring a JobRepository instance with the <job-repository> tag, as the following example shows:

<job-repository id="jobRepository"/>

JobLauncher

JobLauncher represents a simple interface for launching a Job with a given set of JobParameters, as the following example shows:

public interface JobLauncher {

    public JobExecution run(Job job, JobParameters jobParameters)
            throws JobExecutionAlreadyRunningException, JobRestartException,
                   JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}

It is expected that implementations obtain a valid JobExecution from the JobRepository and execute the Job.

ItemReader

ItemReader is an abstraction that represents the retrieval of input for a Step, one item at a time. When the ItemReader has exhausted the items it can provide, it indicates this by returning null. You can find more details about the ItemReader interface and its various implementations in Readers And Writers.
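
The read-until-null contract can be sketched with a hypothetical single-method interface. `ItemSource` is a stand-in introduced for this illustration, not the real ItemReader interface, whose exact signature is covered in Readers And Writers.

```java
import java.util.Iterator;
import java.util.List;

/** Sketch of the read-until-null contract. ItemSource is a hypothetical stand-in for ItemReader. */
final class ReadUntilNullSketch {

    interface ItemSource<T> {
        T read();   // returns null once the input is exhausted
    }

    /** Adapts a list to the contract: one item per call, then null. */
    static <T> ItemSource<T> fromList(List<T> items) {
        Iterator<T> iterator = items.iterator();
        return () -> iterator.hasNext() ? iterator.next() : null;
    }

    /** Counts items by reading until the source signals exhaustion. */
    static <T> int drain(ItemSource<T> source) {
        int count = 0;
        while (source.read() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(drain(fromList(List.of("a", "b", "c")))); // 3
    }
}
```

One consequence of this contract is that null cannot be used as a legitimate item value, because the caller would take it as the end of input.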

ItemWriter

ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. Generally, an ItemWriter has no knowledge of the input it should receive next and knows only the items that were passed in its current invocation. You can find more details about the ItemWriter interface and its various implementations in Readers And Writers.

ItemProcessor

ItemProcessor is an abstraction that represents the business processing of an item. While the ItemReader reads items and the ItemWriter writes them, the ItemProcessor provides an access point to transform each item or apply other business processing. If, while processing the item, it is determined that the item is not valid, returning null indicates that the item should not be written out. You can find more details about the ItemProcessor interface in Readers And Writers.
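
How the three abstractions cooperate can be sketched as a simplified chunk loop. This is a model built on `java.util.function` types, not the Spring Batch implementation: it ignores transactions, skips, and listeners, and the `Counts` fields merely borrow their names from the StepExecution properties.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Simplified model of a chunk-oriented step. Not the Spring Batch implementation. */
final class ChunkLoopSketch {

    /** Counters named after the StepExecution properties they imitate. */
    static final class Counts {
        int readCount;
        int filterCount;
        int writeCount;
    }

    static <I, O> Counts run(Supplier<I> reader, Function<I, O> processor,
                             Consumer<List<O>> writer, int commitInterval) {
        Counts counts = new Counts();
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();
            // Read and process one item at a time until the chunk is full or the input ends.
            for (int i = 0; i < commitInterval; i++) {
                I item = reader.get();
                if (item == null) {          // null from the reader: no more input
                    exhausted = true;
                    break;
                }
                counts.readCount++;
                O processed = processor.apply(item);
                if (processed == null) {     // null from the processor: filter the item out
                    counts.filterCount++;
                } else {
                    chunk.add(processed);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk);        // the writer receives the whole chunk at once
                counts.writeCount += chunk.size();
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Iterator<Integer> input = List.of(1, 2, 3, 4, 5).iterator();
        Counts counts = ChunkLoopSketch.<Integer, String>run(
                () -> input.hasNext() ? input.next() : null,
                n -> n % 2 == 0 ? null : "item-" + n,    // drop even numbers
                chunk -> System.out.println("writing " + chunk),
                2);
        System.out.println(counts.readCount + " read, " + counts.filterCount
                + " filtered, " + counts.writeCount + " written");
    }
}
```

With five input items, a commit interval of 2, and a processor that drops even numbers, this model reports 5 read, 2 filtered, and 3 written, writing one surviving item per chunk.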

Batch Namespace

Many of the domain concepts listed previously need to be configured in a Spring ApplicationContext. While there are implementations of the interfaces above that you can use in a standard bean definition, a namespace has been provided for ease of configuration, as the following example shows:

<beans:beans xmlns="http://www.springframework.org/schema/batch"
xmlns:beans="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
   http://www.springframework.org/schema/beans
   https://www.springframework.org/schema/beans/spring-beans.xsd
   http://www.springframework.org/schema/batch
   https://www.springframework.org/schema/batch/spring-batch.xsd">

<job id="ioSampleJob">
    <step id="step1">
        <tasklet>
            <chunk reader="itemReader" writer="itemWriter" commit-interval="2"/>
        </tasklet>
    </step>
</job>

</beans:beans>

As long as the batch namespace has been declared, any of its elements can be used. You can find more information on configuring a Job in Configuring and Running a Job. You can find more information on configuring a Step in Configuring a Step.