Spring Batch Concise Tutorial

Spring Batch - Architecture

Following is the diagrammatic representation of the architecture of Spring Batch. As depicted in the figure, the architecture contains three main components, namely Application, Batch Core, and Batch Infrastructure.

[Figure: Spring Batch architecture]

Application − This component contains all the jobs and the code we write using the Spring Batch framework.

Batch Core − This component contains all the API classes that are needed to control and launch a batch job.

Batch Infrastructure − This component contains the readers, writers, and services used by both the Application and the Batch Core components.

Components of Spring Batch

The following illustration shows the different components of Spring Batch and how they are connected with each other.

[Figure: components of Spring Batch]

Job

In a Spring Batch application, a job is the batch process that is to be executed. It runs from start to finish without interruption. A job is further divided into steps (that is, a job contains steps).

We configure a job in Spring Batch using either an XML file or a Java class. Following is the XML configuration of a job in Spring Batch.

<job id = "jobid">
   <step id = "step1" next = "step2"/>
   <step id = "step2" next = "step3"/>
   <step id = "step3"/>
</job>

A batch job is configured within the <job></job> tags, which take an attribute named id. Within these tags, we define the steps and the order in which they run.
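As a fuller sketch, the job fragment above could sit inside a Spring beans file such as the following. It assumes the beans namespace is the default and binds the Spring Batch namespace to a `batch:` prefix (the short fragments in this chapter omit prefixes). The tasklet bean names and the `com.example` classes are hypothetical placeholders, added because each step needs a tasklet or chunk definition to do any work.

```xml
<?xml version = "1.0" encoding = "UTF-8"?>
<beans xmlns = "http://www.springframework.org/schema/beans"
   xmlns:batch = "http://www.springframework.org/schema/batch"
   xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation = "
      http://www.springframework.org/schema/beans
      http://www.springframework.org/schema/beans/spring-beans.xsd
      http://www.springframework.org/schema/batch
      http://www.springframework.org/schema/batch/spring-batch.xsd">

   <!-- Steps run in the order given by their "next" attributes -->
   <batch:job id = "jobid">
      <batch:step id = "step1" next = "step2">
         <batch:tasklet ref = "firstTasklet"/>
      </batch:step>
      <batch:step id = "step2" next = "step3">
         <batch:tasklet ref = "secondTasklet"/>
      </batch:step>
      <batch:step id = "step3">
         <batch:tasklet ref = "thirdTasklet"/>
      </batch:step>
   </batch:job>

   <!-- Hypothetical classes, each implementing
        org.springframework.batch.core.step.tasklet.Tasklet -->
   <bean id = "firstTasklet" class = "com.example.FirstTasklet"/>
   <bean id = "secondTasklet" class = "com.example.SecondTasklet"/>
   <bean id = "thirdTasklet" class = "com.example.ThirdTasklet"/>
</beans>
```

Unless told otherwise, the job element looks for a bean named jobRepository and each tasklet for a bean named transactionManager, so a complete file would also define those two beans.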

Restartable − In general, launching a job is considered a restart if an execution already exists for the same job instance (the same job name and parameters), and by default the job is started again. To prevent this, set the restartable attribute to false as shown below; an attempt to restart such a job then fails with a JobRestartException.

<job id = "jobid" restartable = "false" >

</job>

Step

A step is an independent part of a job that contains the information needed to define and execute that part of the job.

As shown in the diagram, each step is composed of an ItemReader, an optional ItemProcessor, and an ItemWriter. A job may contain one or more steps.

Readers, Writers, and Processors

An item reader reads data into a Spring Batch application from a particular source, whereas an item writer writes data from the Spring Batch application to a particular destination.

An item processor is a class containing the code that processes the data read into the Spring Batch application. If the application reads n records, the code in the processor is executed once for each record.

When no reader and writer are given, a tasklet acts as the processor for Spring Batch; it performs only a single task. By contrast, if we are writing a job with a simple step that reads data from a MySQL database, processes it, and writes it to a flat file, then our step uses −

  1. A reader which reads from MySQL database.

  2. A writer which writes to a flat file.

  3. A custom processor which processes the data as per our wish.

<job id = "helloWorldJob">
   <step id = "step1">
      <tasklet>
         <chunk reader = "mysqlReader" writer = "fileWriter"
            processor = "CustomitemProcessor" ></chunk>
      </tasklet>
   </step>
</job>
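For contrast, a step that has no reader or writer can point its tasklet element at a bean through the `ref` attribute. The sketch below assumes a hypothetical class `com.example.CleanupTasklet` that implements Spring Batch's `Tasklet` interface; the job, step, and bean names are illustrative only, and namespace prefixes are omitted as in the fragment above.

```xml
<job id = "cleanupJob">
   <step id = "cleanupStep">
      <!-- No chunk element: the referenced bean performs the step's single task -->
      <tasklet ref = "cleanupTasklet"/>
   </step>
</job>

<!-- Hypothetical bean; its class must implement
     org.springframework.batch.core.step.tasklet.Tasklet -->
<bean id = "cleanupTasklet" class = "com.example.CleanupTasklet"/>
```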

Spring Batch provides a long list of readers and writers. Using these predefined classes, we can define beans for them. We will discuss readers and writers in greater detail in the coming chapters.
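As a rough sketch of such bean definitions, the reader and writer referenced by helloWorldJob could be declared with two of the predefined classes, `JdbcCursorItemReader` and `FlatFileItemWriter`. The SQL statement, the output path, the choice of row mapper and line aggregator, and the `dataSource` bean are all assumptions made for illustration, and `com.example.CustomItemProcessor` is a hypothetical class implementing `ItemProcessor`.

```xml
<!-- Reads rows through the (assumed) dataSource bean; each row becomes a Map -->
<bean id = "mysqlReader"
   class = "org.springframework.batch.item.database.JdbcCursorItemReader">
   <property name = "dataSource" ref = "dataSource"/>
   <property name = "sql" value = "SELECT id, name FROM example_table"/>
   <property name = "rowMapper">
      <bean class = "org.springframework.jdbc.core.ColumnMapRowMapper"/>
   </property>
</bean>

<!-- Writes each item as one line of a flat file, using the item's toString() -->
<bean id = "fileWriter"
   class = "org.springframework.batch.item.file.FlatFileItemWriter">
   <property name = "resource" value = "file:target/output.txt"/>
   <property name = "lineAggregator">
      <bean class = "org.springframework.batch.item.file.transform.PassThroughLineAggregator"/>
   </property>
</bean>

<!-- Hypothetical processor class; the bean id matches the job's processor attribute -->
<bean id = "CustomitemProcessor" class = "com.example.CustomItemProcessor"/>
```

In a complete configuration the chunk element would normally also carry a `commit-interval` attribute, which sets how many items are processed before each transaction is committed.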

JobRepository

A job repository in Spring Batch provides Create, Retrieve, Update, and Delete (CRUD) operations for the JobLauncher, Job, and Step implementations. We define a job repository in an XML file as shown below.

<job-repository id = "jobRepository"/>

In addition to id, several optional attributes are available. Following is a job repository configuration with all of these options set explicitly.

<job-repository id = "jobRepository"
   data-source = "dataSource"
   transaction-manager = "transactionManager"
   isolation-level-for-create = "SERIALIZABLE"
   table-prefix = "BATCH_"
   max-varchar-length = "1000"/>
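The `dataSource` and `transactionManager` names in this configuration refer to beans defined elsewhere. A minimal sketch of those two beans, assuming a local MySQL database, might look like the following; the URL, database name, and credentials are placeholders, and the driver class is the one shipped with MySQL Connector/J 8.

```xml
<!-- Placeholder connection settings; replace with real values -->
<bean id = "dataSource"
   class = "org.springframework.jdbc.datasource.DriverManagerDataSource">
   <property name = "driverClassName" value = "com.mysql.cj.jdbc.Driver"/>
   <property name = "url" value = "jdbc:mysql://localhost:3306/batchdb"/>
   <property name = "username" value = "batchuser"/>
   <property name = "password" value = "changeit"/>
</bean>

<!-- Transaction manager bound to the same data source -->
<bean id = "transactionManager"
   class = "org.springframework.jdbc.datasource.DataSourceTransactionManager">
   <property name = "dataSource" ref = "dataSource"/>
</bean>
```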

In-Memory Repository − If you do not want to persist the Spring Batch domain objects in a database, you can configure an in-memory version of the jobRepository as shown below.

<!-- Note: MapJobRepositoryFactoryBean was deprecated in Spring Batch 4.3 and removed in 5.0 -->
<bean id = "jobRepository"
   class = "org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
   <property name = "transactionManager" ref = "transactionManager"/>
</bean>
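The `transactionManager` reference in this bean must resolve to another bean. One option often paired with an in-memory repository is Spring Batch's `ResourcelessTransactionManager`, sketched below; whether it is appropriate for a given job is an assumption to check, since it does not manage real transactions.

```xml
<!-- A transaction manager that performs no actual transactional work -->
<bean id = "transactionManager"
   class = "org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>
```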

JobLauncher

JobLauncher is an interface that launches a Spring Batch job with a given set of parameters. SimpleJobLauncher is a class that implements the JobLauncher interface. Following is the configuration of the JobLauncher.

<bean id = "jobLauncher"
   class = "org.springframework.batch.core.launch.support.SimpleJobLauncher">
   <property name = "jobRepository" ref = "jobRepository" />
</bean>
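As a variant of the definition above (not an additional bean), the launcher can be given a task executor so that the launch call returns before the job has finished. The sketch below uses Spring's `SimpleAsyncTaskExecutor`; choosing that executor is an assumption made for illustration.

```xml
<bean id = "jobLauncher"
   class = "org.springframework.batch.core.launch.support.SimpleJobLauncher">
   <property name = "jobRepository" ref = "jobRepository"/>
   <!-- Without a taskExecutor, the launcher runs the job in the calling thread -->
   <property name = "taskExecutor">
      <bean class = "org.springframework.core.task.SimpleAsyncTaskExecutor"/>
   </property>
</bean>
```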

JobInstance

A JobInstance represents the logical run of a job; it is created when we run a job. Each job instance is distinguished by the name of the job and the parameters passed to it at run time.

If the execution of a JobInstance fails, the same JobInstance can be executed again. Hence, each JobInstance can have multiple job executions.

JobExecution and StepExecution

JobExecution and StepExecution represent the execution of a job and of a step, respectively. They contain run information such as the start time and end time of the job or step.