Apache NiFi Concise Tutorial

Apache NiFi - Processors

Apache NiFi processors are the basic building blocks of a data flow. Each processor has its own function and contributes to producing the output flowfile. The data flow shown in the image below fetches files from one directory with the GetFile processor and stores them in another directory with the PutFile processor.

[Image: data flow using the GetFile and PutFile processors]
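The effect of this two-processor flow can be approximated outside NiFi with a few lines of Python. This is only a conceptual sketch, not how NiFi is implemented: the function name `move_files` and the directory names are made up for illustration, and the sketch assumes the source files are removed after pickup, which is how GetFile behaves when its Keep Source File property is false.

```python
import shutil
import tempfile
from pathlib import Path


def move_files(input_dir: Path, output_dir: Path) -> list[str]:
    """Move every regular file from input_dir to output_dir and return the moved names."""
    output_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(input_dir.iterdir()):
        if path.is_file():
            # "GetFile" step: pick the file up; "PutFile" step: write it to the target.
            shutil.move(str(path), str(output_dir / path.name))
            moved.append(path.name)
    return moved


if __name__ == "__main__":
    # Hypothetical directories created in a temporary location for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "in", Path(tmp) / "out"
        src.mkdir()
        (src / "a.txt").write_text("hello")
        print(move_files(src, dst))
```

In NiFi the two halves are separate processors joined by a connection, so each can be scheduled and configured independently, as the following sections describe.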

GetFile

The GetFile processor fetches files of a specific format from a specific directory. It also gives the user further options for finer control over fetching, which are discussed in the GetFile Properties section below.

[Image: GetFile processor]

GetFile Settings

The GetFile processor has the following settings −

Name

In the Name setting, a user can give the processor any name, for example one that follows the project's conventions or one that makes the processor's purpose clearer.

Enable

A user can enable or disable the processor using this setting.

Penalty Duration

This setting defines how long a flowfile stays penalized after the processor penalizes it, typically following a processing failure. A penalized flowfile is not processed again until this duration has passed.

Yield Duration

This setting specifies the yield time for the processor. When the processor yields, it is not scheduled again until this duration has passed.

Bulletin Level

This setting specifies the log level at which the processor generates bulletins.

Automatically Terminate Relationships

This setting lists all the relationships available for the processor, each with a checkbox. By checking a box, a user configures the processor to terminate any flowfile routed to that relationship instead of sending it further along the flow.

[Image: GetFile Automatically Terminate Relationships]

GetFile Scheduling

The GetFile processor offers the following scheduling options −

Schedule Strategy

You can schedule the processor at a fixed interval by selecting the Timer driven option, or according to a specified CRON string by selecting the CRON driven option.

Concurrent Tasks

This option defines how many tasks of this processor may run concurrently.

Execution

This option defines whether the processor runs on all nodes or only on the primary node.

Run Schedule

This option defines the time interval for the Timer driven strategy or the CRON expression for the CRON driven strategy.

[Image: GetFile Run Schedule]
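To make the two kinds of Run Schedule value more concrete, here is a small Python sketch that tells them apart. It is an illustration only, not NiFi code: the helper names `period_to_seconds` and `looks_like_cron` are hypothetical, the unit table covers just an assumed subset of the spellings NiFi accepts, and the CRON check only counts fields on the assumption that expressions have six fields (seconds first) plus an optional year.

```python
import re

# Assumed subset of time-unit spellings; NiFi itself accepts more.
_UNIT_SECONDS = {
    "sec": 1, "secs": 1,
    "min": 60, "mins": 60,
    "hr": 3600, "hrs": 3600,
}


def period_to_seconds(text: str) -> int:
    """Convert a timer period such as '30 sec' or '5 mins' to whole seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", text.lower())
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unsupported period: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def looks_like_cron(text: str) -> bool:
    """Rough check: six or seven whitespace-separated fields."""
    return len(text.split()) in (6, 7)


if __name__ == "__main__":
    print(period_to_seconds("5 mins"))
    print(looks_like_cron("0 0 13 * * ?"))
```

With the Timer driven strategy a value such as `5 mins` would run the processor every five minutes, while with the CRON driven strategy an expression such as `0 0 13 * * ?` would run it once a day at 13:00.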

GetFile Properties

GetFile offers multiple properties, as shown in the image below, ranging from compulsory properties such as Input Directory and File Filter to optional properties such as Path Filter and Maximum File Size. A user can manage the file-fetching process using these properties.

[Image: GetFile Properties]
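The way File Filter and Maximum File Size narrow down what gets fetched can be sketched in Python. This is a rough model under stated assumptions, not GetFile's actual implementation: `select_files` is a hypothetical helper, the default pattern `[^\.].*` is assumed to match GetFile's default File Filter, the size limit is given here in plain bytes rather than NiFi's data-size notation, and Python's `re` module differs in places from the Java regular expressions NiFi uses.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def select_files(input_dir: Path,
                 file_filter: str = r"[^\.].*",
                 max_size: Optional[int] = None) -> list[str]:
    """Return the names of files that a filter like GetFile's would pick up."""
    pattern = re.compile(file_filter)
    selected = []
    for path in sorted(input_dir.iterdir()):
        if not path.is_file():
            continue
        # The whole file name must match the filter.
        if pattern.fullmatch(path.name) is None:
            continue
        # Skip files larger than the optional size limit (in bytes).
        if max_size is not None and path.stat().st_size > max_size:
            continue
        selected.append(path.name)
    return selected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "orders.csv").write_text("id,total\n")
        (folder / "notes.txt").write_text("n")
        print(select_files(folder, r".*\.csv"))
```

A narrower pattern such as `.*\.csv` restricts the fetch to CSV files, which is the kind of "specific format" selection the processor description refers to.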

GetFile Comments

This section is used to record any information about the processor.

[Image: GetFile Comments]

PutFile

The PutFile processor stores files from the data flow in a specific location.

[Image: PutFile processor]

PutFile Settings

The PutFile processor has the following settings −

Name

In the Name setting, a user can give the processor any name, for example one that follows the project's conventions or one that makes the processor's purpose clearer.

Enable

A user can enable or disable the processor using this setting.

Penalty Duration

This setting defines how long a flowfile stays penalized after the processor penalizes it, typically following a processing failure. A penalized flowfile is not processed again until this duration has passed.

Yield Duration

This setting specifies the yield time for the processor. When the processor yields, it is not scheduled again until this duration has passed.

Bulletin Level

This setting specifies the log level at which the processor generates bulletins.

Automatically Terminate Relationships

This setting lists all the relationships available for the processor, each with a checkbox. By checking a box, a user configures the processor to terminate any flowfile routed to that relationship instead of sending it further along the flow.

[Image: PutFile Automatically Terminate Relationships]

PutFile Scheduling

The PutFile processor offers the following scheduling options −

Schedule Strategy

You can schedule the processor at a fixed interval by selecting the Timer driven option, or according to a specified CRON string by selecting the CRON driven option. There is also an experimental Event driven strategy, which triggers the processor on a specific event.

Concurrent Tasks

This option defines how many tasks of this processor may run concurrently.

Execution

This option defines whether the processor runs on all nodes or only on the primary node.

Run Schedule

This option defines the time interval for the Timer driven strategy or the CRON expression for the CRON driven strategy.

[Image: PutFile Run Schedule]

PutFile Properties

The PutFile processor provides properties such as Directory, which specifies the output directory for the file transfer, along with other properties for managing the transfer, as shown in the image below.

[Image: PutFile Properties]
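One of the properties that manages the transfer is Conflict Resolution Strategy, which decides what happens when a file with the same name already exists in the output directory. The Python sketch below models its three values (replace, ignore, fail) as I understand them. It is an approximation rather than PutFile's real code: `put_file` is a hypothetical helper, the returned strings merely stand in for the relationship a flowfile would be routed to, and missing directories are always created here, whereas PutFile controls that through a separate property.

```python
import shutil
import tempfile
from pathlib import Path


def put_file(source: Path, directory: Path, conflict: str = "fail") -> str:
    """Write source into directory and return 'success' or 'failure'."""
    if conflict not in ("replace", "ignore", "fail"):
        raise ValueError(f"unknown conflict strategy: {conflict!r}")
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / source.name
    if target.exists():
        if conflict == "fail":
            return "failure"   # existing file is kept, transfer is reported as failed
        if conflict == "ignore":
            return "success"   # existing file is kept, transfer is treated as done
    # Either no conflict, or the strategy is "replace".
    shutil.copyfile(source, target)
    return "success"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "report.txt"
        report.write_text("v1")
        out = Path(tmp) / "out"
        print(put_file(report, out))          # first write
        print(put_file(report, out, "fail"))  # same name already present
```

Choosing replace suits flows that republish the same file name, while fail makes a naming collision visible so it can be handled on the failure relationship.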

PutFile Comments

This section is used to record any information about the processor.

[Image: PutFile Comments]