Apache NiFi: A Concise Tutorial
Apache NiFi - Introduction
Apache NiFi is a powerful, easy-to-use, and reliable system for processing and distributing data between disparate systems. It is based on the NiagaraFiles technology developed by the NSA, which was donated to the Apache Software Foundation after eight years. It is distributed under the Apache License, Version 2.0 (January 2004). At the time of writing, the latest version of Apache NiFi was 1.7.1.
Apache NiFi is a real-time data ingestion platform that transfers data between different source and destination systems and manages those transfers. It supports a wide variety of data formats, such as logs, geolocation data, and social feeds, and many protocols, such as SFTP, HDFS, and Kafka. This broad support for data sources and protocols has made the platform popular in many IT organizations.
Apache NiFi - General Features
The general features of Apache NiFi are as follows −
- Apache NiFi provides a web-based user interface that offers a seamless experience across design, control, feedback, and monitoring.
- It is highly configurable. Users can tune it for guaranteed delivery, low latency, high throughput, dynamic prioritization, and back pressure, and can modify flows at runtime.
- It provides a data provenance module to track and monitor data from the start to the end of the flow.
- Developers can create their own custom processors and reporting tasks according to their needs.
- NiFi supports secure protocols such as SSL, HTTPS, and SSH, as well as other forms of encryption.
- It supports user and role management and can be configured with LDAP for authorization.
Apache NiFi - Key Concepts
The key concepts of Apache NiFi are as follows −
- Process Group − A group of NiFi flows that helps a user manage flows and keep them in a hierarchy.
- Flow − Created by connecting different processors to transfer data, and modify it if required, from one or more data sources to destination data sources.
- Processor − A Java module responsible for either fetching data from a source system or storing it in a destination system. Other processors add attributes to a flowfile or change its content.
- Flowfile − The basic unit of NiFi, representing a single object of data picked up from a source system. NiFi processors make changes to a flowfile as it moves from the source processor to the destination. Different processors in a flow perform events such as CREATE, CLONE, and RECEIVE on a flowfile.
- Event − Represents a change to a flowfile as it traverses a NiFi flow. Events are tracked in data provenance.
- Data provenance − A repository with a UI that lets users check information about a flowfile and helps in troubleshooting any issues that arise while a flowfile is processed.
Apache NiFi Advantages
- Apache NiFi enables data fetching from remote machines by using SFTP and guarantees data lineage.
- Apache NiFi supports clustering, so multiple nodes can run the same flow on different data, which increases data processing performance.
- It provides security policies at the user level, at the process group level, and for other modules.
- Its UI can also run over HTTPS, which secures user interaction with NiFi.
- NiFi ships with around 188 processors, and a user can also create custom plugins to support a wide variety of data systems.
Apache NiFi Disadvantages
- When a node gets disconnected from the NiFi cluster while a user is making changes to the flow, its flow.xml becomes invalid. The node cannot reconnect to the cluster unless an admin manually copies flow.xml from a connected node.
- Apache NiFi has a state persistence issue when the primary node switches, which sometimes leaves processors unable to fetch data from source systems.
Apache NiFi - Basic Concepts
Apache NiFi consists of a web server, a flow controller, and processors, all of which run on a Java Virtual Machine. It also has three repositories: the Flowfile Repository, the Content Repository, and the Provenance Repository, as shown in the figure below.
Flowfile Repository
This repository stores the current state and attributes of every flowfile that passes through the data flows of Apache NiFi. By default, it is located in the root directory of Apache NiFi. Its location can be changed with the "nifi.flowfile.repository.directory" property.
Content Repository
This repository holds the content of all the flowfiles in NiFi. Its default directory is also in the NiFi root directory. The default implementation is "org.apache.nifi.controller.repository.FileSystemRepository", and the directory can be changed with the "nifi.content.repository.directory.default" property. This directory can use a large amount of disk space, so make sure the installation disk has enough room.
Provenance Repository
This repository tracks and stores all the events of all the flowfiles that flow through NiFi. There are two provenance repositories: the volatile provenance repository, in which all provenance data is lost after a restart, and the persistent provenance repository. The implementation is selected with the "nifi.provenance.repository.implementation" property, set to "org.apache.nifi.provenance.PersistentProvenanceRepository" or "org.apache.nifi.provenance.VolatileProvenanceRepository". The default directory of the persistent repository is also in the NiFi root directory and can be changed with the "nifi.provenance.repository.directory.default" property.
Apache NiFi - Environment Setup
In this chapter, we will learn about the environment setup of Apache NiFi. The steps for installing Apache NiFi are as follows −
Step 1 − Install a current version of Java on your computer and set JAVA_HOME. You can check the installation as shown below:
On Windows (using the command prompt) −
> java -version
On UNIX (using a terminal) −
$ echo $JAVA_HOME
Step 2 − Download Apache NiFi from https://nifi.apache.org/download.html
- For Windows, download the ZIP file.
- For UNIX, download the TAR file.
- For Docker images, go to https://hub.docker.com/r/apache/nifi/.
Step 3 − Installing Apache NiFi is straightforward. The process differs by OS −
- Windows OS − Unzip the ZIP package, and Apache NiFi is installed.
- UNIX OS − Extract the TAR file to any location, and Apache NiFi is installed.
$ tar -xvf nifi-1.7.1-bin.tar.gz
Step 4 − Open a command prompt and go to the bin directory of NiFi, for example C:\nifi-1.7.1\bin, then execute the run-nifi.bat file.
C:\nifi-1.7.1\bin>run-nifi.bat
Step 5 − The NiFi UI takes a few minutes to come up. You can follow the startup progress in nifi-app.log. Once the UI is up, open http://localhost:8080/nifi/ to access it.
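Because the UI takes a while to start, it can help to poll the URL from Step 5 rather than refresh the browser by hand. The following is a minimal sketch using only the Python standard library. The function names is_ui_up and wait_for_ui, the number of attempts, and the delay are illustrative assumptions, not part of NiFi.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_ui_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not up yet".
        return False


def wait_for_ui(url: str, attempts: int = 30, delay: float = 5.0) -> bool:
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if is_ui_up(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_for_ui("http://localhost:8080/nifi/") returns True as soon as the page answers, or False once the attempts are used up.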
Apache NiFi - User Interface
Apache NiFi is a web-based platform that users access through a web UI. The NiFi UI is highly interactive and provides a wide variety of information about NiFi. As shown in the image below, a user can access information about the following attributes −
- Active Threads
- Total queued data
- Transmitting Remote Process Groups
- Not Transmitting Remote Process Groups
- Running Components
- Stopped Components
- Invalid Components
- Disabled Components
- Up to date Versioned Process Groups
- Locally modified Versioned Process Groups
- Stale Versioned Process Groups
- Locally modified and Stale Versioned Process Groups
- Sync failure Versioned Process Groups
Components of Apache NiFi
Apache NiFi UI has the following components −
Processors
A user can drag the processor icon onto the canvas and select the desired processor for the data flow in NiFi.
Input port
Drag the icon below onto the canvas to add an input port to any data flow.
An input port is used to get data from a processor that is outside that process group.
After you drag this icon, NiFi asks for the name of the input port and then adds it to the NiFi canvas.
Output port
Drag the icon below onto the canvas to add an output port to any data flow.
An output port is used to transfer data to a processor that is outside that process group.
After you drag this icon, NiFi asks for the name of the output port and then adds it to the NiFi canvas.
Process Group
Use the icon below to add a process group to the NiFi canvas.
After you drag this icon, NiFi asks for the name of the process group and then adds it to the NiFi canvas.
Funnel
A funnel is used to combine the data from several connections into a single connection. A user can use the icon below to add a funnel to a NiFi data flow.
Apache NiFi - Processors
Apache NiFi processors are the basic building blocks of a data flow. Every processor has a different function, which contributes to the creation of the output flowfile. The data flow shown in the image below fetches a file from one directory using the GetFile processor and stores it in another directory using the PutFile processor.
GetFile
The GetFile processor fetches files of a specific format from a specific directory. It also gives the user other options for finer control over fetching, which are discussed in the properties section below.
GetFile Settings
The GetFile processor has the following settings −
Name
In the Name setting, a user can give the processor any name, for example one that follows the project's conventions or otherwise makes the processor's purpose clearer.
Penalty Duration
This setting lets a user set the penalty duration that applies when a flowfile fails.
Yield Duration
This setting specifies the yield time for the processor. During this period, the processor is not scheduled again.
Automatically Terminate Relationships
This is a list of checkboxes, one for each relationship the processor offers. By checking a box, a user configures the processor to terminate the flowfile on that relationship instead of sending it further along the flow.
GetFile Scheduling
The GetFile processor offers the following scheduling options −
Schedule Strategy
You can schedule the processor on a time basis by selecting the Timer driven option, or according to a specified CRON string by selecting the CRON driven option.
Concurrent Tasks
This option defines the number of concurrent tasks scheduled for this processor.
GetFile Properties
GetFile offers multiple properties, as shown in the image below, ranging from compulsory properties such as Input Directory and File Filter to optional properties such as Path Filter and Maximum File Size. A user can manage the file fetching process with these properties.
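To make these properties concrete, the sketch below mimics in plain Python how an input directory, a file-name filter, and a maximum file size could narrow down which files are picked up. It is an illustration only, not NiFi's implementation. The function name list_matching_files, its parameter names, and the default pattern (which skips names starting with a dot) are assumptions chosen for the example.

```python
import re
from pathlib import Path
from typing import List, Optional


def list_matching_files(
    input_directory: Path,
    file_filter: str = r"[^.].*",
    maximum_file_size: Optional[int] = None,
) -> List[Path]:
    """Return regular files whose names fully match `file_filter`.

    Files larger than `maximum_file_size` bytes are skipped when a limit is given.
    """
    pattern = re.compile(file_filter)
    selected: List[Path] = []
    for path in sorted(input_directory.iterdir()):
        if not path.is_file():
            continue  # ignore sub-directories in this simplified sketch
        if not pattern.fullmatch(path.name):
            continue  # name rejected by the file filter
        if maximum_file_size is not None and path.stat().st_size > maximum_file_size:
            continue  # file is over the size limit
        selected.append(path)
    return selected
```

For example, list_matching_files(Path("c:/inputdir"), r".*\.txt", maximum_file_size=1024) would return only the .txt files of at most 1 KB in that directory.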
PutFile
The PutFile processor stores a file from the data flow in a specific location.
PutFile Settings
The PutFile processor has the following settings −
Name
In the Name setting, a user can give the processor any name, for example one that follows the project's conventions or otherwise makes the processor's purpose clearer.
Penalty Duration
This setting lets a user set the penalty duration that applies when a flowfile fails.
Yield Duration
This setting specifies the yield time for the processor. During this period, the processor is not scheduled again.
Automatically Terminate Relationships
This setting is a list of checkboxes, one for each relationship the processor offers. By checking a box, a user configures the processor to terminate the flowfile on that relationship instead of sending it further along the flow.
PutFile Scheduling
The PutFile processor offers the following scheduling options −
Schedule Strategy
You can schedule the processor on a time basis by selecting the Timer driven option, or according to a specified CRON string by selecting the CRON driven option. There is also an experimental Event driven strategy, which triggers the processor on a specific event.
Concurrent Tasks
This option defines the number of concurrent tasks scheduled for this processor.
Apache NiFi - Processors Categorization
In this chapter, we will discuss the categorization of processors in Apache NiFi.
Data Ingestion Processors
The processors in the Data Ingestion category are used to ingest data into a NiFi data flow. They are usually the starting point of a data flow in Apache NiFi. Processors in this category include GetFile, GetHTTP, GetFTP, and GetKafka.
Routing and Mediation Processors
Routing and Mediation processors route flowfiles to different processors or data flows according to information in the attributes or content of those flowfiles. These processors are also responsible for controlling NiFi data flows. Processors in this category include RouteOnAttribute, RouteOnContent, ControlRate, and RouteText.
Database Access Processors
The processors in the Database Access category can select or insert data, or prepare and execute other SQL statements against a database. They mainly use the database connection pool controller settings of Apache NiFi. Processors in this category include ExecuteSQL, PutSQL, PutDatabaseRecord, and ListDatabaseTables.
Attribute Extraction Processors
Attribute Extraction processors extract, analyze, and change flowfile attributes in a NiFi data flow. Processors in this category include UpdateAttribute, EvaluateJsonPath, ExtractText, and AttributesToJSON.
System Interaction Processors
System Interaction processors run processes or commands on any operating system. They also run scripts in many languages to interact with a variety of systems. Processors in this category include ExecuteScript, ExecuteProcess, ExecuteGroovyScript, and ExecuteStreamCommand.
Data Transformation Processors
Processors in the Data Transformation category can alter the content of flowfiles. They can fully replace the data of a flowfile, which is typically needed when a user has to send a flowfile as an HTTP body to the InvokeHTTP processor. Processors in this category include ReplaceText and JoltTransformJSON.
Sending Data Processors
Sending Data processors are generally the last processors in a data flow. They store or send data to the destination server. After successfully storing or sending the data, these processors DROP the flowfile with the success relationship. Processors in this category include PutEmail, PutKafka, PutSFTP, PutFile, and PutFTP.
Splitting and Aggregation Processors
These processors split and merge the content of a flowfile. Processors in this category include SplitText, SplitJson, SplitXml, MergeContent, and SplitContent.
Apache NiFi - Processors Relationship
In an Apache NiFi data flow, flowfiles move from one processor to another through connections, each of which is validated against the relationships between the processors. Whenever a connection is created, a developer selects one or more relationships between those processors.
As you can see in the above image, the checkboxes in the black rectangle are relationships. If a developer selects these checkboxes, the flowfile terminates in that processor when the relationship is success, failure, or both.
Success
When a processor successfully processes a flowfile, for example by storing data in or fetching data from a data source without any connection, authentication, or other error, the flowfile goes to the success relationship.
Failure
When a processor cannot process a flowfile because of an error, such as an authentication error or a connection problem, the flowfile goes to the failure relationship.
A developer can also transfer flowfiles to other processors using connections. The developer can select a connection and load balance it, but load balancing was only released in version 1.8 and is not covered in this tutorial.
As you can see in the above image, the connection marked in red has the failure relationship, which means all flowfiles with errors go to the processor on the left, while all flowfiles without errors are transferred to the connection marked in green.
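The routing described above can be sketched in a few lines of Python. This is a simplified model rather than NiFi code: flowfiles are reduced to raw bytes, and the names run_processor, transfer, connections, and auto_terminated are hypothetical names introduced for the example.

```python
from typing import Callable, Dict, List, Set


def run_processor(
    process: Callable[[bytes], bytes],
    flowfiles: List[bytes],
) -> Dict[str, List[bytes]]:
    """Apply `process` to each flowfile and sort the results by relationship."""
    routed: Dict[str, List[bytes]] = {"success": [], "failure": []}
    for content in flowfiles:
        try:
            routed["success"].append(process(content))
        except Exception:
            # Any error sends the unchanged flowfile to the failure relationship.
            routed["failure"].append(content)
    return routed


def transfer(
    routed: Dict[str, List[bytes]],
    connections: Dict[str, List[bytes]],
    auto_terminated: Set[str],
) -> None:
    """Move flowfiles into the connection for their relationship, or end them."""
    for relationship, flowfiles in routed.items():
        if relationship in connections:
            connections[relationship].extend(flowfiles)
        elif relationship in auto_terminated:
            continue  # checkbox selected: the flowfile terminates here
        else:
            raise ValueError(f"relationship {relationship!r} is neither connected nor terminated")
```

In this model every relationship must be either connected or terminated, which is why transfer raises an error for a relationship that is neither.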
Let us now proceed with the other relationships.
comms.failure
This relationship is met when a flowfile could not be fetched from the remote server because of a communications failure.
Apache NiFi - FlowFile
A flowfile is the basic processing entity in Apache NiFi. It contains data content and attributes, which NiFi processors use to process the data. The content normally holds the data fetched from source systems. The most common attributes of an Apache NiFi flowfile are −
UUID
This stands for Universally Unique Identifier. It is the unique identity of a flowfile, generated by NiFi.
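As a rough mental model, a flowfile can be pictured as content plus a dictionary of attributes, one of which is the UUID. The sketch below is a Python illustration under that assumption. The FlowFile class here is hypothetical and is not NiFi's Java class, and the use of a version 4 UUID is a choice made for the example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowFile:
    """Illustrative model: raw content plus string attributes."""

    content: bytes = b""
    attributes: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assign a unique identifier unless the caller already supplied one.
        self.attributes.setdefault("uuid", str(uuid.uuid4()))
```

Creating FlowFile(b"hello", {"filename": "a.txt"}) yields an object whose attributes hold both the supplied filename and a freshly generated uuid.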
Apache NiFi - Queues
An Apache NiFi data flow connection has a queuing system to handle large amounts of incoming data. These queues can hold a very large number of flowfiles so that the processor can process them serially.
The queue in the above image holds one flowfile, transferred through the success relationship. A user can inspect the flowfile by selecting the List queue option in the drop-down list. In case of overload or error, a user can also clear the queue by selecting the empty queue option and then restart the flow to bring those files into the data flow again.
The list of flowfiles in a queue shows the position, UUID, filename, file size, queue duration, and lineage duration. A user can see all the attributes and the content of a flowfile by clicking the info icon in the first column of the flowfile list.
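The queue behaviour described above can be modelled with a simple first-in, first-out structure. The following Python sketch is an illustration only. The ConnectionQueue class and its method names are assumptions that echo the List queue and empty queue options, and lineage duration is left out to keep the example short.

```python
import time
from collections import deque
from typing import Deque, Dict, List, Tuple


class ConnectionQueue:
    """A FIFO queue of (enqueue_time, uuid, filename, content) entries."""

    def __init__(self) -> None:
        self._entries: Deque[Tuple[float, str, str, bytes]] = deque()

    def enqueue(self, uuid: str, filename: str, content: bytes) -> None:
        self._entries.append((time.monotonic(), uuid, filename, content))

    def dequeue(self) -> Tuple[str, str, bytes]:
        """Hand the oldest flowfile to the downstream processor."""
        _, uuid, filename, content = self._entries.popleft()
        return uuid, filename, content

    def list_queue(self) -> List[Dict[str, object]]:
        """Describe each queued flowfile without removing it."""
        now = time.monotonic()
        return [
            {
                "position": position,
                "uuid": uuid,
                "filename": filename,
                "file_size": len(content),
                "queue_duration": now - enqueued_at,
            }
            for position, (enqueued_at, uuid, filename, content)
            in enumerate(self._entries, start=1)
        ]

    def empty_queue(self) -> int:
        """Drop every queued flowfile and return how many were dropped."""
        dropped = len(self._entries)
        self._entries.clear()
        return dropped
```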
Apache NiFi - Process Groups
In Apache NiFi, a user can maintain different data flows in different process groups. These groups can be organized around the different projects or organizations that the Apache NiFi instance supports.
The fourth symbol in the menu at the top of the NiFi UI, as shown in the above picture, is used to add a process group to the NiFi canvas. The process group named “Tutorialspoint.com_ProcessGroup” contains a data flow with four processors that are currently stopped, as you can see in the above picture. Process groups can be nested hierarchically to give the data flows a structure that is easier to understand and manage.
In the footer of the NiFi UI, you can see the process groups and navigate back up from the process group you are currently in.
To see the full list of process groups in NiFi, go to the summary using the menu at the top left of the NiFi UI. The summary has a process groups tab that lists every process group along with parameters such as Version State, Transferred/Size, In/Size, Read/Write, and Out/Size, as shown in the below picture.
Apache NiFi - Labels
Apache NiFi offers labels that let a developer write information about the components on the NiFi canvas. The leftmost icon in the top menu of the NiFi UI is used to add a label to the canvas.
A developer can change the color of a label and the size of its text by right-clicking the label and choosing the appropriate option from the menu.
Apache NiFi - Configuration
Apache NiFi is a highly configurable platform. The nifi.properties file in the conf directory contains most of the configuration.
The commonly used properties of Apache NiFi are as follows −
Core properties
This section contains the properties that are required to run a NiFi instance.
S.No. | Property name | Default Value | Description
1 | nifi.flow.configuration.file | ./conf/flow.xml.gz | Path to the flow.xml file, which contains all the data flows created in NiFi.
2 | nifi.flow.configuration.archive.enabled | true | Enables or disables archiving in NiFi.
3 | nifi.flow.configuration.archive.dir | ./conf/archive/ | Specifies the archive directory.
4 | nifi.flow.configuration.archive.max.time | 30 days | Specifies the retention time for archived content.
5 | nifi.flow.configuration.archive.max.storage | 500 MB | Maximum size to which the archive directory can grow.
6 | nifi.authorizer.configuration.file | ./conf/authorizers.xml | Specifies the authorizer configuration file, which is used for user authorization.
7 | nifi.login.identity.provider.configuration.file | ./conf/login-identity-providers.xml | Path to the file that contains the configuration of the login identity providers.
8 | nifi.templates.directory | ./conf/templates | Directory where NiFi templates are stored.
9 | nifi.nar.library.directory | ./lib | Path to the library directory. NiFi loads all components from the NAR files in this folder.
10 | nifi.nar.working.directory | ./work/nar/ | Directory that stores the unpacked NAR files once NiFi has processed them.
11 | nifi.documentation.working.directory | ./work/docs/components | Directory that contains the documentation of all components.
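Since nifi.properties is a plain key=value file, its settings can be inspected with a few lines of code. The sketch below is a deliberately simplified Python reader, offered as an illustration. The function name parse_properties is an assumption, and it ignores features of the full Java properties format such as line continuations and ":" separators.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and comments."""
    properties: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties
```

Reading the file with Path("conf/nifi.properties").read_text() and passing the result to parse_properties gives a dictionary from which a value such as nifi.flow.configuration.file can be looked up by key.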
State Management
These properties configure how the state of components is stored. Stored state lets a component resume processing where it left off, both after a restart and on its next scheduled run.
S.No. | Property name | Default Value | Description
1 | nifi.state.management.configuration.file | ./conf/state-management.xml | Path to the state-management.xml file, which configures the local and cluster-wide state providers for that NiFi instance.
2 | nifi.state.management.provider.local | local-provider | ID of the local state provider.
3 | nifi.state.management.provider.cluster | zk-provider | ID of the cluster-wide state provider. It is ignored if NiFi is not clustered but must be populated when running in a cluster.
4 | nifi.state.management.embedded.zookeeper.start | false | Specifies whether this instance of NiFi should run an embedded ZooKeeper server.
5 | nifi.state.management.embedded.zookeeper.properties | ./conf/zookeeper.properties | Path of the properties file that provides the ZooKeeper properties to use if nifi.state.management.embedded.zookeeper.start is set to true.
FlowFile Repository
Let us now look into the important details of the FlowFile repository −
S.No. | Property name | Default Value | Description
1 | nifi.flowfile.repository.implementation | org.apache.nifi.controller.repository.WriteAheadFlowFileRepository | Specifies whether flowfiles are stored on disk or in memory. To store flowfiles in memory, change the value to "org.apache.nifi.controller.repository.VolatileFlowFileRepository".
2 | nifi.flowfile.repository.directory | ./flowfile_repository | Specifies the directory for the flowfile repository.
Apache NiFi - Administration
Apache NiFi supports multiple tools, such as Ambari and ZooKeeper, for administration purposes. NiFi also provides settings in the nifi.properties file for administrators to set up HTTPS and other features.
ZooKeeper
NiFi does not itself handle the voting process in a cluster. When a cluster is created, any node can become the primary node or the coordinator, so ZooKeeper is configured to manage the election of the primary node and the coordinator. The nifi.properties file contains some properties to set up ZooKeeper.
S.No. | Property name | Default Value | Description
1 | nifi.state.management.embedded.zookeeper.properties | ./conf/zookeeper.properties | Specifies the path and name of the ZooKeeper properties file.
2 | nifi.zookeeper.connect.string | empty | Specifies the connection string of ZooKeeper.
3 | nifi.zookeeper.connect.timeout | 3 secs | Specifies the connection timeout between ZooKeeper and NiFi.
4 | nifi.zookeeper.session.timeout | 3 secs | Specifies the session timeout between ZooKeeper and NiFi.
5 | nifi.zookeeper.root.node | /nifi | Specifies the root node for ZooKeeper.
6 | nifi.zookeeper.auth.type | empty | Specifies the authentication type for ZooKeeper.
Enable HTTPS
To use NiFi over HTTPS, administrators must generate a keystore and a truststore and set some properties in the nifi.properties file. The TLS toolkit can be used to generate all the keys needed to enable HTTPS in Apache NiFi.
S.No. | Property name | Default Value | Description
1 | nifi.web.https.port | empty | Specifies the HTTPS port number.
2 | nifi.web.https.network.interface.default | empty | Default network interface for HTTPS in NiFi.
3 | nifi.security.keystore | empty | Specifies the path and file name of the keystore.
4 | nifi.security.keystoreType | empty | Specifies the keystore type, such as JKS.
5 | nifi.security.keystorePasswd | empty | Specifies the keystore password.
6 | nifi.security.truststore | empty | Specifies the path and file name of the truststore.
7 | nifi.security.truststoreType | empty | Specifies the truststore type, such as JKS.
8 | nifi.security.truststorePasswd | empty | Specifies the truststore password.
Other properties for administration
Administrators also use some other properties to manage NiFi and keep its service running.
S.No. | Property name | Default Value | Description
1 | nifi.flowcontroller.graceful.shutdown.period | 10 sec | Specifies the time allowed to gracefully shut down the NiFi flow controller.
2 | nifi.administrative.yield.duration | 30 sec | Specifies the administrative yield duration for NiFi.
3 | nifi.authorizer.configuration.file | ./conf/authorizers.xml | Specifies the path and file name of the authorizer configuration file.
4 | nifi.login.identity.provider.configuration.file | ./conf/login-identity-providers.xml | Specifies the path and file name of the login identity provider configuration file.
Apache NiFi - Creating Flows
Apache NiFi offers a large number of components to help developers create data flows for any type of protocol or data source. To create a flow, a developer drags components from the menu bar onto the canvas and connects them by clicking and dragging the mouse from one component to another.
Generally, a NiFi flow starts with a listener component such as GetFile, which gets the data from the source system. At the other end there is a transmitter component such as PutFile, and in between there are components that process the data.
For example, let us create a flow that takes an empty file from one directory, adds some text to that file, and puts it in another directory.
- To begin, drag the processor icon onto the NiFi canvas and select the GetFile processor from the list.
- Create an input directory such as c:\inputdir.
- Right-click the processor and select Configure. In the Properties tab, set Input Directory to c:\inputdir, click Apply, and go back to the canvas.
- Drag the processor icon onto the canvas and select the ReplaceText processor from the list.
- Right-click the processor and select Configure. In the Properties tab, enter some text such as “Hello tutorialspoint.com” in the Replacement Value text box and click Apply.
- Go to the Settings tab, check the failure checkbox on the right-hand side (this automatically terminates that relationship), and then go back to the canvas.
- Connect the GetFile processor to ReplaceText on the success relationship.
- Drag the processor icon onto the canvas and select the PutFile processor from the list.
- Create an output directory such as c:\outputdir.
- Right-click the processor and select Configure. In the Properties tab, set Directory to c:\outputdir, click Apply, and go back to the canvas.
- Go to the Settings tab, check the failure and success checkboxes on the right-hand side, and then go back to the canvas.
- Connect the ReplaceText processor to PutFile on the success relationship.
- Now start the flow and add an empty file to the input directory. The file moves to the output directory, with the text added to it.
By following the above steps, developers can combine any processors and other NiFi components to create a suitable flow for their organisation or client.
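To make the effect of the example flow concrete, here is a plain Python analogue of what the GetFile, ReplaceText and PutFile chain does to the files. It is only a sketch of the behaviour described in the steps, not NiFi code: the function run_flow and the temporary directories are assumptions made for illustration, and it assumes GetFile removes the source file and ReplaceText overwrites the whole content.

```python
import tempfile
from pathlib import Path


def run_flow(input_dir, output_dir, replacement):
    """Move each file from input_dir to output_dir, replacing its content."""
    delivered = []
    for src in sorted(Path(input_dir).iterdir()):
        if not src.is_file():
            continue
        # GetFile step: pick the file up and remove it from the input directory.
        src.unlink()
        # ReplaceText step: the whole content becomes the replacement value.
        content = replacement
        # PutFile step: write the result under the same name in the output directory.
        target = Path(output_dir) / src.name
        target.write_text(content)
        delivered.append(target)
    return delivered


def demo():
    """Run the flow once on a single empty file inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as root:
        inputdir = Path(root) / "inputdir"
        outputdir = Path(root) / "outputdir"
        inputdir.mkdir()
        outputdir.mkdir()
        (inputdir / "empty.txt").write_text("")
        delivered = run_flow(inputdir, outputdir, "Hello tutorialspoint.com")
        results = [(p.name, p.read_text()) for p in delivered]
        leftovers = list(inputdir.iterdir())
        return results, leftovers


results, leftovers = demo()
print(results, leftovers)
```

In NiFi itself the same three steps run continuously and independently, with queues between the processors; the loop here only shows the end result for one batch of files.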
Apache NiFi - Templates
Apache NiFi offers the concept of templates, which make it easier to reuse and distribute NiFi flows. The flows can be used by other developers or in other NiFi clusters. Templates also help NiFi developers share their work in repositories such as GitHub.
Create Template
Let us create a template for the flow that we created in chapter 15, “Apache NiFi - Creating Flows”.
Select all the components of the flow using the Shift key, and then click the Create Template icon in the toolbox on the left-hand side of the NiFi canvas (marked in blue in the image). Enter a name for the template. A developer can also add a description, which is optional.
Download Template
Then go to the NiFi Templates option in the menu at the top right-hand corner of the NiFi UI, as shown in the picture below.
Now click the download icon (on the right-hand side of the list) of the template you want to download. An XML file with the template name is downloaded.
Upload Template
To use a template in NiFi, a developer has to upload its XML file to NiFi using the UI. There is an Upload Template icon (marked in blue in the image below) beside the Create Template icon; click it and browse to the XML file.
Add Template
In the top toolbar of the NiFi UI, the template icon comes before the label icon. The icon is marked in blue in the picture below.
Drag the template icon onto the canvas, choose the template from the drop-down list, and click Add. The template is added to the NiFi canvas.
Apache NiFi - API
NiFi offers a large number of APIs, which help developers make changes to NiFi and get information from it through any other tool or custom-developed application. In this tutorial, we use the Postman app in Google Chrome to explain some examples.
To add Postman to your Google Chrome, go to the URL mentioned below and click the Add to Chrome button. You will then see a new app added to your Google Chrome.
The current version of the NiFi REST API is 1.8.0, and the documentation is available at the URL mentioned below.
The most used NiFi REST API modules are listed below. Each module is reached under a base URL of the following form:
- http://<nifi url>:<nifi port>/nifi-api/<api-path>
- If HTTPS is enabled: https://<nifi url>:<nifi port>/nifi-api/<api-path>
| S.No. | API module name | api-path | Description |
| 1 | Access | /access | To authenticate user and get access token from NiFi. |
| 2 | Controller | /controller | To manage the cluster and create reporting task. |
| 3 | Controller Services | /controller-services | It is used to manage controller services and update controller service references. |
| 4 | Reporting Tasks | /reporting-tasks | To manage reporting tasks. |
| 5 | Flow | /flow | To get the data flow metadata and component status and query history. |
| 6 | Process Groups | /process-groups | To upload and instantiate a template and create components. |
| 7 | Processors | /processors | To create and schedule a processor and set its properties. |
| 8 | Connections | /connections | To create a connection, set queue priority and update connection destination. |
| 9 | FlowFile Queues | /flowfile-queues | To view queue contents, download flowfile content, and empty queue. |
| 10 | Remote Process Groups | /remote-process-groups | To create a remote group and enable transmission. |
| 11 | Provenance | /provenance | To query provenance, and search event lineage. |
Let us now consider an example and run it in Postman to get details about the running NiFi instance.
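The same kind of request can also be made from a script. The sketch below builds a REST URL from the base pattern given earlier and shows how a GET call could be issued with the Python standard library. The host localhost, the port 8080, and the choice of the /flow/about path are assumptions for illustration; adjust them to your installation. The helper names api_url and get_json are hypothetical, and get_json is defined but not called here because it needs a running, unsecured NiFi instance.

```python
import json
from urllib.request import urlopen


def api_url(host, port, api_path, https=False):
    """Build a NiFi REST URL of the form <scheme>://<host>:<port>/nifi-api/<api-path>."""
    scheme = "https" if https else "http"
    return f"{scheme}://{host}:{port}/nifi-api/{api_path.lstrip('/')}"


def get_json(url, timeout=5):
    """Send a GET request and decode the JSON body (requires a reachable NiFi)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Assumed host and port; /flow belongs to the Flow module in the table above.
about_url = api_url("localhost", 8080, "/flow/about")
print(about_url)
# With NiFi running, the details could then be fetched with: get_json(about_url)
```

A secured instance additionally needs a token obtained through the Access module, which this sketch does not cover.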
Apache NiFi - Data Provenance
Apache NiFi logs and stores information about every event that occurs on the data ingested in the flow. The data provenance repository stores this information and provides a UI to search it. Data provenance can be accessed for the whole NiFi instance and also at the processor level.
The following table lists the fields in the NiFi Data Provenance event list:
| S.No. | Field name | Description |
| 1 | Date/Time | Date and time of event. |
| 2 | Type | Type of Event like ‘CREATE’. |
| 3 | FlowFileUuid | UUID of the flowfile on which the event is performed. |
| 4 | Size | Size of the flowfile. |
| 5 | Component Name | Name of the component which performed the event. |
| 6 | Component Type | Type of the component. |
| 7 | Show lineage | Last column has the show lineage icon, which is used to see the flowfile lineage as shown in the below image. |
To get more information about an event, a user can click the information icon in the first column of the NiFi Data Provenance UI.
The nifi.properties file contains some properties that are used to manage the NiFi data provenance repository.
| S.No. | Property name | Default value | Description |
| 1 | nifi.provenance.repository.directory.default | ./provenance_repository | To specify the default path of NiFi data provenance. |
| 2 | nifi.provenance.repository.max.storage.time | 24 hours | To specify the maximum retention time of NiFi data provenance. |
| 3 | nifi.provenance.repository.max.storage.size | 1 GB | To specify the maximum storage of NiFi data provenance. |
| 4 | nifi.provenance.repository.rollover.time | 30 secs | To specify the rollover time of NiFi data provenance. |
| 5 | nifi.provenance.repository.rollover.size | 100 MB | To specify the rollover size of NiFi data provenance. |
| 6 | nifi.provenance.repository.indexed.fields | EventType, FlowFileUUID, Filename, ProcessorID, Relationship | To specify the fields used to search and index NiFi data provenance. |
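When tuning these values it can help to convert them to plain numbers. The sketch below is a rough aid, not NiFi's own parser: it understands only the unit spellings that appear in the table above and assumes binary units (1 GB = 1024 MB). The helper names to_seconds and to_bytes are hypothetical.

```python
# Only the unit spellings used in the table above; NiFi itself accepts more.
TIME_UNITS = {"sec": 1, "secs": 1, "min": 60, "mins": 60, "hour": 3600, "hours": 3600}
SIZE_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_seconds(value):
    """Convert a value such as '24 hours' to seconds."""
    number, unit = value.split()
    return int(number) * TIME_UNITS[unit]


def to_bytes(value):
    """Convert a value such as '100 MB' to bytes (binary units assumed)."""
    number, unit = value.split()
    return int(number) * SIZE_UNITS[unit]


max_time = to_seconds("24 hours")
rollover_time = to_seconds("30 secs")
full_size_files = to_bytes("1 GB") // to_bytes("100 MB")

print("retention window in seconds:", max_time)
print("time-based rollovers per retention window:", max_time // rollover_time)
print("full-size rollover files that fit in max storage:", full_size_files)
```

With the defaults, only about ten full-size rollover files fit inside the storage limit, so on a busy instance the size limit is likely to take effect well before the 24-hour time limit.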
Apache NiFi - Monitoring
In Apache NiFi, there are multiple ways to monitor the different statistics of the system, such as errors, memory usage, CPU usage, and data flow statistics. We discuss the most popular ones in this tutorial.
Built-in Monitoring
In this section, we look at the monitoring features built into Apache NiFi.
Bulletin Board
The bulletin board shows the latest ERROR and WARNING bulletins generated by NiFi processors in real time. To access it, a user goes to the drop-down menu on the right-hand side and selects the Bulletin Board option. The board refreshes automatically, and a user can disable this. A user can also navigate to the actual processor by double-clicking the error. Bulletins can be filtered in the following ways:
- by message
- by name
- by id
- by group id
Data provenance UI
To monitor the events occurring on a specific processor or throughout NiFi, a user can access Data Provenance from the same menu as the bulletin board. The events in the data provenance repository can be filtered by the following fields:
- by component name
- by component type
- by type
NiFi Summary UI
The Apache NiFi summary can also be accessed from the same menu as the bulletin board. This UI contains information about all the components of that particular NiFi instance or cluster. They can be filtered by name, by type, or by URI. There are different tabs for the different component types. The following components can be monitored in the NiFi summary UI:
- Processors
- Input ports
- Output ports
- Remote process groups
- Connections
- Process groups
At the bottom right-hand side of this UI, there is a link named system diagnostics, which is used to check the JVM statistics.
Reporting Tasks
Apache NiFi provides multiple reporting tasks to support external monitoring systems such as Ambari and Grafana. A developer can create a custom reporting task or configure the built-in ones to send NiFi metrics to external monitoring systems. The following table lists the reporting tasks offered by NiFi 1.7.1.
| S.No. | Reporting task name | Description |
| 1 | AmbariReportingTask | To set up Ambari Metrics Service for NiFi. |
| 2 | ControllerStatusReportingTask | To report the information from the NiFi summary UI for the last 5 minutes. |
| 3 | MonitorDiskUsage | To report and warn about the disk usage of a specific directory. |
| 4 | MonitorMemory | To monitor the amount of Java heap used in a memory pool of the JVM. |
| 5 | SiteToSiteBulletinReportingTask | To report the errors and warnings in bulletins using the Site-to-Site protocol. |
| 6 | SiteToSiteProvenanceReportingTask | To report the NiFi Data Provenance events using the Site-to-Site protocol. |
NiFi API
There is an API named system diagnostics, which can be used to monitor NiFi statistics in any custom-developed application. Let us check the API in Postman.
Response
{
  "systemDiagnostics": {
    "aggregateSnapshot": {
      "totalNonHeap": "183.89 MB",
      "totalNonHeapBytes": 192819200,
      "usedNonHeap": "173.47 MB",
      "usedNonHeapBytes": 181894560,
      "freeNonHeap": "10.42 MB",
      "freeNonHeapBytes": 10924640,
      "maxNonHeap": "-1 bytes",
      "maxNonHeapBytes": -1,
      "totalHeap": "512 MB",
      "totalHeapBytes": 536870912,
      "usedHeap": "273.37 MB",
      "usedHeapBytes": 286652264,
      "freeHeap": "238.63 MB",
      "freeHeapBytes": 250218648,
      "maxHeap": "512 MB",
      "maxHeapBytes": 536870912,
      "heapUtilization": "53.0%",
      "availableProcessors": 4,
      "processorLoadAverage": -1,
      "totalThreads": 71,
      "daemonThreads": 31,
      "uptime": "17:30:35.277",
      "flowFileRepositoryStorageUsage": {
        "freeSpace": "286.93 GB",
        "totalSpace": "464.78 GB",
        "usedSpace": "177.85 GB",
        "freeSpaceBytes": 308090789888,
        "totalSpaceBytes": 499057160192,
        "usedSpaceBytes": 190966370304,
        "utilization": "38.0%"
      },
      "contentRepositoryStorageUsage": [
        {
          "identifier": "default",
          "freeSpace": "286.93 GB",
          "totalSpace": "464.78 GB",
          "usedSpace": "177.85 GB",
          "freeSpaceBytes": 308090789888,
          "totalSpaceBytes": 499057160192,
          "usedSpaceBytes": 190966370304,
          "utilization": "38.0%"
        }
      ],
      "provenanceRepositoryStorageUsage": [
        {
          "identifier": "default",
          "freeSpace": "286.93 GB",
          "totalSpace": "464.78 GB",
          "usedSpace": "177.85 GB",
          "freeSpaceBytes": 308090789888,
          "totalSpaceBytes": 499057160192,
          "usedSpaceBytes": 190966370304,
          "utilization": "38.0%"
        }
      ],
      "garbageCollection": [
        {
          "name": "G1 Young Generation",
          "collectionCount": 344,
          "collectionTime": "00:00:06.239",
          "collectionMillis": 6239
        },
        {
          "name": "G1 Old Generation",
          "collectionCount": 0,
          "collectionTime": "00:00:00.000",
          "collectionMillis": 0
        }
      ],
      "statsLastRefreshed": "09:30:20 SGT",
      "versionInfo": {
        "niFiVersion": "1.7.1",
        "javaVendor": "Oracle Corporation",
        "javaVersion": "1.8.0_151",
        "osName": "Windows 7",
        "osVersion": "6.1",
        "osArchitecture": "amd64",
        "buildTag": "nifi-1.7.1-RC1",
        "buildTimestamp": "07/12/2018 12:54:43 SGT"
      }
    }
  }
}
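A custom monitoring application would typically pull a few numbers out of this response rather than display all of it. The following sketch parses a trimmed copy of the response above and computes heap and content repository utilisation from the byte counts. The function name summarize and the shape of its result are assumptions for illustration; the field names are taken from the response shown above.

```python
import json

# Trimmed copy of the response shown above, keeping only the fields used here.
SAMPLE = """
{"systemDiagnostics": {"aggregateSnapshot": {
  "usedHeapBytes": 286652264,
  "maxHeapBytes": 536870912,
  "totalThreads": 71,
  "contentRepositoryStorageUsage": [
    {"identifier": "default",
     "usedSpaceBytes": 190966370304,
     "totalSpaceBytes": 499057160192}
  ]
}}}
"""


def summarize(diagnostics):
    """Return heap and content repository utilisation as percentages."""
    snapshot = diagnostics["systemDiagnostics"]["aggregateSnapshot"]
    heap_pct = 100.0 * snapshot["usedHeapBytes"] / snapshot["maxHeapBytes"]
    repo_pct = {
        repo["identifier"]: round(100.0 * repo["usedSpaceBytes"] / repo["totalSpaceBytes"], 1)
        for repo in snapshot["contentRepositoryStorageUsage"]
    }
    return {
        "heap_pct": round(heap_pct, 1),
        "threads": snapshot["totalThreads"],
        "content_repo_pct": repo_pct,
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

Computing the percentages from the byte fields avoids having to parse the human-readable strings such as "53.0%" or "177.85 GB".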
Apache NiFi - Upgrade
Before starting an upgrade of Apache NiFi, read the release notes to learn about the changes and additions. A user needs to evaluate the impact of these additions and changes on the current NiFi installation. Below is the link to the release notes for new releases of Apache NiFi.
In a cluster setup, a user needs to upgrade the NiFi installation on every node in the cluster. Follow the steps given below to upgrade Apache NiFi.
- Back up all the custom NARs present in the lib folder of your current NiFi installation or in any other folder.
- Download the new version of Apache NiFi. The source and binaries of the latest NiFi version are available at https://nifi.apache.org/download.html
- Create a new directory next to the installation directory of the current NiFi and extract the new version of Apache NiFi into it.
- Stop NiFi gracefully. First stop all the processors and let all the flowfiles present in the flow get processed. Once no flowfiles remain, stop NiFi.
- Copy the configuration in authorizers.xml from the current NiFi installation to the new version.
- Update the values in bootstrap-notification-services.xml and bootstrap.conf of the new NiFi version from the current one.
- Add the custom logging from logback.xml to the new NiFi installation.
- Configure the login identity provider in login-identity-providers.xml from the current version.
- Update all the properties in nifi.properties of the new NiFi installation from the current version.
- Make sure that the group and user of the new version are the same as those of the current version, to avoid permission denied errors.
- Copy the configuration from state-management.xml of the current version to the new version.
- Copy the following from the current NiFi installation to the same locations in the new version:
  - ./conf/flow.xml.gz, and also flow.xml.gz from the archive directory.
  - For the provenance and content repositories, change the values in the nifi.properties file to point to the current repositories.
  - Copy the state from ./state/local, or change the value in nifi.properties if an external directory is specified.
- Recheck all the changes performed and check whether they affect any of the changes introduced in the new NiFi version. If there is any impact, look for solutions.
- Start all the NiFi nodes and verify that all the flows are working correctly, that the repositories are storing data, and that the UI is retrieving it without any errors.
- Monitor the bulletins for some time to check for any new errors.
- If the new version is working correctly, the current version can be archived and deleted from the directories.
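The file-copying part of the steps above can be scripted. The following is a minimal sketch, assuming both installations keep their configuration under a conf directory; the function name copy_config, the file list, and the demo paths are illustrative assumptions. It copies only files that the steps say to carry over as they are. Files such as nifi.properties, bootstrap.conf and logback.xml still need to be merged by hand, because the new version may introduce new settings.

```python
import shutil
import tempfile
from pathlib import Path

# Files that the steps above say to carry over from the current installation.
CONFIG_FILES = ["authorizers.xml", "state-management.xml", "flow.xml.gz"]


def copy_config(old_home, new_home, names=CONFIG_FILES):
    """Copy selected conf files from old_home to new_home, keeping a .orig backup."""
    copied = []
    dest_dir = Path(new_home) / "conf"
    dest_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        src = Path(old_home) / "conf" / name
        if not src.exists():
            continue  # nothing to carry over for this file
        dest = dest_dir / name
        if dest.exists():
            # Keep the file shipped with the new version for later comparison.
            shutil.copy2(dest, dest.with_name(dest.name + ".orig"))
        shutil.copy2(src, dest)
        copied.append(name)
    return copied


def demo():
    """Exercise copy_config on throwaway directories instead of real installs."""
    with tempfile.TemporaryDirectory() as root:
        old_home, new_home = Path(root) / "nifi-old", Path(root) / "nifi-new"
        (old_home / "conf").mkdir(parents=True)
        (new_home / "conf").mkdir(parents=True)
        (old_home / "conf" / "authorizers.xml").write_text("<authorizers/>")
        (new_home / "conf" / "authorizers.xml").write_text("<default/>")
        copied = copy_config(old_home, new_home)
        backup = (new_home / "conf" / "authorizers.xml.orig").read_text()
        current = (new_home / "conf" / "authorizers.xml").read_text()
        return copied, backup, current


copied, backup, current = demo()
print(copied)
```

Run such a script only after NiFi has been stopped gracefully, as described in the steps.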
Apache NiFi - Remote Process Group
An Apache NiFi Remote Process Group, or RPG, enables a flow to direct its FlowFiles to different NiFi instances using the Site-to-Site protocol. As of version 1.7.1, NiFi does not offer balanced relationships, so an RPG is used for load balancing in a NiFi data flow.
A developer can add an RPG from the top toolbar of the NiFi UI by dragging the icon shown in the picture onto the canvas. To configure an RPG, a developer has to fill in the following fields:
| S.No. | Field name | Description |
| 1 | URLs | To specify comma separated remote target NiFi URLs. |
| 2 | Transport Protocol | To specify the transport protocol for remote NiFi instances. It’s either RAW or HTTP. |
| 3 | Local Network Interface | To specify the local network interface to send/receive data. |
| 4 | HTTP Proxy Server Hostname | To specify the proxy server’s hostname for the purpose of transport in RPG. |
| 5 | HTTP Proxy Server Port | To specify the proxy server’s port for the purpose of transport in RPG. |
| 6 | HTTP Proxy User | It is an optional field to specify the username for HTTP proxy. |
| 7 | HTTP Proxy Password | It is an optional field to specify the password for above username. |
A developer needs to enable the RPG before using it, just as processors are started before they are used.
Apache NiFi - Controller Settings
Apache NiFi offers shared services, called controller settings, which can be shared by processors and reporting tasks. An example is a database connection pool, which can be used by all the processors accessing the same database.
To access the controller settings, use the drop-down menu at the top right corner of the NiFi UI, as shown in the image below.
Apache NiFi offers many controller settings. We discuss a commonly used one and how to set it up in NiFi.
DBCPConnectionPool
After clicking the Controller Settings option, click the plus sign on the NiFi Settings page. Then select DBCPConnectionPool from the list of controller settings. DBCPConnectionPool is added to the main NiFi settings page, as shown in the image below.
The page contains the following information about the controller setting:
- Name
- Type
- Bundle
- State
- Scope
- Configure and delete icon
Click the configure icon and fill in the required fields. The fields are listed in the table below:
| S.No. | Field name | Default value | Description |
| 1 | Database Connection URL | empty | To specify the connection URL of the database. |
| 2 | Database Driver Class Name | empty | To specify the driver class name for the database, such as com.mysql.jdbc.Driver for MySQL. |
| 3 | Max Wait Time | 500 millis | To specify the maximum time the pool waits for a connection to become available before failing. |
| 4 | Max Total Connections | 8 | To specify the maximum number of connections that can be allocated from the database connection pool. |
To stop or configure a controller setting, all the NiFi components attached to it must be stopped first. NiFi also assigns a scope to each controller setting to manage its configuration. Therefore, only the components that share the same scope use that controller setting and are affected by changes to it.
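The interplay of Max Total Connections and Max Wait Time can be illustrated with a toy pool. This is not NiFi or DBCP code; it is a small sketch, assuming that Max Wait Time bounds how long a caller waits for a free connection. The class TinyPool and its connection stand-in are hypothetical.

```python
import threading


class TinyPool:
    """Toy pool: at most max_total connections, waiting at most max_wait_ms."""

    def __init__(self, max_total=8, max_wait_ms=500):
        self._slots = threading.BoundedSemaphore(max_total)
        self._max_wait = max_wait_ms / 1000.0

    def acquire(self):
        # Block until a slot is free, but give up after the maximum wait time.
        if not self._slots.acquire(timeout=self._max_wait):
            raise TimeoutError("no connection became free within the wait time")
        return object()  # stand-in for a real database connection

    def release(self, connection):
        self._slots.release()


# Small limits so the behaviour is visible quickly.
pool = TinyPool(max_total=2, max_wait_ms=50)
first = pool.acquire()
second = pool.acquire()

try:
    pool.acquire()  # a third caller has to wait, then fails
    timed_out = False
except TimeoutError:
    timed_out = True

pool.release(first)
third = pool.acquire()  # succeeds now that a slot has been returned
print("third caller timed out while the pool was exhausted:", timed_out)
```

Processors sharing one DBCPConnectionPool behave like the callers here: when all connections are in use, an additional request waits and fails if nothing is returned in time.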
Apache NiFi - Reporting Task
Apache NiFi reporting tasks are similar to controller services: they run in the background and send or log the statistics of the NiFi instance. Reporting tasks can be accessed from the same page as the controller settings, but in a different tab.
To add a reporting task, a developer clicks the plus button at the top right-hand side of the reporting tasks page. Reporting tasks are mainly used for monitoring the activities of a NiFi instance, in either the bulletins or the provenance. Most of these reporting tasks use Site-to-Site to transport the NiFi statistics to another node or an external system.
Let us now add and configure a reporting task for a better understanding.
MonitorMemory
This reporting task generates bulletins when a memory pool crosses a specified percentage. Follow these steps to configure the MonitorMemory reporting task:
- Click the plus sign and search for MonitorMemory in the list.
- Select MonitorMemory and click ADD.
- Once it is added to the main page of reporting tasks, click the configure icon.
- In the Properties tab, select the memory pool that you want to monitor.
- Select the percentage after which you want bulletins to alert the users.
- Start the reporting task.
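The rule that MonitorMemory applies can be sketched in a few lines: compare the used share of a memory pool with the configured threshold and emit a warning when it is exceeded. The function check_pool, the pool name, and the message wording are illustrative assumptions and not NiFi's actual output; the byte counts are the heap figures from the system diagnostics response shown earlier.

```python
def check_pool(name, used_bytes, max_bytes, threshold_pct):
    """Return a warning message if the pool usage exceeds threshold_pct, else None."""
    used_pct = 100.0 * used_bytes / max_bytes
    if used_pct > threshold_pct:
        return (f"WARNING: memory pool '{name}' is at {used_pct:.1f}% "
                f"(threshold {threshold_pct}%)")
    return None


# Heap figures taken from the system diagnostics example; pool name is assumed.
used, maximum = 286652264, 536870912

alert = check_pool("G1 Old Gen", used, maximum, threshold_pct=50)
quiet = check_pool("G1 Old Gen", used, maximum, threshold_pct=65)
print(alert)
print(quiet)
```

A lower threshold produces bulletins earlier but also more often, so the percentage is usually chosen with the normal working set of the flow in mind.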
Apache NiFi - Custom Processor
Apache NiFi is an open source platform and gives developers the option to add their own custom processors to the NiFi library. Follow these steps to create a custom processor.
- Download the latest version of Maven from the link given below. https://maven.apache.org/download.cgi
- Add an environment variable named M2_HOME and set its value to the installation directory of Maven.
- Download the Eclipse IDE from the link below. https://www.eclipse.org/downloads/download.php
- Open a command prompt and execute the Maven archetype command.
  mvn archetype:generate
- Search for the nifi type in the archetype projects.
- Select the org.apache.nifi:nifi-processor-bundle-archetype project.
- Then, from the list of versions, select the latest version, which is 1.7.1 for this tutorial.
- Enter the groupId, artifactId, version, package, artifactBaseName, etc.
- A Maven project is then created with two directories:
  nifi-<artifactBaseName>-processors
  nifi-<artifactBaseName>-nar
- Run the command below in the nifi-<artifactBaseName>-processors directory to prepare the project for Eclipse.
  mvn install eclipse:eclipse
- Open Eclipse and select Import from the File menu.
- Then select “Existing Projects into workspace” and add the project from the nifi-<artifactBaseName>-processors directory.
- Add your code in the public void onTrigger(ProcessContext context, ProcessSession session) function, which runs whenever the processor is scheduled to run.
- Then package the code into a NAR file by running the command below.
  mvn clean install
- A NAR file is created in the nifi-<artifactBaseName>-nar/target directory.
- Copy the NAR file to the lib folder of Apache NiFi and restart NiFi.
- After NiFi restarts successfully, check the processor list for the new custom processor.
- For any errors, check the ./logs/nifi.log file.
Apache NiFi - Custom Controllers Service
Apache NiFi is an open source platform and gives developers the option to add their own custom controller services to Apache NiFi. The steps and tools are almost the same as those used to create a custom processor.
- Open a command prompt and execute the Maven archetype command.
  mvn archetype:generate
- Search for the nifi type in the archetype projects.
- Select the org.apache.nifi:nifi-service-bundle-archetype project.
- Then, from the list of versions, select the latest version, which is 1.7.1 for this tutorial.
- Enter the groupId, artifactId, version, package, artifactBaseName, etc.
- A Maven project is created with the following directories:
  nifi-<artifactBaseName>
  nifi-<artifactBaseName>-nar
  nifi-<artifactBaseName>-api
  nifi-<artifactBaseName>-api-nar
- Run the command below in the nifi-<artifactBaseName> and nifi-<artifactBaseName>-api directories to prepare these two projects for Eclipse.
  mvn install eclipse:eclipse
- Open Eclipse and select Import from the File menu.
- Then select “Existing Projects into workspace” and add the projects from the nifi-<artifactBaseName> and nifi-<artifactBaseName>-api directories.
- Add your code in the source files.
- Then package the code into NAR files by running the command below.
  mvn clean install
- Two NAR files are created, one in each of the nifi-<artifactBaseName>-nar/target and nifi-<artifactBaseName>-api-nar/target directories.
- Copy these NAR files to the lib folder of Apache NiFi and restart NiFi.
- After NiFi restarts successfully, check the controller services list for the new custom controller service.
- For any errors, check the ./logs/nifi.log file.
Apache NiFi - Logging
Apache NiFi uses the logback library to handle its logging. The file logback.xml in the conf directory of NiFi is used to configure logging. The logs are generated in the logs folder of NiFi, and the log files are described below.
nifi-app.log
This is the main log file of NiFi. It logs all the activities of the Apache NiFi application, ranging from the loading of NAR files to the runtime errors or bulletins encountered by NiFi components. Below is the default appender for the nifi-app.log file in logback.xml.
<appender name="APP_FILE"
    class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy
        class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
        <maxFileSize>100MB</maxFileSize>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
    <immediateFlush>true</immediateFlush>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>
The appender name is APP_FILE and its class is RollingFileAppender, which means the logger uses a rolling policy. By default, the maximum file size is 100 MB, and it can be changed to the required size. The maximum retention for APP_FILE is 30 log files, which can be changed as per the user's requirements.
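To check these settings without reading the XML by eye, the appender definition can be parsed with the Python standard library. The sketch below embeds a shortened copy of the APP_FILE appender and reads its rolling settings; the function name rolling_settings and the shape of the returned dictionary are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the APP_FILE appender from logback.xml.
APPENDER = """\
<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
    <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
    <maxFileSize>100MB</maxFileSize>
    <maxHistory>30</maxHistory>
  </rollingPolicy>
</appender>
"""


def rolling_settings(xml_text):
    """Extract the appender name and rolling policy settings from an appender element."""
    appender = ET.fromstring(xml_text)
    policy = appender.find("rollingPolicy")
    return {
        "appender": appender.get("name"),
        # Keep only the simple class name, without the package prefix.
        "policy": policy.get("class").rsplit(".", 1)[-1],
        "max_file_size": policy.findtext("maxFileSize"),
        "max_history": int(policy.findtext("maxHistory")),
    }


settings = rolling_settings(APPENDER)
print(settings)
```

The same function works on the other appenders in this chapter, except that max_file_size is None for policies that roll by time only.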
nifi-user.log
This log contains user events such as web security, web API configuration, and user authorization. Below is the appender for nifi-user.log in the logback.xml file.
<appender name="USER_FILE"
    class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-user.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-user_%d.log</fileNamePattern>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>
The appender name is USER_FILE. It follows a time-based rolling policy. The maximum retention for USER_FILE is 30 log files. Below are the default loggers in logback.xml that write to the USER_FILE appender.
<logger name="org.apache.nifi.web.security" level="INFO" additivity="false">
    <appender-ref ref="USER_FILE"/>
</logger>
<logger name="org.apache.nifi.web.api.config" level="INFO" additivity="false">
    <appender-ref ref="USER_FILE"/>
</logger>
<logger name="org.apache.nifi.authorization" level="INFO" additivity="false">
    <appender-ref ref="USER_FILE"/>
</logger>
<logger name="org.apache.nifi.cluster.authorization" level="INFO" additivity="false">
    <appender-ref ref="USER_FILE"/>
</logger>
<logger name="org.apache.nifi.web.filter.RequestLogger" level="INFO" additivity="false">
    <appender-ref ref="USER_FILE"/>
</logger>
nifi-bootstrap.log
This log contains the bootstrap logs, Apache NiFi's standard output (everything written to System.out in the code, mainly for debugging), and standard error (everything written to System.err in the code). Below is the default appender for nifi-bootstrap.log in logback.xml.
<appender name="BOOTSTRAP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-bootstrap.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-bootstrap_%d.log</fileNamePattern>
        <maxHistory>5</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>
The appender for the nifi-bootstrap.log file is named BOOTSTRAP_FILE, and it also follows a time-based rolling policy. The maximum retention for the BOOTSTRAP_FILE appender is 5 log files. Below are the default loggers for the nifi-bootstrap.log file.
<logger name="org.apache.nifi.bootstrap" level="INFO" additivity="false">
    <appender-ref ref="BOOTSTRAP_FILE" />
</logger>
<logger name="org.apache.nifi.bootstrap.Command" level="INFO" additivity="false">
    <appender-ref ref="CONSOLE" />
    <appender-ref ref="BOOTSTRAP_FILE" />
</logger>
<logger name="org.apache.nifi.StdOut" level="INFO" additivity="false">
    <appender-ref ref="BOOTSTRAP_FILE" />
</logger>
<logger name="org.apache.nifi.StdErr" level="ERROR" additivity="false">
    <appender-ref ref="BOOTSTRAP_FILE" />
</logger>
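The logger entries in this chapter decide which log file a message ends up in. As a rough aid for reviewing a customised logback.xml, the sketch below groups logger names by the appender they reference. The embedded configuration is a small excerpt of the loggers shown above wrapped in a configuration element, and the function name loggers_by_appender is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Excerpt of the loggers shown in this chapter, wrapped in a single root element.
CONFIG = """\
<configuration>
  <logger name="org.apache.nifi.web.security" level="INFO" additivity="false">
    <appender-ref ref="USER_FILE"/>
  </logger>
  <logger name="org.apache.nifi.bootstrap.Command" level="INFO" additivity="false">
    <appender-ref ref="CONSOLE"/>
    <appender-ref ref="BOOTSTRAP_FILE"/>
  </logger>
  <logger name="org.apache.nifi.StdErr" level="ERROR" additivity="false">
    <appender-ref ref="BOOTSTRAP_FILE"/>
  </logger>
</configuration>
"""


def loggers_by_appender(xml_text):
    """Map each appender name to the list of logger names that reference it."""
    routes = defaultdict(list)
    for logger in ET.fromstring(xml_text).iter("logger"):
        for ref in logger.iter("appender-ref"):
            routes[ref.get("ref")].append(logger.get("name"))
    return dict(routes)


routes = loggers_by_appender(CONFIG)
for appender, names in sorted(routes.items()):
    print(appender, "<-", ", ".join(names))
```

A logger that lists two appenders, such as org.apache.nifi.bootstrap.Command, appears under both of them, which matches how logback delivers its messages.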