A Concise Avro Tutorial

AVRO - Serialization by Generating a Class

You can read an Avro schema into a program in two ways: by generating a class that corresponds to the schema, or by using the parsers library. This chapter describes how to read the schema by generating a class, and how to serialize data using Avro.


Serialization by Generating a Class

To serialize data using Avro, follow the steps given below −

  1. Write an Avro schema.

  2. Compile the schema using the Avro tools. This generates Java code corresponding to the schema.

  3. Populate instances of the generated class with data.

  4. Serialize the data using the Avro library.

Defining a Schema

Suppose you want a schema with the following details −

Field     Type
name      string
id        int
age       int
salary    int
address   string

Create an Avro schema as shown below and save it as emp.avsc.

{
   "namespace": "tutorialspoint.com",
   "type": "record",
   "name": "emp",
   "fields": [
      {"name": "name", "type": "string"},
      {"name": "id", "type": "int"},
      {"name": "salary", "type": "int"},
      {"name": "age", "type": "int"},
      {"name": "address", "type": "string"}
   ]
}

Compiling the Schema

After creating an Avro schema, compile it using the Avro tools. avro-tools-1.7.7.jar is the jar that contains these tools.

Syntax to Compile an Avro Schema

java -jar <path/to/avro-tools-1.7.7.jar> compile schema <path/to/schema-file> <destination-folder>

Open a terminal in your home folder.

Create a new directory to work with Avro, as shown below −

$ mkdir Avro_Work

Inside the new directory, create three subdirectories −

  1. schema, to hold the schema file.

  2. with_code_gen, to hold the generated code.

  3. jars, to hold the jar files.

$ cd Avro_Work
$ mkdir schema
$ mkdir with_code_gen
$ mkdir jars

The following screenshot shows how your Avro_Work folder should look after you create all the directories.

[Screenshot: the Avro_Work folder]

In this example −

  1. /home/Hadoop/Avro_Work/jars/avro-tools-1.7.7.jar is the path of the avro-tools-1.7.7.jar file you downloaded.

  2. /home/Hadoop/Avro_Work/schema/ is the directory where your schema file emp.avsc is stored.

  3. /home/Hadoop/Avro_Work/with_code_gen is the directory where the generated source files will be stored.

Now compile the schema as shown below −

$ java -jar /home/Hadoop/Avro_Work/jars/avro-tools-1.7.7.jar compile schema /home/Hadoop/Avro_Work/schema/emp.avsc /home/Hadoop/Avro_Work/with_code_gen

After compilation, a package matching the schema's namespace is created in the destination directory. Inside this package, a Java source file named after the schema is created. This generated source code is the Java representation of the given schema and can be used directly in applications.

For example, in this case a folder (package) named tutorialspoint is created. It contains another folder named com, because the namespace is tutorialspoint.com. Inside com you can find the generated file emp.java. The following snapshot shows emp.java.

[Snapshot: the generated emp.java]
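Each dot-separated part of the namespace becomes one directory level, and the source file is named after the record. The plain-Java sketch below computes that relative location. The helper name generatedSourcePath is hypothetical. It is not part of avro-tools, which does this mapping itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class GeneratedPathDemo {
    // Hypothetical helper: namespace + record name -> relative path of the generated
    // source, e.g. ("tutorialspoint.com", "emp") -> tutorialspoint/com/emp.java
    static Path generatedSourcePath(String namespace, String recordName) {
        String[] packageParts = namespace.split("\\.");
        // The first namespace part is the top directory; the rest are nested below it
        Path dir = Paths.get(packageParts[0]);
        for (int i = 1; i < packageParts.length; i++) {
            dir = dir.resolve(packageParts[i]);
        }
        return dir.resolve(recordName + ".java");
    }

    public static void main(String[] args) {
        Path destination = Paths.get("/home/Hadoop/Avro_Work/with_code_gen");
        // On Unix-like systems: /home/Hadoop/Avro_Work/with_code_gen/tutorialspoint/com/emp.java
        System.out.println(destination.resolve(generatedSourcePath("tutorialspoint.com", "emp")));
    }
}
```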

This class is useful for creating data that conforms to the schema.

The generated class contains −

  1. A default constructor, and a parameterized constructor that accepts values for all fields of the schema.

  2. Setter and getter methods for all fields in the schema.

  3. A getSchema() method that returns the schema.

  4. Builder methods for constructing instances.

Creating and Serializing the Data

First, copy the generated Java file into the current directory, or import it from where it is located.

Now we can write a new Java file that instantiates the class from the generated file (emp) and adds employee data according to the schema.

Let us look at the procedure for creating data according to the schema using Apache Avro.

Step 1

Instantiate the generated emp class.

emp e1 = new emp();

Step 2

Using the setter methods, insert the data of the first employee. In this example, we fill in the details of an employee named Omar.

e1.setName("omar");
e1.setAge(21);
e1.setSalary(30000);
e1.setAddress("Hyderabad");
e1.setId(1);

Similarly, fill in the details of the other employees using the setter methods.
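A note on numeric IDs: in Java source code, an integer literal that starts with 0 is octal. 001, 002 and 003 happen to equal 1, 2 and 3. However, 010 means 8, and 008 does not compile. Plain decimal literals such as 1 are therefore safer. The class below is a small hypothetical demonstration.

```java
class LeadingZeroDemo {
    // A leading 0 makes a Java integer literal octal (base 8)
    static final int ONE = 001;   // octal 1 is decimal 1, so this only works by coincidence
    static final int EIGHT = 010; // octal 10 is decimal 8, not 10
    // static final int BAD = 008; // does not compile: 8 is not an octal digit

    public static void main(String[] args) {
        System.out.println(ONE + " " + EIGHT); // prints "1 8"
    }
}
```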

Step 3

Create an object of the DatumWriter interface using the SpecificDatumWriter class. A DatumWriter converts Java objects into Avro's serialized format. The following example instantiates a SpecificDatumWriter for the emp class.

DatumWriter<emp> empDatumWriter = new SpecificDatumWriter<emp>(emp.class);
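To make the serialized format concrete, here is a JDK-only sketch that hand-encodes one emp record according to the binary encoding rules in the Avro specification:

- An int is zigzag-encoded and written as a variable-length integer.
- A string is its byte length (encoded the same way) followed by its UTF-8 bytes.
- A record is just its fields in schema order (name, id, salary, age, address), with no field names or tags.

This illustrates the encoding only. It is not how SpecificDatumWriter is implemented, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class AvroBinarySketch {
    // Zigzag maps signed to unsigned so small magnitudes stay short: 0->0, -1->1, 1->2, -2->3
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Avro int: zigzag value as a varint, 7 bits per byte, low bits first;
    // the high bit of a byte is set when more bytes follow
    static void writeInt(ByteArrayOutputStream out, int n) {
        int v = zigZag(n);
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    // Avro string: byte length (same varint bytes as a long for these sizes), then UTF-8 bytes
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is its fields concatenated in schema order; no names or tags are written
    static byte[] encodeEmp(String name, int id, int salary, int age, String address) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeInt(out, id);
        writeInt(out, salary);
        writeInt(out, age);
        writeString(out, address);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeEmp("omar", 1, 30000, 21, "Hyderabad");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(bytes.length + " bytes: " + hex.toString().trim());
    }
}
```

Running main prints `20 bytes: 08 6f 6d 61 72 02 e0 d4 03 2a 12 48 79 64 65 72 61 62 61 64`. No field names are stored, so a reader needs the schema to decode these bytes. That is why a data file carries the schema alongside the records.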

Step 4

Instantiate a DataFileWriter for the emp class. This class writes a sequence of serialized records that conform to a schema, together with the schema itself, to a file. Its constructor takes the DatumWriter object as a parameter.

DataFileWriter<emp> empFileWriter = new DataFileWriter<emp>(empDatumWriter);
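The Avro specification calls the layout that DataFileWriter produces an object container file. It consists of:

- A header: the magic bytes Obj plus a version byte 1, a metadata map holding the schema (avro.schema) and the codec (avro.codec), and a 16-byte random sync marker.
- Data blocks, each holding a record count, the byte size of the encoded records, the records themselves, and the sync marker again.

The JDK-only sketch below builds such a file in memory with one emp record and the null codec. It illustrates the layout under those assumptions and is not a replacement for DataFileWriter. Its names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

class ContainerLayoutSketch {
    // Avro long: zigzag-encode, then 7 bits per byte, low bits first,
    // with the high bit set on every byte except the last
    static void writeLong(ByteArrayOutputStream out, long n) {
        long v = (n << 1) ^ (n >> 63);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Avro bytes (and string): length as a long, then the raw bytes
    static void writeBytes(ByteArrayOutputStream out, byte[] data) {
        writeLong(out, data.length);
        out.write(data, 0, data.length);
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Header: magic "Obj" plus version byte 1, a metadata map, then the 16-byte sync marker
    static void writeHeader(ByteArrayOutputStream out, String schemaJson, byte[] sync) {
        out.write('O');
        out.write('b');
        out.write('j');
        out.write(1);
        writeLong(out, 2);                 // one map block with two entries
        writeBytes(out, utf8("avro.schema"));
        writeBytes(out, utf8(schemaJson)); // the schema travels with the data
        writeBytes(out, utf8("avro.codec"));
        writeBytes(out, utf8("null"));     // records are stored uncompressed
        writeLong(out, 0);                 // a block count of 0 ends the map
        out.write(sync, 0, sync.length);
    }

    // Data block: record count, byte size of the encoded records, the records, the sync marker
    static void writeBlock(ByteArrayOutputStream out, long recordCount, byte[] encodedRecords, byte[] sync) {
        writeLong(out, recordCount);
        writeBytes(out, encodedRecords);   // size as a long, then the bytes
        out.write(sync, 0, sync.length);
    }

    public static void main(String[] args) {
        String schema = "{\"namespace\":\"tutorialspoint.com\",\"type\":\"record\",\"name\":\"emp\",\"fields\":["
                + "{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"id\",\"type\":\"int\"},"
                + "{\"name\":\"salary\",\"type\":\"int\"},{\"name\":\"age\",\"type\":\"int\"},"
                + "{\"name\":\"address\",\"type\":\"string\"}]}";
        byte[] sync = new byte[16];
        new SecureRandom().nextBytes(sync); // random marker, repeated after every block

        // One emp record, fields in schema order; int values use the same varint bytes as long
        ByteArrayOutputStream record = new ByteArrayOutputStream();
        writeBytes(record, utf8("omar"));
        writeLong(record, 1);
        writeLong(record, 30000);
        writeLong(record, 21);
        writeBytes(record, utf8("Hyderabad"));

        ByteArrayOutputStream file = new ByteArrayOutputStream();
        writeHeader(file, schema, sync);
        writeBlock(file, 1, record.toByteArray(), sync);
        System.out.println("container bytes: " + file.size());
    }
}
```

Each block ends with the same sync marker, so a reader can find block boundaries without decoding every record.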

Step 5

Open a new file to store the data that matches the given schema, using the create() method. This method takes two parameters: the schema, and the path of the file where the data is to be stored.

In the following example, the schema is obtained with the getSchema() method. The data file is stored at /home/Hadoop/Avro_Work/with_code_gen/emp.avro, the same path used in the complete program below.

empFileWriter.create(e1.getSchema(), new File("/home/Hadoop/Avro_Work/with_code_gen/emp.avro"));

Step 6

Add all the created records to the file using the append() method, as shown below −

empFileWriter.append(e1);
empFileWriter.append(e2);
empFileWriter.append(e3);

Example – Serialization by Generating a Class

The following complete program shows how to serialize data into a file using Apache Avro −

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

//The generated class is in the package derived from the schema namespace
import tutorialspoint.com.emp;

public class Serialize {
   public static void main(String[] args) throws IOException {

      //Instantiating generated emp class
      emp e1 = new emp();

      //Creating values according to the schema
      e1.setName("omar");
      e1.setAge(21);
      e1.setSalary(30000);
      e1.setAddress("Hyderabad");
      e1.setId(1);

      emp e2 = new emp();

      e2.setName("ram");
      e2.setAge(30);
      e2.setSalary(40000);
      e2.setAddress("Hyderabad");
      e2.setId(2);

      emp e3 = new emp();

      e3.setName("robbin");
      e3.setAge(25);
      e3.setSalary(35000);
      e3.setAddress("Hyderabad");
      e3.setId(3);

      //Instantiate DatumWriter class
      DatumWriter<emp> empDatumWriter = new SpecificDatumWriter<emp>(emp.class);

      //try-with-resources closes (and flushes) the file even if an append fails
      try (DataFileWriter<emp> empFileWriter = new DataFileWriter<emp>(empDatumWriter)) {
         empFileWriter.create(e1.getSchema(), new File("/home/Hadoop/Avro_Work/with_code_gen/emp.avro"));

         empFileWriter.append(e1);
         empFileWriter.append(e2);
         empFileWriter.append(e3);
      }

      System.out.println("data successfully serialized");
   }
}

Browse to the directory where the generated code is placed, in this case /home/Hadoop/Avro_Work/with_code_gen.

In Terminal −

$ cd /home/Hadoop/Avro_Work/with_code_gen/

In GUI −

[Screenshot: the with_code_gen folder]

Now copy the above program and save it in a file named Serialize.java.

Compile and execute it as shown below. The Avro library jars must be on your CLASSPATH −

$ javac Serialize.java
$ java Serialize

Output

data successfully serialized

If you check the path given in the program, you will find the generated serialized file, as shown below.

[Screenshot: the generated serialized file emp.avro]
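One quick way to confirm that the program produced an Avro data file is to check its first bytes. The Avro specification says object container files start with the four bytes O, b, j and 1. The JDK-only sketch below does just that. The class name is hypothetical, and the default path is assumed from this chapter's directory layout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class AvroMagicCheck {
    // Per the Avro specification, object container files start with 'O', 'b', 'j', 1
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    static boolean looksLikeAvroContainer(Path file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file is shorter than the magic header
                }
                read += n;
            }
        }
        return Arrays.equals(head, MAGIC);
    }

    public static void main(String[] args) throws IOException {
        // Default path is the one used by the Serialize program in this chapter
        Path file = Paths.get(args.length > 0 ? args[0] : "/home/Hadoop/Avro_Work/with_code_gen/emp.avro");
        System.out.println(file + (looksLikeAvroContainer(file) ? " is" : " is not") + " an Avro container file");
    }
}
```

After running Serialize, `java AvroMagicCheck /home/Hadoop/Avro_Work/with_code_gen/emp.avro` should report that the file is an Avro container file.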