Avro Quick Tutorial

AVRO - Deserialization Using Parsers

As mentioned earlier, an Avro schema can be read into a program either by generating a class that corresponds to the schema or by using the parsers library. In Avro, data is always stored together with its schema. Therefore, a serialized item can always be read without code generation.

This chapter describes how to read the schema using the parsers library and how to deserialize the data using Avro.
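Because the schema travels with the data, a reader can recover it from the data file itself. The following is a minimal sketch of that idea, not part of the original tutorial: the class name EmbeddedSchemaDemo, the two-field emp schema, and the temporary file are assumptions made so that the example is self-contained. It writes one record, then reads it back with a GenericDatumReader that was given no schema.

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class EmbeddedSchemaDemo {

   // Writes one record to a temporary file, then reads it back
   // using only the schema stored in the file header.
   static String roundTrip() throws Exception {
      // Hypothetical two-field schema, built in code for illustration.
      Schema schema = SchemaBuilder.record("emp").fields()
         .requiredString("name")
         .requiredInt("id")
         .endRecord();

      File file = File.createTempFile("emp", ".avro");
      file.deleteOnExit();

      GenericRecord record = new GenericData.Record(schema);
      record.put("name", "ramu");
      record.put("id", 1);

      // The writer stores the schema in the file header.
      try (DataFileWriter<GenericRecord> writer =
              new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema))) {
         writer.create(schema, file);
         writer.append(record);
      }

      // No schema is passed to the reader; it is taken from the file.
      try (DataFileReader<GenericRecord> reader =
              new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>())) {
         Schema embedded = reader.getSchema();
         return embedded.getName() + ":" + reader.next().get("name");
      }
   }

   public static void main(String[] args) throws Exception {
      System.out.println(roundTrip());
   }
}
```

Compiling and running this sketch requires the Avro library on the classpath.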

Deserialization Using Parsers Library

The serialized data is stored in the file mydata.txt. You can deserialize and read it using Avro.

Follow the procedure given below to deserialize the serialized data from a file.

Step 1

First, read the schema from the file. To do so, use the Schema.Parser class. This class provides methods to parse a schema from different sources, such as a file, an input stream, or a string.

Instantiate the Schema.Parser class and call its parse() method, passing the file in which the schema is stored.

Schema schema = new Schema.Parser().parse(new File("/path/to/emp.avsc"));
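Schema.Parser can also parse a schema held in a string, which is handy when no .avsc file is available. The sketch below is only an illustration: the class name ParseSchemaFromString is hypothetical, and the schema text is an assumption whose field names are taken from the sample output of this chapter, with guessed field types.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;

public class ParseSchemaFromString {

   // Assumed emp schema: field names follow the sample output,
   // the field types are guesses made for this illustration.
   static final String EMP_SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"emp\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"id\",\"type\":\"int\"},"
      + "{\"name\":\"salary\",\"type\":\"int\"},"
      + "{\"name\":\"age\",\"type\":\"int\"},"
      + "{\"name\":\"address\",\"type\":\"string\"}]}";

   // Parses the JSON text and returns the field names in declaration order.
   static List<String> fieldNames() {
      Schema schema = new Schema.Parser().parse(EMP_SCHEMA_JSON);
      List<String> names = new ArrayList<>();
      for (Schema.Field field : schema.getFields()) {
         names.add(field.name());
      }
      return names;
   }

   public static void main(String[] args) {
      // Prints [name, id, salary, age, address]
      System.out.println(fieldNames());
   }
}
```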

Step 2

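Create an object of the DatumReader interface using the GenericDatumReader class. Because no class has been generated from the schema, the records are read as GenericRecord objects, and the reader is given the schema parsed in Step 1.

DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);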

Step 3

Instantiate the DataFileReader class. This class reads serialized data from a file. Its constructor takes the file that contains the serialized data and the DatumReader object as parameters.

DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(new File("/path/to/mydata.txt"), datumReader);

Step 4

Print the deserialized data using the methods of DataFileReader.

  1. The hasNext() method returns true if there are more elements in the reader.

  2. The next() method of DataFileReader returns the next record from the reader.

GenericRecord emp = null;
while (dataFileReader.hasNext()) {
   // Passing the previous record lets Avro reuse the object.
   emp = dataFileReader.next(emp);
   System.out.println(emp);
}
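Besides printing a whole record, individual fields of a GenericRecord can be read by name with get(). The following is a minimal, self-contained sketch under stated assumptions: the class ReadFieldsDemo, the helper newEmp(), the reduced two-field schema, and the temporary file are all hypothetical, and the two records reuse names and salaries from the sample output of this chapter.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ReadFieldsDemo {

   // Hypothetical helper that builds one record for the demo schema.
   static GenericRecord newEmp(Schema schema, String name, int salary) {
      GenericRecord emp = new GenericData.Record(schema);
      emp.put("name", name);
      emp.put("salary", salary);
      return emp;
   }

   // Writes two records to a temporary file, then reads them back
   // and extracts individual fields by name.
   static List<String> readNamesAndSalaries() throws Exception {
      // Assumed, reduced schema used only for this illustration.
      Schema schema = SchemaBuilder.record("emp").fields()
         .requiredString("name")
         .requiredInt("salary")
         .endRecord();

      File file = File.createTempFile("emp", ".avro");
      file.deleteOnExit();

      try (DataFileWriter<GenericRecord> writer =
              new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema))) {
         writer.create(schema, file);
         writer.append(newEmp(schema, "ramu", 30000));
         writer.append(newEmp(schema, "rahman", 35000));
      }

      List<String> lines = new ArrayList<>();
      try (DataFileReader<GenericRecord> reader =
              new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>(schema))) {
         GenericRecord emp = null;
         while (reader.hasNext()) {
            emp = reader.next(emp);
            // get() returns Object; string values come back as Avro Utf8.
            lines.add(emp.get("name") + " earns " + emp.get("salary"));
         }
      }
      return lines;
   }

   public static void main(String[] args) throws Exception {
      for (String line : readNamesAndSalaries()) {
         System.out.println(line);
      }
   }
}
```

Since the record object is reused between calls to next(), the field values are converted to strings right away instead of being kept as references.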

Example – Deserialization Using Parsers Library

The following complete program shows how to deserialize the serialized data using the parsers library:

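import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

public class Deserialize {
   public static void main(String args[]) throws Exception {

      // Parse the schema from the .avsc file.
      Schema schema = new Schema.Parser().parse(new File("/home/Hadoop/Avro/schema/emp.avsc"));

      // Create a generic reader for that schema.
      DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);

      // Open the file that holds the serialized data.
      DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(new File("/home/Hadoop/Avro_Work/without_code_gen/mydata.txt"), datumReader);
      GenericRecord emp = null;

      // Print each record, reusing the emp object between reads.
      while (dataFileReader.hasNext()) {
         emp = dataFileReader.next(emp);
         System.out.println(emp);
      }
      dataFileReader.close();
   }
}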

Browse into the directory where the program is to be saved. In this case, it is home/Hadoop/Avro_work/without_code_gen.

$ cd home/Hadoop/Avro_work/without_code_gen/

Now copy and save the above program in a file named Deserialize.java. The file name must match the class name, including case. Compile and execute it as shown below:

$ javac Deserialize.java
$ java Deserialize

Output

{"name": "ramu", "id": 1, "salary": 30000, "age": 25, "address": "chennai"}
{"name": "rahman", "id": 2, "salary": 35000, "age": 30, "address": "Delhi"}