A Concise Avro Tutorial

AVRO - Schemas

Avro, being a schema-based serialization utility, accepts schemas as input. Although many schema languages exist, Avro defines schemas according to its own standard. These schemas describe the following details −

  1. type of the schema (typically record)

  2. namespace (location) of the record

  3. name of the record

  4. fields in the record with their corresponding data types

Using these schemas, you can store serialized values in a compact binary format. The values are stored without per-value metadata such as field names or type tags; a reader uses the schema to interpret the bytes.
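To make the no-metadata point concrete, here is a minimal hand-written Python sketch of Avro's binary encoding for the Employee schema shown later in this tutorial. It handles only string and int fields, and `zigzag_varint` and `encode_record` are hypothetical helper names, not part of the official avro library. Following the Avro specification, ints are written as zigzag variable-length integers and strings as a length followed by UTF-8 bytes.

```python
import json

# Employee schema from this tutorial (fields Name: string, Age: int).
EMPLOYEE_SCHEMA = json.loads("""
{
  "type": "record",
  "namespace": "Tutorialspoint",
  "name": "Employee",
  "fields": [
    {"name": "Name", "type": "string"},
    {"name": "Age", "type": "int"}
  ]
}
""")


def zigzag_varint(n: int) -> bytes:
    """Encode an int/long as Avro does: zigzag mapping, then base-128 varint."""
    zz = 2 * n if n >= 0 else -2 * n - 1  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while zz > 0x7F:
        out.append((zz & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        zz >>= 7
    out.append(zz)
    return bytes(out)


def encode_record(schema: dict, datum: dict) -> bytes:
    """Concatenate field values in schema order; no field names or type tags are written."""
    out = bytearray()
    for field in schema["fields"]:
        value = datum[field["name"]]
        if field["type"] == "string":
            raw = value.encode("utf-8")
            out += zigzag_varint(len(raw)) + raw  # length prefix, then UTF-8 bytes
        elif field["type"] == "int":
            out += zigzag_varint(value)
        else:
            raise NotImplementedError(f"sketch does not handle {field['type']!r}")
    return bytes(out)


encoded = encode_record(EMPLOYEE_SCHEMA, {"Name": "Tom", "Age": 30})
print(encoded)  # b'\x06Tom<'
```

The whole record takes five bytes: one length byte, the three characters of "Tom", and one byte for 30. The names "Name" and "Age" never appear, which is why a reader must have the schema.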

Creating Avro Schemas

An Avro schema is written in JavaScript Object Notation (JSON), a lightweight text-based data interchange format. A schema takes one of the following forms −

  1. A JSON string, naming a primitive or previously defined type (for example "int")

  2. A JSON object, of the form {"type": "typeName", ...attributes...}

  3. A JSON array, representing a union of types
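The three forms map directly onto the JSON value kinds, so a parser can tell them apart by the type of the decoded value. The sketch below uses Python's standard `json` module; `schema_form` is a hypothetical helper for illustration, not part of any Avro library.

```python
import json


def schema_form(text: str) -> str:
    """Report which of the three JSON forms an Avro schema text uses."""
    schema = json.loads(text)
    if isinstance(schema, str):
        return "string"  # a primitive or previously defined type name, e.g. "int"
    if isinstance(schema, dict):
        return "object"  # {"type": ..., plus type-specific attributes}
    if isinstance(schema, list):
        return "array"   # a union of the listed types
    raise ValueError(f"not an Avro schema form: {text}")


for text in ['"int"', '{"type": "array", "items": "int"}', '["int", "null"]']:
    print(text, "->", schema_form(text))
```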

Example − The following schema defines a record named Employee in the namespace Tutorialspoint, with two fields, Name and Age.

{
   "type" : "record",
   "namespace" : "Tutorialspoint",
   "name" : "Employee",
   "fields" : [
      { "name" : "Name" , "type" : "string" },
      { "name" : "Age" , "type" : "int" }
   ]
}

In this example, you can observe that the schema has four top-level attributes −

  1. type − This attribute appears at the top level of the schema and also inside each entry of the fields array. At the top level, it gives the type of the schema, generally record because there are multiple fields. Inside a field entry, it gives the field's data type.

  2. namespace − This attribute gives the namespace in which the object resides.

  3. name − This attribute also appears both at the top level and inside each field entry. At the top level, it gives the schema name. Together with the namespace, this name uniquely identifies the schema within the store (namespace.name). In the above example, the full name of the schema is Tutorialspoint.Employee. Inside a field entry, it gives the name of the field.

  4. fields − This attribute holds a JSON array listing every field in the record, each with name and type attributes.
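The full-name rule can be sketched as a small Python function. `full_name` is a hypothetical helper; besides joining namespace and name with a dot, it follows the Avro specification's rule that a name which already contains a dot is treated as a full name, in which case any namespace is ignored.

```python
def full_name(schema: dict) -> str:
    """Combine namespace and name into Avro's dotted full name."""
    name = schema["name"]
    if "." in name:  # a dotted name is already a full name; namespace is ignored
        return name
    namespace = schema.get("namespace")
    return f"{namespace}.{name}" if namespace else name


employee = {"type": "record", "namespace": "Tutorialspoint", "name": "Employee", "fields": []}
print(full_name(employee))  # Tutorialspoint.Employee
```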

Primitive Data Types of Avro

Avro schemas support both primitive and complex data types. The following table describes the primitive data types of Avro −

Data type   Description
null        A type having no value.
boolean     A binary value (true or false).
int         32-bit signed integer.
long        64-bit signed integer.
float       Single-precision (32-bit) IEEE 754 floating-point number.
double      Double-precision (64-bit) IEEE 754 floating-point number.
bytes       Sequence of 8-bit unsigned bytes.
string      Unicode character sequence.
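One way to read the primitive types is as constraints on values. The Python sketch below checks whether a Python value can be written as a given primitive type, including Avro's boolean primitive. `matches_primitive` is a hypothetical helper, and the mapping from Python types (for example `float` for both float and double) is an assumption made for illustration.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1     # 32-bit signed range
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1   # 64-bit signed range


def matches_primitive(avro_type: str, value) -> bool:
    """Check whether a Python value can be written as the given Avro primitive type."""
    # bool is a subclass of int in Python, so exclude it from the integer types.
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    if avro_type == "null":
        return value is None
    if avro_type == "boolean":
        return isinstance(value, bool)
    if avro_type == "int":
        return is_integer and INT_MIN <= value <= INT_MAX
    if avro_type == "long":
        return is_integer and LONG_MIN <= value <= LONG_MAX
    if avro_type in ("float", "double"):
        # A Python float is 64-bit; writing it as "float" may lose precision.
        return isinstance(value, float)
    if avro_type == "bytes":
        return isinstance(value, (bytes, bytearray))
    if avro_type == "string":
        return isinstance(value, str)
    raise ValueError(f"not a primitive type: {avro_type}")


print(matches_primitive("int", 2**31))   # False: outside the 32-bit range
print(matches_primitive("long", 2**31))  # True
```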

Complex Data Types of Avro

Along with primitive data types, Avro provides six complex data types: records, enums, arrays, maps, unions, and fixed.

Record

A record data type in Avro is a collection of named fields. It supports the following attributes −

  1. name − The name of the record.

  2. namespace − The namespace that qualifies the record's name.

  3. type − At the top level, this is record; inside each field entry, it is that field's data type.

  4. fields − A JSON array listing all fields of the record, each with name and type attributes.

Example

Given below is an example of a record.

{
   "type" : "record",
   "namespace" : "Tutorialspoint",
   "name" : "Employee",
   "fields" : [
      { "name" : "Name" , "type" : "string" },
      { "name" : "age" , "type" : "int" }
   ]
}
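A record schema can also be used to check data before writing it. The Python sketch below validates a dictionary against the record above, reporting missing fields and wrongly typed values. `validate_record` and `CHECKS` are hypothetical names, and only the two primitive types used in this example (string and int) are handled.

```python
EMPLOYEE = {
    "type": "record",
    "namespace": "Tutorialspoint",
    "name": "Employee",
    "fields": [
        {"name": "Name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}

# Checks for the two primitive types used in this example only.
CHECKS = {
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**31 <= v < 2**31,
}


def validate_record(schema: dict, datum: dict) -> list:
    """Return a list of problems; an empty list means the datum fits the schema."""
    problems = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name not in datum:
            problems.append(f"missing field {name!r}")
        elif not CHECKS[ftype](datum[name]):
            problems.append(f"field {name!r} is not a valid {ftype}")
    return problems


print(validate_record(EMPLOYEE, {"Name": "Ann", "age": 41}))  # []
print(validate_record(EMPLOYEE, {"Name": 7}))
```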

Enum

An enum defines a type whose values are limited to a fixed list of named symbols. Avro enums support the following attributes −

  1. name − The value of this field holds the name of the enumeration.

  2. namespace − The value of this field contains the string that qualifies the name of the Enumeration.

  3. symbols − The value of this field holds the enum’s symbols as an array of names.

Example

Given below is an example of an enumeration.

{
   "type" : "enum",
   "name" : "Numbers",
   "namespace": "data",
   "symbols" : [ "ONE", "TWO", "THREE", "FOUR" ]
}
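In Avro's binary encoding, an enum value is written as the zero-based position of its symbol in the symbols array, encoded as an int. The Python sketch below applies that rule to the Numbers enum; `encode_enum` is a hypothetical helper, not an avro library function.

```python
NUMBERS = {
    "type": "enum",
    "name": "Numbers",
    "namespace": "data",
    "symbols": ["ONE", "TWO", "THREE", "FOUR"],
}


def encode_enum(schema: dict, symbol: str) -> bytes:
    """Write an enum as the zigzag-varint-encoded index of its symbol."""
    index = schema["symbols"].index(symbol)  # raises ValueError for unknown symbols
    zz = index << 1  # zigzag of a non-negative int is 2 * n
    out = bytearray()
    while zz > 0x7F:
        out.append((zz & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        zz >>= 7
    out.append(zz)
    return bytes(out)


print(encode_enum(NUMBERS, "THREE"))  # b'\x04'
```

Because only the position is stored, reordering the symbols in a schema changes the meaning of previously written data.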

Arrays

This data type defines an array field with a single attribute, items, which specifies the type of the elements in the array.

Example

{ "type" : "array", "items" : "int" }
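The Avro specification writes arrays as a series of blocks: each block is a count followed by that many items, and a block with count zero ends the array. The Python sketch below encodes an int array as a single block plus the terminator; `zigzag_varint` and `encode_int_array` are hypothetical helper names.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an int/long as Avro does: zigzag mapping, then base-128 varint."""
    zz = 2 * n if n >= 0 else -2 * n - 1
    out = bytearray()
    while zz > 0x7F:
        out.append((zz & 0x7F) | 0x80)
        zz >>= 7
    out.append(zz)
    return bytes(out)


def encode_int_array(items: list) -> bytes:
    """Write an int array as one block of items followed by the zero-count terminator."""
    out = bytearray()
    if items:
        out += zigzag_varint(len(items))  # block count
        for item in items:
            out += zigzag_varint(item)
    out += zigzag_varint(0)  # a block with count 0 ends the array
    return bytes(out)


print(encode_int_array([1, 2, 3]))  # b'\x06\x02\x04\x06\x00'
```

Writing everything in one block is the simplest choice; a writer may also split a long array into several blocks so it can be streamed.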

Maps

The map data type holds a collection of key-value pairs. The keys of an Avro map are always strings, and the values attribute specifies the data type of the map's values.

Example

{"type" : "map", "values" : "int"}
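Since a map schema declares only the value type, a validator must enforce string keys itself. The Python sketch below checks a dictionary against the map schema above; `validate_map` is a hypothetical helper, and only int and string value types are handled.

```python
VALUE_CHECKS = {
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**31 <= v < 2**31,
    "string": lambda v: isinstance(v, str),
}


def validate_map(schema: dict, datum: dict) -> bool:
    """Keys must be strings; values must match the schema's "values" type."""
    check = VALUE_CHECKS[schema["values"]]
    return all(isinstance(k, str) and check(v) for k, v in datum.items())


schema = {"type": "map", "values": "int"}
print(validate_map(schema, {"apples": 3, "pears": 5}))  # True
print(validate_map(schema, {1: 3}))                      # False: keys must be strings
```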

Unions

A union data type is used whenever a field can hold values of more than one data type. Unions are represented as JSON arrays. For example, a field that can be either an int or null is declared with the union ["int", "null"].

Example

Given below is an example document that uses a union −

{
   "type" : "record",
   "namespace" : "tutorialspoint",
   "name" : "empdetails",
   "fields" : [
      { "name" : "experience", "type" : ["int", "null"] },
      { "name" : "age", "type" : "int" }
   ]
}
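In binary form, the Avro specification writes a union value as the zero-based index of the matching branch, followed by the value encoded according to that branch. The Python sketch below handles unions whose branches are "int" and "null", such as the experience field; `zigzag_varint` and `encode_union` are hypothetical helper names.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an int/long as Avro does: zigzag mapping, then base-128 varint."""
    zz = 2 * n if n >= 0 else -2 * n - 1
    out = bytearray()
    while zz > 0x7F:
        out.append((zz & 0x7F) | 0x80)
        zz >>= 7
    out.append(zz)
    return bytes(out)


def encode_union(branches: list, value) -> bytes:
    """Write the index of the first matching branch, then the value for that branch."""
    for index, branch in enumerate(branches):
        if branch == "null" and value is None:
            return zigzag_varint(index)  # null has an empty payload
        if branch == "int" and isinstance(value, int) and not isinstance(value, bool):
            return zigzag_varint(index) + zigzag_varint(value)
    raise ValueError(f"{value!r} matches no branch of {branches}")


print(encode_union(["int", "null"], 5))     # b'\x00\n'
print(encode_union(["int", "null"], None))  # b'\x02'
```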

Fixed

This data type declares a fixed-size field that can be used to store binary data. Its attributes are name and size: name holds the name of the type, and size holds the number of bytes in each value.

Example

{ "type" : "fixed" , "name" : "bdata", "size" : 1048576}
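Because every fixed value has exactly size bytes, the Avro specification writes it as raw bytes with no length prefix. The Python sketch below enforces that rule for the bdata example; `encode_fixed` is a hypothetical helper, not an avro library function.

```python
BDATA = {"type": "fixed", "name": "bdata", "size": 1048576}


def encode_fixed(schema: dict, data: bytes) -> bytes:
    """Return the raw bytes unchanged after checking the length matches the schema."""
    if len(data) != schema["size"]:
        raise ValueError(
            f"{schema['name']} needs exactly {schema['size']} bytes, got {len(data)}"
        )
    return bytes(data)  # no length prefix: the reader knows the size from the schema


print(len(encode_fixed(BDATA, bytes(1048576))))  # 1048576
```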