XML Tutorial

XML - Encoding

Encoding is the process of converting Unicode characters into their equivalent binary representation, a sequence of bytes. When an XML processor reads an XML document, it decodes those bytes back into characters according to the document's encoding. Hence, the encoding type is specified in the XML declaration so that the processor knows how to read the document.
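To make both directions concrete, the following Python sketch (an illustration only, not part of any XML API) encodes a hypothetical sample string to UTF-8 bytes and decodes it again. It then decodes the same bytes with the wrong encoding (Latin-1, chosen arbitrarily) to show why the processor must know which encoding was used.

```python
# Hypothetical sample text mixing ASCII, an accented letter and two CJK characters.
text = "café 电话"

# Encoding: Unicode characters -> bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'caf\xc3\xa9 \xe7\x94\xb5\xe8\xaf\x9d'

# Decoding with the same encoding restores the original characters.
restored = utf8_bytes.decode("utf-8")
assert restored == text

# Decoding with the wrong encoding yields garbled text ("mojibake").
wrong = utf8_bytes.decode("latin-1")
print(wrong[:5])  # cafÃ©
```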

Encoding Types

The XML specification requires every XML processor to support two encodings:

  1. UTF-8

  2. UTF-16

UTF stands for UCS Transformation Format, and UCS itself means Universal Character Set. The number 8 or 16 refers to the size, in bits, of the code unit used to represent characters: UTF-8 encodes each character in one to four 8-bit units (1 to 4 bytes), while UTF-16 uses one or two 16-bit units (2 or 4 bytes). A document without encoding information (no encoding declaration and no byte order mark) is treated as UTF-8 by default.
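The byte counts above can be checked with a short Python sketch. It encodes one sample character from several Unicode ranges (the choice of characters is arbitrary) and records how many bytes each takes in UTF-8 and UTF-16. The `utf-16-le` codec is used so that Python does not prepend a 2-byte byte order mark to each result.

```python
# Sample characters: ASCII, Latin-1 Supplement, CJK, and U+1D11E (outside the
# Basic Multilingual Plane, so UTF-16 needs a surrogate pair for it).
samples = ["A", "é", "电", "\U0001D11E"]

char_sizes = {}
for ch in samples:
    utf8_len = len(ch.encode("utf-8"))
    # "utf-16-le" fixes the byte order, so no byte order mark is added.
    utf16_len = len(ch.encode("utf-16-le"))
    char_sizes[ch] = (utf8_len, utf16_len)
    print(f"U+{ord(ch):04X}  UTF-8: {utf8_len} bytes  UTF-16: {utf16_len} bytes")
```

The four characters take 1, 2, 3 and 4 bytes in UTF-8, and 2, 2, 2 and 4 bytes in UTF-16.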

Syntax

The encoding is declared in the XML declaration, at the start of the document's prolog. The syntax for UTF-8 encoding is as follows:

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>

The syntax for UTF-16 encoding is as follows:

<?xml version = "1.0" encoding = "UTF-16" standalone = "no" ?>
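As a sketch of how a processor uses this information, the Python code below parses two hypothetical one-element documents with the standard library's `xml.etree.ElementTree`. The first is declared as UTF-16 and encoded with Python's `utf-16` codec, which writes a byte order mark first; the second has no declaration at all, so the parser reads it as UTF-8. Both yield the same text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document declared as UTF-16 and encoded accordingly.
utf16_doc = '<?xml version="1.0" encoding="UTF-16" standalone="no"?><name>电话</name>'
# Python's "utf-16" codec writes a byte order mark (BOM) before the data.
utf16_bytes = utf16_doc.encode("utf-16")

# No XML declaration and no BOM: the parser falls back to UTF-8.
undeclared_bytes = "<name>电话</name>".encode("utf-8")

from_utf16 = ET.fromstring(utf16_bytes)
from_default = ET.fromstring(undeclared_bytes)
print(from_utf16.text, from_default.text)
```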

Example

The following example shows an encoding declaration:

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>
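The example above can be fed to a parser as UTF-8 bytes. In this Python sketch, `xml.etree.ElementTree.fromstring` reads the encoding from the declaration and decodes the elements accordingly; the loop simply prints each child element's tag and text.

```python
import xml.etree.ElementTree as ET

# The example document, encoded to bytes as its declaration says.
xml_bytes = """<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>""".encode("utf-8")

root = ET.fromstring(xml_bytes)
for child in root:
    print(child.tag, "=", child.text)
```

This prints `name = Tanmay Patil`, `company = TutorialsPoint` and `phone = (011) 123-4567`.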

In the above example, encoding = "UTF-8" specifies that the document is stored in UTF-8, which represents characters using 8-bit code units. UTF-16, which uses 16-bit code units, can be declared instead; both encodings can represent every Unicode character.

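For documents made up mostly of ASCII characters (which includes all XML markup), UTF-8 files tend to be smaller than UTF-16 files. Text dominated by characters such as CJK ideographs, which take three bytes each in UTF-8 but two in UTF-16, can be smaller in UTF-16.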
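The size difference depends on the content. The Python sketch below compares the encoded size of the example's markup with a hypothetical element holding 40 CJK characters. The XML declaration is left out because its encoding name would differ between the two versions, and the `utf-16-le` codec is again used so that no byte order mark is counted.

```python
# Body of the example document: pure ASCII.
body = """<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>"""

# Hypothetical CJK-heavy element for comparison (4 characters repeated 10 times).
cjk_body = "<name>" + "电话号码" * 10 + "</name>"

doc_sizes = {}
for label, text in [("ASCII markup", body), ("CJK text", cjk_body)]:
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(text.encode("utf-16-le"))
    doc_sizes[label] = (utf8_size, utf16_size)
    print(f"{label}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

The ASCII markup is exactly twice as large in UTF-16, while the CJK element is 133 bytes in UTF-8 and 106 bytes in UTF-16.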