XML Concise Tutorial
XML - Overview
XML stands for *E*xtensible *M*arkup *L*anguage. It is a text-based markup language derived from Standard Generalized Markup Language (SGML).
XML tags identify the data and are used to store and organize it, whereas HTML tags specify how the data is displayed. XML is not going to replace HTML in the near future, but it introduces new possibilities by adopting many successful features of HTML.
There are three important characteristics of XML that make it useful in a variety of systems and solutions −
- XML is extensible − XML allows you to create your own self-descriptive tags, or language, that suits your application.
- XML carries the data, does not present it − XML allows you to store the data irrespective of how it will be presented.
- XML is a public standard − XML was developed by an organization called the World Wide Web Consortium (W3C) and is available as an open standard.
XML Usage
A short list of XML usage says it all −
- XML can work behind the scenes to simplify the creation of HTML documents for large web sites.
- XML can be used to exchange information between organizations and systems.
- XML can be used for offloading and reloading databases.
- XML can be used to store and arrange data in a way that suits your data handling needs.
- XML can easily be merged with style sheets to create almost any desired output.
- Virtually any type of data can be expressed as an XML document.
What is Markup?
XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. So what exactly is a markup language? Markup is information added to a document that enhances its meaning in certain ways, in that it identifies the parts and how they relate to each other. More specifically, a markup language is a set of symbols that can be placed in the text of a document to demarcate and label the parts of that document.
The following example shows how XML markup looks when embedded in a piece of text −
<message>
<text>Hello, world!</text>
</message>
This snippet includes markup symbols, or tags, such as <message>…</message> and <text>…</text>. The tags <message> and </message> mark the start and the end of the XML code fragment. The tags <text> and </text> surround the text Hello, world!.
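As a minimal sketch of how a program sees this markup, the snippet can be parsed with Python's standard-library xml.etree.ElementTree module. The variable names are illustrative only and are not part of the tutorial.

```python
import xml.etree.ElementTree as ET

# The snippet from the text above, held in a string.
snippet = "<message><text>Hello, world!</text></message>"

root = ET.fromstring(snippet)       # parse the markup into an element tree
text_element = root.find("text")    # locate the <text> child of <message>

print(root.tag)            # name of the outermost element
print(text_element.text)   # character data enclosed by <text>...</text>
```

The tags become element names, and the text between them becomes character data that the program can read.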
Is XML a Programming Language?
A programming language consists of grammar rules and its own vocabulary, which are used to create computer programs. These programs instruct the computer to perform specific tasks. XML does not qualify as a programming language because it does not perform any computation or express algorithms. It is usually stored in a simple text file and is processed by special software that is capable of interpreting XML.
XML - Syntax
In this chapter, we will discuss the simple syntax rules to write an XML document. Following is a complete XML document −
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
You can notice there are two kinds of information in the above example −
- Markup, like <contact-info>
- The text, or the character data, such as Tanmay Patil, TutorialsPoint and (011) 123-4567.
The following diagram depicts the syntax rules to write different types of markup and text in an XML document.
Let us see each component of the above diagram in detail.
XML Declaration
The XML document can optionally have an XML declaration. It is written as follows −
<?xml version = "1.0" encoding = "UTF-8"?>
Where version is the XML version and encoding specifies the character encoding used in the document.
Syntax Rules for XML Declaration
- The XML declaration is case sensitive and must begin with "<?xml", where "xml" is written in lower-case.
- If the document contains an XML declaration, it must be the very first thing in the document; nothing, not even whitespace or a comment, may precede it.
- A higher-level protocol such as HTTP can override the value of encoding that you put in the XML declaration.
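The placement rule can be checked with a short sketch. It assumes Python's standard-library xml.etree.ElementTree parser; the helper name parses and the note element are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses(document: str) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Hypothetical one-element documents used only for this check.
at_start = '<?xml version = "1.0"?><note/>'
after_blank = ' <?xml version = "1.0"?><note/>'   # one leading space

print(parses(at_start))     # declaration is the first thing in the document
print(parses(after_blank))  # whitespace precedes the declaration
```

A single leading space is enough for the parser to reject the second document.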
Tags and Elements
An XML file is structured by several XML elements, loosely also called XML nodes or XML tags (strictly, a tag is the markup that delimits an element). The names of XML elements are enclosed in angle brackets < > as shown below −
<element>
Syntax Rules for Tags and Elements
Element Syntax − Each XML element must be closed, either with an end tag that matches its start tag, as shown below −
<element>....</element>
or, for an element with no content, with a single empty-element tag −
<element/>
Nesting of Elements − An XML element can contain multiple XML elements as its children, but the child elements must not overlap; that is, the end tag of an element must have the same name as the most recent unmatched start tag.
The following example shows incorrectly nested tags −
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint
</contact-info>
</company>
The following example shows correctly nested tags −
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint</company>
</contact-info>
Root Element − An XML document can have only one root element. For example, the following is not a well-formed XML document, because both the x and y elements occur at the top level without a root element −
<x>...</x>
<y>...</y>
The following example shows a well-formed XML document −
<root>
<x>...</x>
<y>...</y>
</root>
Case Sensitivity − The names of XML elements are case-sensitive. That means the names in the start and end tags must match exactly, including case.
For example, <contact-info> is different from <Contact-Info>.
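The nesting, root element and case rules above can be exercised with a small sketch that feeds each variant to Python's standard-library parser. The helper name is_well_formed is an assumption for illustration; the parser checks well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the document parses without a well-formedness error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

overlapping = "<contact-info><company>TutorialsPoint</contact-info></company>"
nested = "<contact-info><company>TutorialsPoint</company></contact-info>"
two_roots = "<x>...</x><y>...</y>"
one_root = "<root><x>...</x><y>...</y></root>"
case_mismatch = "<contact-info>...</Contact-Info>"

for label, doc in [
    ("overlapping tags", overlapping),
    ("properly nested", nested),
    ("two top-level elements", two_roots),
    ("single root", one_root),
    ("case mismatch", case_mismatch),
]:
    print(label, "->", is_well_formed(doc))
```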
XML Attributes
An attribute specifies a single property of the element, using a name/value pair. An XML element can have one or more attributes. For example −
<a href = "http://www.tutorialspoint.com/">Tutorialspoint!</a>
Here href is the attribute name and http://www.tutorialspoint.com/ is the attribute value.
Syntax Rules for XML Attributes
- Attribute names in XML (unlike HTML) are case sensitive. That is, HREF and href are considered two different XML attributes.
- The same attribute cannot appear twice in one tag. The following example shows incorrect syntax because the attribute b is specified twice −
<a b = "x" c = "y" b = "z">....</a>
- Attribute names are written without quotation marks, whereas attribute values must always appear in quotation marks. The following example demonstrates incorrect XML syntax −
<a b = x>....</a>
In the above syntax, the attribute value is not enclosed in quotation marks.
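As a rough check of these attribute rules, the sketch below reads the href attribute and then feeds the two incorrect fragments to Python's standard-library parser. The helper name accepted is illustrative.

```python
import xml.etree.ElementTree as ET

link = ET.fromstring('<a href = "http://www.tutorialspoint.com/">Tutorialspoint!</a>')
print(link.get("href"))   # value of the attribute named href
print(link.get("HREF"))   # a differently cased name is a different attribute

def accepted(fragment: str) -> bool:
    """Return True if the parser accepts the fragment as well-formed."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

duplicate = '<a b = "x" c = "y" b = "z">....</a>'   # attribute b given twice
unquoted = '<a b = x>....</a>'                       # value not in quotes

print(accepted(duplicate), accepted(unquoted))
```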
XML References
References usually allow you to add or include additional text or markup in an XML document. References always begin with the symbol "&", which is a reserved character, and end with the symbol ";". XML has two types of references −
- Entity References − An entity reference contains a name between the start and the end delimiters. For example, &amp; where amp is the name. The name refers to a predefined string of text and/or markup.
- Character References − These contain a hash mark ("#") followed by a number, as in &#65;. The number always refers to the Unicode code point of a character. In this case, 65 refers to the letter "A".
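Both kinds of reference are resolved by the parser before the application sees the text. The following sketch, which assumes Python's standard-library parser and a made-up element name t, illustrates this.

```python
import xml.etree.ElementTree as ET

# &#65; is a decimal character reference and &#x42; a hexadecimal one;
# &amp;, &lt; and &gt; are predefined entity references.
element = ET.fromstring("<t>&#65;&#x42; &amp; &lt;tag&gt;</t>")

print(element.text)
```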
XML Text
The names of XML elements and XML attributes are case-sensitive, which means the names in start and end tags must be written in the same case. To avoid character encoding problems, XML files should be saved as Unicode UTF-8 or UTF-16 files.
Whitespace characters like blanks, tabs and line-breaks between the attributes inside a tag are ignored, and whitespace that appears only between XML elements is usually treated as insignificant by applications.
Some characters are reserved by the XML syntax itself. Hence, they cannot always be used directly in content. To use them safely, write the replacement entities listed below −
Not Allowed Character | Replacement Entity | Character Description |
< | &lt; | less than |
> | &gt; | greater than |
& | &amp; | ampersand |
' | &apos; | apostrophe |
" | &quot; | quotation mark |
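In practice, these replacements are usually applied by a library rather than by hand. The following is a minimal sketch using escape from Python's standard-library xml.sax.saxutils module; the sample string and the line element are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Tom & Jerry say "1 < 2"'

# escape() always replaces &, < and >; quotes are replaced only when
# an extra mapping such as this one is supplied.
quote_entities = {'"': "&quot;", "'": "&apos;"}
escaped = escape(raw, quote_entities)
print(escaped)

# Parsing the escaped text gives back the original characters.
round_trip = ET.fromstring(f"<line>{escaped}</line>").text
print(round_trip == raw)
```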
XML - Documents
An XML document is a basic unit of XML information composed of elements and other markup in an orderly package. An XML document can contain a wide variety of data, for example, a database of numbers, numbers representing a molecular structure, or a mathematical equation.
XML Document Example
A simple document is shown in the following example −
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
The following image depicts the parts of XML document.
Document Prolog Section
Document Prolog comes at the top of the document, before the root element. This section contains −
- XML declaration
- Document type declaration
You can learn more about XML declaration in this chapter − XML Declaration
Document Elements Section
Document Elements are the building blocks of XML. These divide the document into a hierarchy of sections, each serving a specific purpose. You can separate a document into multiple sections so that they can be rendered differently, or used by a search engine. The elements can be containers, with a combination of text and other elements.
You can learn more about XML elements in this chapter − XML Elements
XML - Declaration
This chapter covers the XML declaration in detail. The XML declaration contains details that prepare an XML processor to parse the XML document. It is optional, but when used, it must appear at the very start of the XML document.
Syntax
Following syntax shows XML declaration −
<?xml
version = "version_number"
encoding = "encoding_declaration"
standalone = "standalone_status"
?>
Each parameter consists of a parameter name, an equals sign (=), and a parameter value inside quotes. The following table shows the above syntax in detail −
Parameter | Parameter_value | Parameter_description |
version | 1.0 | Specifies the version of the XML standard used. |
encoding | UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP | It defines the character encoding used in the document. UTF-8 is the default encoding used. |
standalone | yes or no | It informs the parser whether the document relies on the information from an external source, such as external document type definition (DTD), for its content. The default value is set to no. Setting it to yes tells the processor there are no external declarations required for parsing the document. |
Rules
An XML declaration should abide by the following rules −
- If the XML declaration is present, it must be placed at the very start of the XML document.
- If the XML declaration is included, it must contain the version number attribute.
- The parameter names and values are case-sensitive.
- The names are always in lower case.
- The order of placing the parameters is important. The correct order is: version, encoding and standalone.
- Either single or double quotes may be used.
- The XML declaration has no closing tag; that is, there is no </?xml>.
XML Declaration Examples
Following are a few examples of XML declarations −
XML declaration with no parameters (shown only for contrast; it is not well-formed, because version is mandatory) −
<?xml ?>
XML declaration with version definition −
<?xml version = "1.0"?>
XML declaration with all parameters defined −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
XML declaration with all parameters defined in single quotes −
<?xml version = '1.0' encoding = 'iso-8859-1' standalone = 'no' ?>
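The rules on required parameters, ordering and quoting can be made concrete with a small pattern. This is only a sketch under stated assumptions: the regular expression and the function name read_declaration are invented for illustration, cover just the rules listed in this chapter, and are not a substitute for a real XML parser.

```python
import re

# Illustrative pattern only: version is required, and encoding and
# standalone are optional but must follow in that order.
DECLARATION = re.compile(
    r"""^<\?xml
        \s+version\s*=\s*(?P<vq>["'])(?P<version>[^"']+)(?P=vq)
        (?:\s+encoding\s*=\s*(?P<eq>["'])(?P<encoding>[^"']+)(?P=eq))?
        (?:\s+standalone\s*=\s*(?P<sq>["'])(?P<standalone>yes|no)(?P=sq))?
        \s*\?>""",
    re.VERBOSE,
)

def read_declaration(text: str):
    """Return the declared parameters as a dict, or None if the text does not fit."""
    match = DECLARATION.match(text)
    if match is None:
        return None
    return {key: match.group(key) for key in ("version", "encoding", "standalone")}

print(read_declaration('<?xml version = "1.0"?>'))
print(read_declaration("<?xml version = '1.0' encoding = 'iso-8859-1' standalone = 'no' ?>"))
print(read_declaration('<?xml encoding = "UTF-8" version = "1.0"?>'))  # wrong order
```

Parameters that are absent come back as None, and a declaration that omits version or reorders the parameters is rejected as a whole.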
XML - Tags
Let us learn about one of the most important parts of XML, the XML tags. XML tags form the foundation of XML. They define the scope of an element in XML. Tag-like markup is also used to insert comments, declare settings required by the parsing environment, and insert special instructions.
We can broadly categorize XML tags as follows −
Start Tag
The beginning of every non-empty XML element is marked by a start-tag. Following is an example of start-tag −
<address>
End Tag
Every element that has a start tag must end with an end-tag. Following is an example of an end-tag −
</address>
Note that end tags include a solidus ("/") before the name of the element.
Empty Tag
The text that appears between the start-tag and the end-tag is called content. An element which has no content is termed empty. An empty element can be represented in two ways as follows −
A start-tag immediately followed by an end-tag, as shown below −
<hr></hr>
A complete empty-element tag, as shown below −
<hr />
Empty-element tags may be used for any element which has no content.
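To a parser, the two spellings of an empty element are equivalent. The sketch below assumes Python's standard-library xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

pair = ET.fromstring("<hr></hr>")   # start-tag immediately followed by end-tag
single = ET.fromstring("<hr />")    # empty-element tag

# Neither element has child elements or text content.
print(len(pair), len(single))
print(bool(pair.text), bool(single.text))

# Both serialize to the same bytes.
print(ET.tostring(pair) == ET.tostring(single))
```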
XML Tags Rules
Following are the rules that need to be followed to use XML tags −
Rule 1
XML tags are case-sensitive. The following line of code is an example of wrong syntax: the end tag </Address> differs in case from the start tag <address>, which XML treats as an error.
<address>This is wrong syntax</Address>
Following code shows a correct way, where we use the same case to name the start and the end tag.
<address>This is correct syntax</address>
Rule 2
XML tags must be closed in an appropriate order, i.e., an XML tag opened inside another element must be closed before the outer element is closed. For example −
<outer_element>
<internal_element>
This tag is closed before the outer_element
</internal_element>
</outer_element>
XML - Elements
XML elements can be defined as the building blocks of an XML document. Elements can behave as containers to hold text, elements, attributes, media objects or all of these.
Each XML document contains one or more elements, the scope of which is delimited either by start and end tags or, for empty elements, by an empty-element tag.
Syntax
Following is the syntax to write an XML element −
<element-name attribute1 attribute2>
....content
</element-name>
where,
- element-name is the name of the element. The name, including its case, must match in the start and end tags.
- attribute1, attribute2 are attributes of the element separated by white spaces. An attribute defines a property of the element. It associates a name with a value, which is a string of characters. An attribute is written as −
name = "value"
The name is followed by an = sign and a string value inside double (" ") or single (' ') quotes.
Empty Element
An empty element (element with no content) has the following syntax −
<name attribute1 attribute2.../>
Following is an example of an XML document using various XML elements −
<?xml version = "1.0"?>
<contact-info>
<address category = "residence">
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
</contact-info>
XML Elements Rules
The following rules must be followed for XML elements −
- An element name can contain any alphanumeric characters. The only punctuation marks allowed in names are the hyphen (-), underscore (_) and period (.); a name must not start with a digit, hyphen or period. The colon is also permitted but is reserved for namespaces.
- Names are case sensitive. For example, Address, address, and ADDRESS are different names.
- The names in the start and end tags of an element must be identical.
- An element, which is a container, can contain text or elements as seen in the above example.
XML - Attributes
This chapter describes XML attributes. Attributes are part of XML elements. An element can have multiple unique attributes. Attributes give more information about XML elements. To be more precise, they define properties of elements. An XML attribute is always a name-value pair.
Syntax
An XML attribute has the following syntax −
<element-name attribute1 attribute2>
....content..
</element-name>
where attribute1 and attribute2 have the following form −
name = "value"
The value has to be in double (" ") or single (' ') quotes. Here, attribute1 and attribute2 are unique attribute labels.
Attributes are used to add a unique label to an element, place the label in a category, add a Boolean flag, or otherwise associate it with some string of data. The following example demonstrates the use of attributes −
<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
<plants category = "flowers" />
<plants category = "shrubs">
</plants>
</garden>
Attributes are used to distinguish among elements of the same name when you do not want to create a new element for every situation. Hence, an attribute can add a little more detail in differentiating two or more similar elements.
In the above example, we have categorized the plants by including the attribute category and assigning a different value to each of the elements. Hence, we have two categories of plants, one flowers and the other shrubs, and thus two plants elements with different attribute values.
You can also observe that we have declared this attribute in the DTD at the beginning of the XML document.
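A program can use the category attribute to tell the two plants elements apart. The sketch below assumes Python's standard-library xml.etree.ElementTree, which reads the internal DTD but does not validate the document against it.

```python
import xml.etree.ElementTree as ET

# The garden document from above, passed as bytes because it declares an encoding.
garden_xml = b"""<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE garden [
   <!ELEMENT garden (plants)*>
   <!ELEMENT plants (#PCDATA)>
   <!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
   <plants category = "flowers" />
   <plants category = "shrubs">
   </plants>
</garden>"""

root = ET.fromstring(garden_xml)

# Same element name, distinguished only by the attribute value.
categories = [plant.get("category") for plant in root.findall("plants")]
print(categories)
```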
Attribute Types
The following table lists the types of attributes −
Attribute Type | Description |
StringType | It takes any literal string as a value. CDATA is a StringType. CDATA is character data. This means any string of non-markup characters is a legal part of the attribute. |
TokenizedType | This is a more constrained type. The validity constraints noted in the grammar are applied after the attribute value is normalized. The TokenizedType attributes are: ID − It is used to specify the element as unique; IDREF − It is used to reference an ID that has been named for another element; IDREFS − It is used to reference several IDs, given as a whitespace-separated list; ENTITY − It indicates that the attribute will represent an external entity in the document; ENTITIES − It indicates that the attribute will represent external entities in the document; NMTOKEN − It is similar to CDATA, with restrictions on what data can be part of the attribute (a single name token); NMTOKENS − It is like NMTOKEN, but holds a whitespace-separated list of name tokens. |
EnumeratedType | This has a list of predefined values in its declaration, out of which the attribute must take one. There are two types of enumerated attribute: NotationType − It declares that an element will be referenced to a NOTATION declared somewhere else in the XML document; Enumeration − It allows you to define a specific list of values that the attribute value must match. |
Element Attribute Rules
Following are the rules that need to be followed for attributes −
- An attribute name must not appear more than once in the same start-tag or empty-element tag.
- For the document to be valid, an attribute must be declared in the Document Type Definition (DTD) using an Attribute-List Declaration.
- Attribute values must not contain direct or indirect entity references to external entities.
- The replacement text of any entity referred to directly or indirectly in an attribute value must not contain a less than sign (<).
XML - Comments
This chapter explains how comments work in XML documents. XML comments are similar to HTML comments. Comments are added as notes or lines that help readers understand the purpose of the XML code.
Comments can be used to include related links, information, and terms. They are visible only in the source; they are not part of the document's data. Comments may appear almost anywhere in XML code, except before the XML declaration or inside a tag.
Syntax
XML comment has the following syntax −
<!--Your comment-->
A comment starts with <!-- and ends with -->. You can add textual notes as comments between these delimiters. You must not nest one comment inside another, and the string -- must not appear inside a comment.
Example
Following example demonstrates the use of comments in XML document −
<?xml version = "1.0" encoding = "UTF-8" ?>
<!--Students grades are uploaded by months-->
<class_list>
<student>
<name>Tanmay</name>
<grade>A</grade>
</student>
</class_list>
Any text between the <!-- and --> characters is considered a comment.
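The sketch below illustrates both points with Python's standard-library parser, under its default settings: the comment does not show up in the parsed data, and a nested comment is rejected. The helper name accepted and the element a are illustrative.

```python
import xml.etree.ElementTree as ET

document = """<?xml version = "1.0" encoding = "UTF-8" ?>
<!--Students grades are uploaded by months-->
<class_list>
   <student>
      <name>Tanmay</name>
      <grade>A</grade>
   </student>
</class_list>"""

root = ET.fromstring(document)

# Only elements are present in the tree; the comment is not part of the data.
tags = [element.tag for element in root.iter()]
print(tags)

def accepted(fragment: str) -> bool:
    """Return True if the parser accepts the fragment as well-formed."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

nested_comment = "<a><!-- outer <!-- inner --> --></a>"
print(accepted(nested_comment))
```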
XML - Character Entities
This chapter describes the XML Character Entities. Before we understand the Character Entities, let us first understand what an XML entity is.
As put by the W3C, the definition of an entity is as follows −
This means entities are placeholders in XML. They can be declared in the document prolog or in a DTD. There are different types of entities, and in this chapter we will discuss character entities.
Both HTML and XML reserve some symbols for their own use, and these cannot be used directly as content in XML code. For example, the < and > signs are used for opening and closing XML tags. To display these special characters, character entities are used.
There are a few special characters or symbols which cannot be typed directly from the keyboard. Character entities can also be used to display those symbols/special characters.
Types of Character Entities
There are three types of character entities −
- Predefined Character Entities
- Numeric Character Entities
- Named Character Entities
Predefined Character Entities
They are introduced to avoid ambiguity when using some symbols. For example, an ambiguity arises when the less than ( < ) or greater than ( > ) symbol appears in content, because the same symbols delimit tags (<>). Predefined character entities let you write such characters so that they are not mistaken for markup. Following is a list of predefined character entities from the XML specification −
- Ampersand − &amp;
- Single quote − &apos;
- Greater than − &gt;
- Less than − &lt;
- Double quote − &quot;
Numeric Character Entities
A numeric reference refers to a character by its number in the Unicode character set. It can be written in either decimal or hexadecimal format. As there are thousands of numeric references available, they are a bit hard to remember.
General syntax for decimal numeric reference (written without spaces in practice) is −
&# decimal number ;
General syntax for hexadecimal numeric reference (written without spaces in practice) is −
&#x Hexadecimal number ;
The following table lists some predefined character entities with their numeric values −
Entity name | Character | Decimal reference | Hexadecimal reference |
quot | " | &#34; | &#x22; |
amp | & | &#38; | &#x26; |
apos | ' | &#39; | &#x27; |
lt | < | &#60; | &#x3C; |
gt | > | &#62; | &#x3E; |
Named Character Entity
As it is hard to remember the numeric values, the most preferred type of character entity is the named character entity. Here, each entity is identified with a name. Apart from the five predefined entities, such names must be declared in a DTD before they can be used in XML.
For example −
- 'Aacute' represents the capital letter A with an acute accent (Á).
- 'ugrave' represents the small letter u with a grave accent (ù).
XML - CDATA Sections
In this chapter, we will discuss the XML CDATA section. The term CDATA means Character Data. A CDATA section is a block of text that the parser does not interpret as markup, even though it may contain characters that would otherwise be recognized as markup.
The predefined entities such as &lt;, &gt;, and &amp; require extra typing and are generally difficult to read in the markup. In such cases, a CDATA section can be used. By using a CDATA section, you are telling the parser that this particular section of the document contains no markup and should be treated as regular text.
Syntax
Following is the syntax for CDATA section −
<![CDATA[
characters with markup
]]>
The above syntax is composed of three sections −
- CDATA Start section − CDATA begins with the nine-character delimiter <![CDATA[
- CDATA End section − CDATA section ends with the ]]> delimiter.
- CData section − Characters between these two enclosures are interpreted as characters, and not as markup. This section may contain markup characters (<, >, and &), but the XML processor does not treat them as markup.
Example
The following markup code shows an example of CDATA. Here, each character written inside the CDATA section is passed through as character data rather than parsed as markup.
<script>
<![CDATA[
<message> Welcome to TutorialsPoint </message>
]]>
</script>
In the above example, everything inside the CDATA section, including the <message> and </message> tags, is treated as character data and not as markup.
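A short sketch makes the effect visible. It assumes Python's standard-library xml.etree.ElementTree and writes the CDATA example on one line to keep the resulting text free of extra line breaks.

```python
import xml.etree.ElementTree as ET

source = "<script><![CDATA[<message> Welcome to TutorialsPoint </message>]]></script>"
script = ET.fromstring(source)

# The tags inside the CDATA section arrive as plain text, not as a child element.
print(script.text)
print(len(script))

# When written back out, the same text is escaped with predefined entities.
serialized = ET.tostring(script, encoding="unicode")
print(serialized)
```

The two forms are interchangeable for the parser: a CDATA section is simply a more readable way of writing text that would otherwise need escaping.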
XML - WhiteSpaces
In this chapter, we will discuss whitespace handling in XML documents. Whitespace is a collection of spaces, tabs, and newlines. It is generally used to make a document more readable.
An XML document contains two types of whitespace: significant whitespace and insignificant whitespace. Both are explained below with examples.
Significant Whitespace
Significant whitespace occurs within an element that contains text, possibly mixed with markup. For example −
<name>TanmayPatil</name>
and
<name>Tanmay Patil</name>
The above two elements are different because of the space between Tanmay and Patil. Any program reading this element in an XML file is obliged to maintain the distinction.
Insignificant Whitespace
Insignificant whitespace is whitespace that appears where only element content (child elements, not text) is allowed, or between the parts of a tag, so it carries no meaning. For example −
<address.category = "residence">
<address....category = "residence">
The above examples are the same. Here, the space is represented by dots (.). In the above example, the space between address and category is insignificant.
A special attribute named xml:space may be attached to an element. Setting it to preserve indicates that the application should not remove whitespace for that element. In a DTD, the attribute can be declared with the values default or preserve, as shown in the following example −
<!ATTLIST address xml:space (default|preserve) 'preserve'>
Where,
- The value default signals that the default whitespace processing modes of an application are acceptable for this element.
- The value preserve instructs the application to preserve all the whitespace.
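The difference between the two kinds of whitespace can be observed with a parser. This sketch assumes Python's standard-library xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Significant: whitespace inside text content is kept exactly as written.
spaced = ET.fromstring("<name>Tanmay Patil</name>")
joined = ET.fromstring("<name>TanmayPatil</name>")
print(spaced.text == joined.text)

# Insignificant: extra spaces between the element name, the attribute
# name, the equals sign and the closing of the tag change nothing.
compact = ET.fromstring('<address category="residence"/>')
padded = ET.fromstring('<address    category   =   "residence"   />')
print(compact.attrib == padded.attrib)
```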
XML - Processing
This chapter describes the Processing Instructions (PIs). As defined by the XML 1.0 Recommendation,
Processing instructions (PIs) can be used to pass information to applications. PIs can appear anywhere in the document outside the markup. They can appear in the prolog, including the document type definition (DTD), in textual content, or after the document.
Syntax
Following is the syntax of PI −
<?target instructions?>
Where
- target − Identifies the application to which the instruction is directed.
- instructions − Character data that describes the information for the application to process.
A PI starts with the special delimiter <? and ends with ?>. The PI ends as soon as the string ?> is encountered.
Example
PIs are rarely used. They are mostly used to link XML document to a style sheet. Following is an example −
<?xml-stylesheet href = "tutorialspointstyle.css" type = "text/css"?>
Here, the target is xml-stylesheet. href="tutorialspointstyle.css" and type="text/css" are data or instructions the target application will use at the time of processing the given XML document.
In this case, a browser recognizes the target as a request to apply a style sheet before the XML is shown; the first attribute points to the style sheet's location and the second states that its type is CSS.
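The split between target and instructions can be seen by parsing the example with Python's standard-library xml.dom.minidom. This is only a sketch: the page root element is hypothetical, added because a document needs a root.

```python
from xml.dom import minidom

source = (
    '<?xml-stylesheet href = "tutorialspointstyle.css" type = "text/css"?>'
    "<page/>"  # hypothetical root element
)
document = minidom.parseString(source)

pi = document.firstChild   # the PI precedes the root element
print(pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE)
print(pi.target)           # the application the instruction is aimed at
print(pi.data)             # everything after the target, as one string
```

Note that the parser hands over the instructions as a single string; treating them as attribute-like pairs is up to the target application.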
XML - Encoding
Encoding is the process of converting Unicode characters into their equivalent binary representation. When the XML processor reads an XML document, it decodes the document's bytes according to the declared encoding. Hence, we need to specify the type of encoding in the XML declaration.
Encoding Types
There are two main encoding types −
-
UTF-8
-
UTF-16
UTF stands for UCS Transformation Format, and UCS itself means Universal Character Set. The number 8 or 16 refers to the size in bits of the unit used to encode a character: UTF-8 uses one to four 8-bit units (1 to 4 bytes) per character, and UTF-16 uses one or two 16-bit units (2 or 4 bytes). For documents without encoding information, UTF-8 is assumed by default.
Syntax
The encoding is specified in the XML declaration, which is part of the prolog of the XML document. The syntax for UTF-8 encoding is as follows −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
The syntax for UTF-16 encoding is as follows −
<?xml version = "1.0" encoding = "UTF-16" standalone = "no" ?>
Example
The following example shows an encoding declaration −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
In the above example, encoding="UTF-8" specifies that the document is encoded with 8-bit units. To encode the document with 16-bit units instead, UTF-16 encoding can be used.
XML files encoded with UTF-8 tend to be smaller than those encoded with UTF-16 when the text is mostly ASCII, because each ASCII character takes one byte in UTF-8 and two bytes in UTF-16.
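The size difference can be checked with a few lines of Python. This is only a sketch; the helper name encoded_sizes and the sample strings are assumptions chosen for illustration.

```python
def encoded_sizes(text):
    """Return the number of bytes needed to store text in UTF-8 and in UTF-16."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        # "utf-16-le" leaves out the 2-byte byte-order mark, so only the characters are counted.
        "UTF-16": len(text.encode("utf-16-le")),
    }


ascii_sizes = encoded_sizes("Tanmay Patil")   # 12 ASCII characters
euro_sizes = encoded_sizes("\u20ac" * 3)      # three euro signs, which are non-ASCII

print(ascii_sizes)
print(euro_sizes)
```

For the ASCII string, UTF-8 needs half as many bytes as UTF-16. For the euro signs the relation is reversed, which is why the size advantage of UTF-8 depends on the text.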
XML - Validation
Validation is the process of checking an XML document against a set of rules. An XML document is said to be valid if its contents match the elements and attributes declared in the associated document type definition (DTD), and if the document complies with the constraints expressed in it. An XML parser checks a document at two levels −
-
Well-formed XML document
-
Valid XML document
Well-formed XML Document
An XML document is said to be well-formed if it adheres to the following rules −
-
XML files without a DTD can use only the predefined entities: amp (&), apos (single quote), gt (>), lt (<), and quot (double quote).
-
Tags must be properly nested, i.e., an inner tag must be closed before the outer tag is closed.
-
Each opening tag must have a closing tag, or the element must be written as a self-closing tag (<title>...</title> or <title/>).
-
An attribute name may appear only once in a start tag, and its value must be quoted.
-
Any entity other than amp, apos, gt, lt, and quot must be declared before it is used.
Example
The following is an example of a well-formed XML document −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address
[
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The above example is well-formed because −
-
It declares the type of the document. Here, the DOCTYPE names address as the root element type.
-
It includes a root element named address.
-
Each of the child elements name, company, and phone is enclosed in its own self-explanatory tag.
-
The order of the tags is maintained.
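The well-formedness rules can be tried out with any XML parser. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


GOOD = "<address><name>Tanmay Patil</name><phone>(011) 123-4567</phone></address>"
BAD_NESTING = "<address><name>Tanmay Patil</address></name>"   # inner tag closed too late
DUPLICATE_ATTR = '<address id="1" id="2"/>'                    # attribute name repeated
UNDECLARED_ENTITY = "<address>&company;</address>"             # entity never declared

for sample in (GOOD, BAD_NESTING, DUPLICATE_ATTR, UNDECLARED_ENTITY):
    print(is_well_formed(sample), sample)
```

This parser checks well-formedness only; it does not validate the document against a DTD.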
Valid XML Document
If an XML document is well-formed and also conforms to the rules of its associated document type definition (DTD), then it is said to be a valid XML document. We will study more about DTDs in the chapter XML - DTDs.
XML - DTDs
The XML Document Type Definition, commonly known as DTD, is a way to describe an XML language precisely. DTDs are used to check the vocabulary and the validity of the structure of XML documents against the grammatical rules of the appropriate XML language.
An XML DTD can either be specified inside the document, or it can be kept in a separate document and then linked separately.
Syntax
The basic syntax of a DTD is as follows −
<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>
In the above syntax,
-
The DTD starts with the <!DOCTYPE delimiter.
-
element tells the parser to parse the document from the specified root element.
-
DTD identifier is an identifier for the document type definition, which may be the path to a file on the system or the URL of a file on the internet. If the DTD points to an external path, it is called the external subset.
-
The square brackets [ ] enclose an optional list of declarations called the internal subset.
Internal DTD
A DTD is referred to as an internal DTD if the elements are declared within the XML file itself. Because such a document does not depend on any external declarations, the standalone attribute in the XML declaration can be set to yes. This means the declaration works independently of an external source.
Syntax
The syntax of an internal DTD is as follows −
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of the root element and element-declarations is where you declare the elements.
Example
The following is a simple example of an internal DTD −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
Let us go through the above code −
Start Declaration − Begin the XML declaration with the following statement.
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
DTD − Immediately after the XML declaration comes the document type declaration, commonly referred to as the DOCTYPE −
<!DOCTYPE address [
The DOCTYPE declaration starts with an exclamation mark (!) before the keyword. The DOCTYPE informs the parser that a DTD is associated with this XML document.
DTD Body − The DOCTYPE declaration is followed by the body of the DTD, where you declare elements, attributes, entities, and notations.
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <address> document. <!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA". Here, #PCDATA means parsed character data, i.e., text that the parser scans for markup.
End Declaration − Finally, the declaration section of the DTD is closed using a closing bracket and a closing angle bracket (]>). This ends the definition, and the XML document follows immediately afterwards.
Rules
-
The document type declaration must appear at the start of the document (preceded only by the XML header) − it is not permitted anywhere else within the document.
-
Similar to the DOCTYPE declaration, the element declarations must start with an exclamation mark.
-
The Name in the document type declaration must match the element type of the root element.
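The last rule can be checked programmatically. The sketch below uses Python's standard xml.dom.minidom module to compare the name in the DOCTYPE with the root element; the helper name doctype_matches_root and the sample documents are assumptions for illustration. A non-validating parser such as this one does not enforce the rule by itself, so the comparison is made explicitly.

```python
from xml.dom import minidom

INTERNAL_DTD_DOC = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE address [
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>
<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>"""

# Same kind of document, but the DOCTYPE names "contact" while the root element is "address".
MISMATCHED_DOC = "<!DOCTYPE contact [<!ELEMENT address (#PCDATA)>]><address>x</address>"


def doctype_matches_root(xml_text):
    """Return True if the document has a DOCTYPE whose name equals the root element name."""
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:
        return False
    return doc.doctype.name == doc.documentElement.tagName


print(doctype_matches_root(INTERNAL_DTD_DOC))
print(doctype_matches_root(MISMATCHED_DOC))
```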
External DTD
In an external DTD, elements are declared outside the XML file. They are accessed by specifying a system identifier, which may be either the path to a .dtd file or a valid URL. Because the document depends on declarations stored elsewhere, the standalone attribute in the XML declaration should be set to no. This means the declaration includes information from an external source.
Syntax
The syntax of an external DTD is as follows −
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with the .dtd extension.
Example
The following example shows external DTD usage −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The content of the DTD file address.dtd is as follows −
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Types
You can refer to an external DTD by using either system identifiers or public identifiers.
System Identifiers
A system identifier enables you to specify the location of an external file containing DTD declarations. The syntax is as follows −
<!DOCTYPE name SYSTEM "address.dtd" [...]>
As you can see, it contains the keyword SYSTEM and a URI reference pointing to the location of the document.
Public Identifiers
Public identifiers provide a mechanism to locate DTD resources and are written as follows −
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">
As you can see, it begins with the keyword PUBLIC, followed by a specialized identifier; in XML, a system identifier must follow the public identifier as a fallback location. Public identifiers are used to identify an entry in a catalog. Public identifiers can follow any format; however, a commonly used format is called Formal Public Identifiers, or FPIs.
XML - Schemas
XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe and validate the structure and the content of XML data. An XML schema defines the elements, attributes, and data types. The schema element supports namespaces. It is similar to a database schema that describes the data in a database.
Syntax
A schema document declares the XML Schema namespace on its root element as follows −
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
Example
The following example shows how to use a schema −
<?xml version = "1.0" encoding = "UTF-8"?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
<xs:element name = "contact">
<xs:complexType>
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
<xs:element name = "phone" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The basic idea behind XML Schemas is that they describe the legitimate format that an XML document can take.
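Because a schema is itself an XML document, it can be read like any other XML file. The following sketch lists the elements that the schema above declares, using Python's standard xml.etree.ElementTree module. It only reads the schema and does not validate anything; validation against an XSD needs a schema-aware library, which the standard library does not provide. The helper name declared_elements is an assumption for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree writes a qualified name as "{namespace-uri}local-name".
XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="contact">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="company" type="xs:string"/>
            <xs:element name="phone" type="xs:int"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>"""


def declared_elements(schema_text):
    """Return (name, type) pairs for every xs:element declaration, in document order."""
    root = ET.fromstring(schema_text)
    return [(el.get("name"), el.get("type")) for el in root.iter(XS + "element")]


for name, type_name in declared_elements(SCHEMA_TEXT):
    print(name, type_name)
```

The contact element has no type attribute because its type is defined inline, so its type is reported as None.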
Elements
As we saw in the XML - Elements chapter, elements are the building blocks of an XML document. An element can be defined within an XSD as follows −
<xs:element name = "x" type = "y"/>
Definition Types
You can define XML schema elements in the following ways −
Simple Type
A simple type element can contain only text; it cannot have child elements or attributes. Some of the predefined simple types are xs:integer, xs:boolean, xs:string, and xs:date. For example −
<xs:element name = "phone_number" type = "xs:int" />
Complex Type
A complex type is a container for other element definitions. This allows you to specify which child elements an element can contain and to provide some structure within your XML documents. For example −
<xs:element name = "Address">
<xs:complexType>
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
<xs:element name = "phone" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
In the above example, the Address element consists of child elements. It is a container for other <xs:element> definitions, which allows you to build a simple hierarchy of elements in the XML document.
Global Types
With a global type, you can define a single type in your document that can be used by all other references. For example, suppose you want to reuse the same name and company details for different addresses of the company. In such a case, you can define a general type as follows −
<xs:complexType name = "AddressType">
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
</xs:sequence>
</xs:complexType>
Now let us use this type in our example as follows −
<xs:element name = "Address1">
<xs:complexType>
<xs:sequence>
<xs:element name = "address" type = "AddressType" />
<xs:element name = "phone1" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name = "Address2">
<xs:complexType>
<xs:sequence>
<xs:element name = "address" type = "AddressType" />
<xs:element name = "phone2" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
Instead of having to define the name and the company twice (once for Address1 and once for Address2), we now have a single definition. This makes maintenance simpler, i.e., if you decide to add a "Postcode" element to the address, you need to add it in just one place.
XML - Tree Structure
An XML document is always descriptive. Its tree structure, often referred to as the XML tree, makes any XML document easy to describe.
The tree structure contains a root (parent) element, child elements, and so on. By using the tree structure, you can get to know all succeeding branches and sub-branches starting from the root. Parsing starts at the root, moves down the first branch to an element, takes the first branch from there, and so on down to the leaf nodes.
Example
The following example demonstrates a simple XML tree structure −
<?xml version = "1.0"?>
<Company>
<Employee>
<FirstName>Tanmay</FirstName>
<LastName>Patil</LastName>
<ContactNo>1234567890</ContactNo>
<Email>tanmaypatil@xyz.com</Email>
<Address>
<City>Bangalore</City>
<State>Karnataka</State>
<Zip>560212</Zip>
</Address>
</Employee>
</Company>
The above XML document forms the following tree structure. There is a root element named <Company>. Inside it, there is one more element, <Employee>. Inside the <Employee> element, there are five branches named <FirstName>, <LastName>, <ContactNo>, <Email>, and <Address>. Inside the <Address> element, there are three sub-branches named <City>, <State>, and <Zip>.
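The traversal order described in this chapter (start at the root, follow the first branch down to its leaves, then move on to the next branch) can be sketched in Python with the standard xml.etree.ElementTree module. The helper name outline is an assumption for this example.

```python
import xml.etree.ElementTree as ET

COMPANY_XML = """<Company>
   <Employee>
      <FirstName>Tanmay</FirstName>
      <LastName>Patil</LastName>
      <ContactNo>1234567890</ContactNo>
      <Email>tanmaypatil@xyz.com</Email>
      <Address>
         <City>Bangalore</City>
         <State>Karnataka</State>
         <Zip>560212</Zip>
      </Address>
   </Employee>
</Company>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs in document order, visiting each branch depth-first."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


tree_rows = outline(ET.fromstring(COMPANY_XML))
for depth, tag in tree_rows:
    print("   " * depth + tag)   # indent each tag by its depth in the tree
```

The printed outline shows Company at depth 0, Employee at depth 1, its five branches at depth 2, and the three sub-branches of Address at depth 3.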
XML - DOM
The Document Object Model (DOM) is a standard way of representing an XML document in memory. XML documents have a hierarchy of informational units called nodes; the DOM is a way of describing those nodes and the relationships between them.
A DOM document is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy allows a developer to navigate through the tree looking for specific information. Because it is based on a hierarchy of information, the DOM is said to be tree based.
The XML DOM also provides an API that allows a developer to add, edit, move, or remove nodes at any point in the tree in order to create an application.
Example
The following example (sample.htm) parses an XML document ("address.xml") into an XML DOM object and then extracts some information from it with JavaScript −
<!DOCTYPE html>
<html>
<body>
<h1>TutorialsPoint DOM example </h1>
<div>
<b>Name:</b> <span id = "name"></span><br>
<b>Company:</b> <span id = "company"></span><br>
<b>Phone:</b> <span id = "phone"></span>
</div>
<script>
// Load address.xml synchronously so that the parsed document is available on the next line.
var xmlhttp = new XMLHttpRequest();
xmlhttp.open("GET", "/xml/address.xml", false);
xmlhttp.send();
var xmlDoc = xmlhttp.responseXML;
// Copy the text of the first matching element into the span that has the same id.
["name", "company", "phone"].forEach(function (tag) {
   document.getElementById(tag).textContent =
      xmlDoc.getElementsByTagName(tag)[0].childNodes[0].nodeValue;
});
</script>
</body>
</html>
The contents of address.xml are as follows −
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
Now let us keep these two files, sample.htm and address.xml, in the same directory /xml and run the example by opening sample.htm in any browser. The page should display the name, company, and phone values read from address.xml.
Here, you can see how each of the child nodes is extracted to display its value.
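The same DOM calls are available outside the browser. As a sketch, the following Python code uses the standard xml.dom.minidom module to extract the same three values; the helper name first_text is an assumption, and the document is given as a string instead of being fetched from /xml/address.xml.

```python
from xml.dom import minidom

ADDRESS_XML = """<?xml version="1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>"""


def first_text(doc, tag_name):
    """Return the text of the first element named tag_name, as the JavaScript example does."""
    element = doc.getElementsByTagName(tag_name)[0]
    return element.childNodes[0].nodeValue


doc = minidom.parseString(ADDRESS_XML)
contact = {tag: first_text(doc, tag) for tag in ("name", "company", "phone")}
print(contact)
```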
XML - Namespaces
A namespace is a set of unique names. It is a mechanism by which element and attribute names can be assigned to a group. A namespace is identified by a URI (Uniform Resource Identifier).
Namespace Declaration
A namespace is declared using reserved attributes. The name of such an attribute must either be xmlns or begin with xmlns:, as shown below −
<element xmlns:name = "URL">
Syntax
-
The Namespace starts with the keyword xmlns.
-
The word name is the Namespace prefix.
-
The URL is the Namespace identifier.
Example
A namespace affects only a limited area of the document. The element containing the declaration and all of its descendants are in the scope of the namespace. The following is a simple example of an XML namespace −
<?xml version = "1.0" encoding = "UTF-8"?>
<cont:contact xmlns:cont = "www.tutorialspoint.com/profile">
<cont:name>Tanmay Patil</cont:name>
<cont:company>TutorialsPoint</cont:company>
<cont:phone>(011) 123-4567</cont:phone>
</cont:contact>
Here, the namespace prefix is cont, and the namespace identifier (URI) is www.tutorialspoint.com/profile. This means that the element names and attribute names with the cont prefix (including the contact element) all belong to the www.tutorialspoint.com/profile namespace.
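A parser resolves each prefix to its namespace URI, so an application matches on the URI rather than on the prefix. The following sketch shows this with Python's standard xml.etree.ElementTree module; the prefix c used in the lookup is an arbitrary choice made for this example and differs from cont on purpose.

```python
import xml.etree.ElementTree as ET

CONTACT_XML = """<cont:contact xmlns:cont="www.tutorialspoint.com/profile">
   <cont:name>Tanmay Patil</cont:name>
   <cont:company>TutorialsPoint</cont:company>
   <cont:phone>(011) 123-4567</cont:phone>
</cont:contact>"""

# The prefix "c" is local to this script; only the URI has to match the document.
NS = {"c": "www.tutorialspoint.com/profile"}

root = ET.fromstring(CONTACT_XML)
root_tag = root.tag                       # expanded name in the form {uri}local-name
name_text = root.find("c:name", NS).text

print(root_tag)
print(name_text)
```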
XML - Databases
An XML database is used to store large amounts of information in XML format. As the use of XML increases in every field, a secure place to store XML documents is required. The data stored in the database can be queried using XQuery, serialized, and exported into a desired format.
XML Database Types
There are two major types of XML databases −
-
XML-enabled
-
Native XML (NXD)
XML-Enabled Database
An XML-enabled database is a relational database extended with support for converting XML documents. The data is stored in tables consisting of rows and columns. The tables contain sets of records, which in turn consist of fields.
Native XML Database
A native XML database is based on containers rather than on a table format. It can store large amounts of XML documents and data. A native XML database is queried using XPath expressions.
A native XML database has an advantage over an XML-enabled database: it is better suited to storing, querying, and maintaining XML documents.
Example
The following example demonstrates an XML database −
<?xml version = "1.0"?>
<contact-info>
<contact1>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact1>
<contact2>
<name>Manisha Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 789-4567</phone>
</contact2>
</contact-info>
Here, a table of contacts is created that holds the records of the contacts (contact1 and contact2), each of which in turn consists of three fields − name, company, and phone.
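To give an idea of what an XPath-style query over such data looks like, the sketch below runs two path expressions against the document above using Python's standard xml.etree.ElementTree module. This is not a database: ElementTree works on an in-memory tree and supports only a subset of XPath, and the variable names are assumptions for this example.

```python
import xml.etree.ElementTree as ET

CONTACTS_XML = """<contact-info>
   <contact1>
      <name>Tanmay Patil</name>
      <company>TutorialsPoint</company>
      <phone>(011) 123-4567</phone>
   </contact1>
   <contact2>
      <name>Manisha Patil</name>
      <company>TutorialsPoint</company>
      <phone>(011) 789-4567</phone>
   </contact2>
</contact-info>"""

root = ET.fromstring(CONTACTS_XML)

# Names of all records whose company child has the text "TutorialsPoint".
names = [el.text for el in root.findall("./*[company='TutorialsPoint']/name")]

# Phone number of one specific record.
phone2 = root.findtext("./contact2/phone")

print(names)
print(phone2)
```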
XML - Viewers
This chapter describes the various methods to view an XML document. An XML document can be viewed using a simple text editor or any browser. Most major browsers support XML. An XML file can be opened in the browser by double-clicking it (if it is a local file) or by typing its URL in the address bar (if the file is located on a server), in the same way as we open other files in the browser. XML files are saved with a ".xml" extension.
Let us explore the various methods by which we can view an XML file. The following example (sample.xml) is used in all the sections of this chapter.
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
Text Editors
Any simple text editor such as Notepad, TextPad, or TextEdit can be used to create or view an XML document.
Firefox Browser
Open the above XML code in Firefox by double-clicking the file. The XML code is displayed with color coding, which makes it more readable. A plus (+) or minus (-) sign is shown to the left of each XML element. When we click the minus sign (-), the content of the element is collapsed. When we click the plus sign (+), the lines are expanded again.
Chrome Browser
Open the above XML code in the Chrome browser. Chrome displays the code in a similar way.
Errors in XML Document
If your XML code has missing or mismatched tags, an error message is displayed in the browser. Let us try to open the following XML file in Chrome −
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact_info>
In the above code, the start and end tags do not match (see the closing contact_info tag), hence the browser displays an error message instead of the document.
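A parser reports the same kind of error to a program, together with the place where it was found. The following sketch uses Python's standard xml.etree.ElementTree module; the helper name describe_error is an assumption, and the sample closes contact-info with a contact_info tag to provoke the error.

```python
import xml.etree.ElementTree as ET

BROKEN_XML = """<?xml version="1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact_info>"""


def describe_error(xml_text):
    """Return None for well-formed input, else (line, column, message) of the first error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


error = describe_error(BROKEN_XML)
print(error)
```

The line number in the result points at the closing tag that does not match its start tag.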
XML - Editors
An XML editor is a markup language editor. XML documents can be edited or created using existing editors such as Notepad, WordPad, or any similar text editor. You can also find professional XML editors, online or for download, which have more powerful editing features such as −
-
It automatically closes the tags that are left open.
-
It strictly checks syntax.
-
It highlights XML syntax with colour for increased readability.
-
It helps you write valid XML code.
-
It provides automatic verification of XML documents against DTDs and Schemas.
Open Source XML Editors
The following are some open source XML editors −
-
Online XML Editor − This is a lightweight XML editor that you can use online.
-
Xerlin − Xerlin is an open source XML editor for the Java 2 platform, released under an Apache license. It is a Java-based XML modelling application for creating and editing XML files easily.
-
CAM - Content Assembly Mechanism − CAM XML Editor tool comes with XML+JSON+SQL Open-XDX sponsored by Oracle.
XML - Parsers
An XML parser is a software library or package that provides an interface for client applications to work with XML documents. It checks that the XML document is well-formed and may also validate it. Modern browsers have built-in XML parsers.
A parser sits between the XML document and the client application: it reads the document and hands its contents to the application.
The goal of a parser is to transform XML into a form that a program can readily use.
To ease the process of parsing, some commercial products are available that facilitate the breakdown of XML documents and yield more reliable results.
Some commonly used parsers are listed below −
-
MSXML (Microsoft Core XML Services) − This is a standard set of XML tools from Microsoft that includes a parser.
-
System.Xml.XmlDocument − This class is part of the .NET library, which contains a number of different classes for working with XML.
-
Java built-in parser − The Java library has its own parser. The library is designed such that you can replace the built-in parser with an external implementation such as Xerces from Apache or Saxon.
-
Saxon − Saxon offers tools for parsing, transforming, and querying XML.
-
Xerces − Xerces is implemented in Java and is developed by the famous open source Apache Software Foundation.
XML - Processors
When a software program reads an XML document and takes actions accordingly, this is called processing the XML. Any program that can read and process XML documents is known as an XML processor. An XML processor reads the XML file and turns it into in-memory structures that the rest of the program can access.
The most fundamental XML processor reads an XML document and converts it into an internal representation for other programs or subroutines to use. Such a processor is called a parser, and it is an important component of every XML processing program.
A processor also acts on processing instructions, which are described in the chapter Processing Instruction.
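As a small sketch of a processor at work, the following Python code uses the standard xml.sax module to read a document and build a simple in-memory record of the elements and processing instructions it encounters. The class name Collector, the function name process, and the sample document are assumptions made for this example.

```python
import xml.sax


class Collector(xml.sax.ContentHandler):
    """Collect element names and processing instructions as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.instructions = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


def process(xml_text):
    """Parse xml_text and return the handler that holds the collected information."""
    handler = Collector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


SAMPLE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="tutorialspointstyle.css" type="text/css"?>'
    '<contact-info><name>Tanmay Patil</name><company>TutorialsPoint</company></contact-info>'
)

result = process(SAMPLE)
print(result.elements)
print(result.instructions)
```

Here the parser reads the text and calls the handler for each event, while the handler decides what to keep; the rest of a program would then work with the collected lists rather than with the raw XML.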
Types
XML processors are classified as validating or non-validating, depending on whether or not they check XML documents for validity. A processor that discovers a validity error must be able to report it, but may continue with normal processing.
A few validating parsers are − xml4c (IBM, in C++), xml4j (IBM, in Java), MSXML (Microsoft), TclXML (TCL), xmlproc (Python), XML::Parser (Perl), Java Project X (Sun, in Java).
A few non-validating parsers are − OpenXML (Java), Lark (Java), xp (Java), AElfred (Java), expat (C), XParse (JavaScript), xmllib (Python).