Java XML: A Concise Tutorial

Java XML Overview

Java XML simply means working with an XML document from a Java program. Imagine we have a file "products.xml" that holds product details such as name, brand and price.

Now suppose we want to update the prices of some products from Java. Before writing Java programs that access XML documents, we should know the basics of XML.
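To make the scenario concrete, here is a minimal sketch of such a price update using the JDK's built-in DOM parser. The tutorial never shows the contents of products.xml, so the layout below (a <products> root with <product> children) and the sample values are assumptions made only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PriceUpdater {
    // Hypothetical layout for products.xml; the real file is not shown in the tutorial.
    static final String SAMPLE =
        "<products>"
      + "<product><name>Pen</name><brand>Acme</brand><price>10</price></product>"
      + "<product><name>Ink</name><brand>Acme</brand><price>45</price></product>"
      + "</products>";

    // Parse an XML string into a DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Text of the first descendant of 'parent' with the given tag name.
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Set a new price on every product whose <name> matches.
    static void updatePrice(Document doc, String productName, String newPrice) {
        NodeList products = doc.getElementsByTagName("product");
        for (int i = 0; i < products.getLength(); i++) {
            Element product = (Element) products.item(i);
            if (productName.equals(childText(product, "name"))) {
                product.getElementsByTagName("price").item(0).setTextContent(newPrice);
            }
        }
    }

    // Price of the first product with the given name, or null if none matches.
    static String priceOf(Document doc, String productName) {
        NodeList products = doc.getElementsByTagName("product");
        for (int i = 0; i < products.getLength(); i++) {
            Element product = (Element) products.item(i);
            if (productName.equals(childText(product, "name"))) {
                return childText(product, "price");
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        updatePrice(doc, "Pen", "12");
        System.out.println("Pen now costs " + priceOf(doc, "Pen"));
    }
}
```

The change here lives only in memory. Writing the tree back to a file would need a serializer, such as javax.xml.transform.Transformer.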

What is XML?

XML stands for eXtensible Markup Language. It is a text-based markup language used to store and transport data. It is self-descriptive and both human-readable and machine-readable. Some notable points about XML:

  1. XML is a markup language.

  2. XML is a tag-based language like HTML.

  3. Unlike HTML tags, XML tags are not predefined.

  4. You can define your own tags, which is why it is called an extensible language.

  5. XML tags are designed to be self-descriptive.

  6. XML is a W3C Recommendation for data storage and data transfer.

XML Document

An XML document is a collection of elements that define data in a well-structured, organized manner. An XML document has two sections: the document prolog and the document elements.

Syntax

Following is the syntax of an XML document:

<?xml ?>
<root_element>
	<element></element>
	...
</root_element>

Where,

  1. <?xml ?> is the XML declaration. If included, it must appear at the very start of the document, before any other content.

  2. <root_element> is the root element and it is the parent of all other elements.

  3. <element> is a sub-element of the root element.

Example

The following example shows employee details with <Employee> as the root element and <name>, <role> and <salary> as sub-elements. The data for each element is enclosed between its opening and closing tags.

<?xml version="1.0" ?>
<Employee>
	<name>Kiran</name>
	<role>developer</role>
	<salary>25,000</salary>
</Employee>
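As a rough sketch of how a Java program could read this document, the block below parses the same Employee XML with the JDK's DOM API and looks up each sub-element by tag name. The class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EmployeeReader {
    // The Employee example from the text, embedded as a string.
    static final String XML =
        "<?xml version=\"1.0\" ?>\n"
      + "<Employee>\n"
      + "\t<name>Kiran</name>\n"
      + "\t<role>developer</role>\n"
      + "\t<salary>25,000</salary>\n"
      + "</Employee>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Text content of the first element with the given tag name.
    static String field(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        System.out.println("root   = " + doc.getDocumentElement().getNodeName());
        System.out.println("name   = " + field(doc, "name"));
        System.out.println("role   = " + field(doc, "role"));
        System.out.println("salary = " + field(doc, "salary"));
    }
}
```

Note that the salary comes back as the string "25,000". XML itself carries no numeric types, so any conversion is up to the program.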

Elements in XML

An element is the building block of an XML document. It consists of an opening tag, content and a closing tag. An XML document must have exactly one root element, inside which we can write any number of sub-elements. Elements can also carry any number of attributes.

Syntax

Following is the syntax of an XML element:

<root>
	<child>
		<subchild>.....</subchild>
	</child>
</root>

Where,

  1. <root> is the root element of the XML document.

  2. <child> is the child element and its parent is the root element.

  3. <subchild> is the sub child and its parent is the child element.

Example

Let us see an example where DOB (date of birth) is further structured into date, month and year. Here, <DOB> is the root element and <date>, <month> and <year> are its child elements.

<DOB>
	<date>27</date>
	<month>March</month>
	<year>2000</year>
</DOB>

Tags in XML

Tags in XML are self-explanatory and user-defined. They are enclosed in less-than (<) and greater-than (>) symbols. XML is case-sensitive, so the opening and closing tags of an element must match exactly, including letter case.

Example

In the following example, we have written an address element with matching opening and closing tags.

<address>Hyderabad</address>

Now, let us see some incorrect ways of writing XML tags. In both cases the closing tag differs in case from the opening tag:

<Address></address>
<ADDRESS></address>
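A parser enforces this rule. The sketch below, with a helper name chosen only for illustration, asks the JDK's DOM parser whether each snippet is well-formed and treats any parse error as "no".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TagCaseCheck {
    // Returns true if the parser accepts the XML, false if it reports a parse error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. mismatched tag names
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<address>Hyderabad</address>"));
        System.out.println(isWellFormed("<Address></address>"));
        System.out.println(isWellFormed("<ADDRESS></address>"));
    }
}
```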

Attributes in XML

Elements in XML can have attributes. Attributes are name-value pairs that provide additional information about a particular element. An element can have any number of attributes, but each attribute name may appear only once on a given element.

Syntax

Following is the syntax for XML attributes:

<element_name attribute_name="value" >content</element_name>

Where,

  1. element_name is the name of the element.

  2. attribute_name is the name of the attribute.

  3. value is the value of the corresponding attribute.

Example

Now, let's look at the following example, where the 'Student' element has four attributes: name, class, marks and DOB.

<Student name="Kiran" class="8" marks="50" DOB="27-03-2000"></Student>
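From Java, attributes are read off the element rather than looked up as child nodes. A minimal sketch using the DOM API (the class and method names are illustrative):

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class StudentAttributes {
    static final String XML =
        "<Student name=\"Kiran\" class=\"8\" marks=\"50\" DOB=\"27-03-2000\"></Student>";

    // Parse the snippet and return its root element.
    static Element student() throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
    }

    // Copy all attributes into a sorted map; DOM does not promise any attribute order.
    static Map<String, String> attributes(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Element student = student();
        System.out.println(student.getAttribute("name")); // single attribute by name
        System.out.println(attributes(student));          // all attributes
    }
}
```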

Using sub elements to replace attributes

Instead of attributes, sub-elements can be used to carry the same information. The same student example can also be written as follows:

<Student>
	<name>Kiran</name>
	<class>8</class>
	<marks>50</marks>
	<DOB>27-03-2000</DOB>
</Student>

In the above example, if we further want the date of birth split into date, month and year, we can give the DOB element its own sub-elements, as follows:

<Student>
	<name>Kiran</name>
	<class>8</class>
	<marks>50</marks>
	<DOB>
		<date>27</date>
		<month>03</month>
		<year>2000</year>
	</DOB>
</Student>
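One advantage of nested sub-elements is that each part can be addressed by its path. The sketch below uses the JDK's XPath API, which is one option among several and not something the tutorial itself prescribes, to pull the three DOB parts out of the document above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StudentDob {
    static final String XML =
        "<Student>\n"
      + "\t<name>Kiran</name>\n"
      + "\t<class>8</class>\n"
      + "\t<marks>50</marks>\n"
      + "\t<DOB>\n"
      + "\t\t<date>27</date>\n"
      + "\t\t<month>03</month>\n"
      + "\t\t<year>2000</year>\n"
      + "\t</DOB>\n"
      + "</Student>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Rebuild the date-month-year string from the three nested elements.
    static String dob(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String date = xpath.evaluate("/Student/DOB/date", doc);
        String month = xpath.evaluate("/Student/DOB/month", doc);
        String year = xpath.evaluate("/Student/DOB/year", doc);
        return date + "-" + month + "-" + year;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dob(parse()));
    }
}
```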

XML Declaration

The XML declaration gives basic format information about the entire XML document: its version, encoding and standalone status. If a document includes an XML declaration, it must be the very first thing in the document, with nothing before it.

Syntax

Following is the syntax of the XML declaration:

<?xml
	version="version_number"
	encoding="encoding_type"
	standalone="standalone_status"
?>

Where,

  1. XML declaration starts with the character sequence <?xml and ends with the character sequence ?>

  2. version is the version number of the XML used

  3. encoding is the character encoding used for the content of XML document

  4. standalone takes the value 'yes' or 'no' and defaults to 'no'. It tells the parser whether the document can be processed on its own ('yes') or depends on externally declared markup, such as an external DTD (Document Type Definition).

Example

The following example declares XML version 1.0, the UTF-16 encoding, and a standalone document.

<?xml
	version="1.0"
	encoding="UTF-16"
	standalone="yes"
?>
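A parsed DOM Document exposes these three declaration values. The sketch below is a minimal illustration: since a declaration alone is not a complete document, it appends an empty <note/> root element, which is an assumption added here. It also encodes the text as UTF-16 bytes so that the declared encoding matches the actual one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DeclarationInfo {
    // The declaration from the example plus a hypothetical empty root element.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-16\" standalone=\"yes\"?><note/>";

    static Document parse() throws Exception {
        // Encode as UTF-16 (with byte order mark) so the bytes agree with the declaration.
        byte[] bytes = XML.getBytes(StandardCharsets.UTF_16);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        System.out.println("version    = " + doc.getXmlVersion());
        System.out.println("encoding   = " + doc.getXmlEncoding());
        System.out.println("standalone = " + doc.getXmlStandalone());
    }
}
```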

XML Comments

Comments in XML explain the purpose and details of a document. Including them is good practice because they help someone reading the document for the first time. XML comments use the same syntax as HTML comments.

Syntax

Following is the syntax for XML comments. The same form is used whether the comment spans one line or several:

<!-- comment here -->

Example

Let us say we collected information about the departments of a college in the year 2015. These records might change over the years, so noting the year in a comment tells anyone editing the file when the details were collected.

<?xml version = "1.0" encoding = "UTF-8" ?>
<!-- Following information is collected in the year 2015 -->
<college>
	<Department>
		<name>CSE</name>
		<code>CS</code>
		<faculty_strength>25</faculty_strength>
	</Department>
	<Department>
		<name>ECE</name>
		<code>EC</code>
		<faculty_strength>20</faculty_strength>
	</Department>
</college>
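Comments are not just for human readers: a DOM parser keeps them as nodes unless told to ignore them. The following sketch (names are illustrative) collects the comments that sit directly under the document, next to the root element, and counts the departments.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CollegeComments {
    static final String XML =
        "<?xml version = \"1.0\" encoding = \"UTF-8\" ?>\n"
      + "<!-- Following information is collected in the year 2015 -->\n"
      + "<college>\n"
      + "  <Department><name>CSE</name><code>CS</code><faculty_strength>25</faculty_strength></Department>\n"
      + "  <Department><name>ECE</name><code>EC</code><faculty_strength>20</faculty_strength></Department>\n"
      + "</college>";

    static Document parse() throws Exception {
        // Parsing from a Reader: the characters are already decoded,
        // so the encoding named in the declaration is not used for decoding.
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Comments that are direct children of the document (outside the root element).
    static List<String> topLevelComments(Document doc) {
        List<String> comments = new ArrayList<>();
        NodeList children = doc.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.COMMENT_NODE) {
                comments.add(child.getNodeValue().trim());
            }
        }
        return comments;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        System.out.println(topLevelComments(doc));
        System.out.println(doc.getElementsByTagName("Department").getLength() + " departments");
    }
}
```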

XML Namespaces

XML namespaces are used to resolve name conflicts in an XML document. When two or more XML fragments are combined, they may use tags with the same name for different things, and a program reading the document can no longer tell them apart. XML namespaces avoid this kind of name conflict.

Example

Assume we have created an XML element holding information about a coffee table:

<table>
	<shape>Oval</shape>
	<material>Wood</material>
	<seat_count>3</seat_count>
	<cost>15000</cost>
</table>

Suppose we have created another element that holds information about a dining table:

<table>
	<shape>Rectangle</shape>
	<material>Marble</material>
	<seat_count>6</seat_count>
	<cost>25000</cost>
</table>

When the two XML fragments above are combined in a single file, there is a name conflict. Although both elements have the same name, the information they provide differs. There are two ways to resolve such name conflicts in XML:

  1. Using Prefix

  2. Using namespace declaration

Using Prefix

We can tell the elements apart by adding a prefix to them. To resolve the name conflict above, we can add the prefix 'c' to the element holding the coffee table information and the prefix 'd' to the other element (the dining table).

Example

Let us take the same table example and try to resolve the name conflict using prefixes.

<!-- Coffee Table -->
<c:table>
	<c:shape>Oval</c:shape>
	<c:material>Wood</c:material>
	<c:seat_count>3</c:seat_count>
	<c:cost>15000</c:cost>
</c:table>

<!-- Dining Table -->
<d:table>
    <d:shape>Rectangle</d:shape>
    <d:material>Marble</d:material>
    <d:seat_count>6</d:seat_count>
    <d:cost>25000</d:cost>
</d:table>
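A bare prefix like this only works when the parser is not namespace-aware. As a hedged illustration with the JDK's DOM parser (class and method names are made up for the example): the default, non-namespace-aware parser treats "c:table" as an ordinary name, while a namespace-aware parser rejects a prefix that has no xmlns declaration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class PrefixCheck {
    // A prefixed element with no namespace declaration anywhere.
    static final String UNDECLARED = "<c:table><c:shape>Oval</c:shape></c:table>";

    // Returns true if the XML parses under the given namespace-awareness setting.
    static boolean parses(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keep parse errors off stderr
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. the prefix is not bound to any namespace
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespace-unaware: " + parses(UNDECLARED, false));
        System.out.println("namespace-aware:   " + parses(UNDECLARED, true));
    }
}
```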

Drawbacks of Using Prefixes

Even with prefixes, two elements can still end up with the same prefix and the same name. In such cases the conflict remains.

Suppose we add another element that describes a dressing table. The natural prefix for it is again 'd', which brings back the conflict, this time between the dining table and the dressing table. Prefixes alone therefore reduce name conflicts but cannot rule them out.

Using "namespace" Declaration

An XML namespace declaration resolves name conflicts reliably. It uses a reserved attribute named 'xmlns' to bind a prefix to a namespace URI.

Syntax

Following is the syntax for an XML namespace declaration:

<element-name xmlns:prefix="URI">

Where,

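  1. element-name: The element on which the namespace is declared.

  2. xmlns: The reserved attribute name used to declare a namespace.

  3. prefix: The namespace prefix being bound.

  4. URI: The namespace identifier. It only needs to be unique and is not fetched by the parser.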

Example

The following example uses XML namespace declarations for the three table elements. The dining table and the dressing table still share the prefix 'd', but the conflict is resolved because each is bound to a different namespace URI.

<!-- Coffee Table -->
<c:table xmlns:c="/coffee">
	<c:shape>Oval</c:shape>
	<c:material>Wood</c:material>
	<c:seat_count>3</c:seat_count>
	<c:cost>15000</c:cost>
</c:table>

<!-- Dining Table -->
<d:table xmlns:d="/dining">
    <d:shape>Rectangle</d:shape>
    <d:material>Marble</d:material>
    <d:seat_count>6</d:seat_count>
    <d:cost>25000</d:cost>
</d:table>

<!-- Dressing Table -->
<d:table xmlns:d="/dressing">
    <d:brand>Trevi Furniture</d:brand>
    <d:material>Engineered wood</d:material>
    <d:cost>15000</d:cost>
</d:table>
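To see how a Java program tells these tables apart, the sketch below parses them with a namespace-aware DOM parser and selects elements by namespace URI rather than by prefix. The namespace declarations are written in their well-formed shape (each prefix declared on the element that uses it), and the <furniture> wrapper is an assumption added here so that the three fragments form one document with a single root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableNamespaces {
    // Hypothetical <furniture> wrapper around the three table fragments.
    static final String XML =
        "<furniture>"
      + "<c:table xmlns:c=\"/coffee\"><c:shape>Oval</c:shape><c:cost>15000</c:cost></c:table>"
      + "<d:table xmlns:d=\"/dining\"><d:shape>Rectangle</d:shape><d:cost>25000</d:cost></d:table>"
      + "<d:table xmlns:d=\"/dressing\"><d:brand>Trevi Furniture</d:brand><d:cost>15000</d:cost></d:table>"
      + "</furniture>";

    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default; needed for the *NS methods
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Cost of the single <table> element in the given namespace.
    static String costOf(Document doc, String namespaceUri) {
        Element table = (Element) doc.getElementsByTagNameNS(namespaceUri, "table").item(0);
        return table.getElementsByTagNameNS(namespaceUri, "cost").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        NodeList tables = doc.getElementsByTagNameNS("*", "table"); // any namespace
        for (int i = 0; i < tables.getLength(); i++) {
            Element table = (Element) tables.item(i);
            System.out.println(table.getPrefix() + ":" + table.getLocalName()
                    + " -> " + table.getNamespaceURI());
        }
        System.out.println("dining cost   = " + costOf(doc, "/dining"));
        System.out.println("dressing cost = " + costOf(doc, "/dressing"));
    }
}
```

Both the dining and dressing tables report the prefix "d", yet the lookups by URI return different elements, which is the point of the namespace declaration.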