Java XML Tutorial

Java XPath Parser - Parse XML Document

The Java XPath parser is an API for parsing XML documents with XPath expressions and functions. It lets us traverse an entire XML document and obtain elements as nodes in a NodeList. The package 'javax.xml.xpath' provides the API for evaluating XPath expressions. In this chapter, we will see how to traverse all the nodes in an XML document.

Parse XML Using Java XPath Parser

Following are the steps used while parsing a document using the Java XPath parser:

  1. Creating a DocumentBuilder

  2. Reading the XML

  3. Creating Document from file or Stream

  4. Building XPath

  5. Preparing and Evaluating XPath expression

  6. Iterating over NodeList

  7. Retrieving Elements
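
The seven steps can be sketched end to end as follows. This is only a minimal illustration, assuming the XML arrives as an in-memory string; the class name XPathStepsSketch, the helper textOf(), and the sample data are made up for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XPathStepsSketch {

   // Hypothetical helper: returns the text content of every node matched by the expression
   static List<String> textOf(String xml, String expression) throws Exception {
      // Step 1: create a DocumentBuilder
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // Step 2: read the XML (here from an in-memory string)
      ByteArrayInputStream input = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
      // Step 3: create a Document from the stream
      Document doc = builder.parse(input);
      // Step 4: build an XPath object
      XPath xPath = XPathFactory.newInstance().newXPath();
      // Step 5: prepare and evaluate the XPath expression
      NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(doc, XPathConstants.NODESET);
      // Steps 6 and 7: iterate over the NodeList and retrieve the content of each node
      List<String> result = new ArrayList<>();
      for (int i = 0; i < nodeList.getLength(); i++) {
         Node node = nodeList.item(i);
         result.add(node.getTextContent());
      }
      return result;
   }

   public static void main(String[] args) throws Exception {
      String xml = "<class><student>dinkar</student><student>Vaneet</student></class>";
      System.out.println(textOf(xml, "/class/student"));
   }
}
```

Each step is explained individually in the sections that follow.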

Step 1: Create a DocumentBuilder

DocumentBuilderFactory is a factory API used to create DocumentBuilder objects. Its newDocumentBuilder() method returns a DocumentBuilder object as follows:

DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
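
The factory can optionally be configured before the builder is created. The sketch below is one possible setup, not a requirement of this chapter: it turns on namespace awareness and the standard secure-processing feature. The class name BuilderConfigSketch and the helper newConfiguredBuilder() are made up for illustration.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class BuilderConfigSketch {

   // Hypothetical helper: creates a DocumentBuilder from a configured factory
   static DocumentBuilder newConfiguredBuilder() throws ParserConfigurationException {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Make the parser namespace aware (off by default)
      factory.setNamespaceAware(true);
      // Ask the implementation to apply its secure-processing limits
      factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
      return factory.newDocumentBuilder();
   }

   public static void main(String[] args) throws ParserConfigurationException {
      DocumentBuilder builder = newConfiguredBuilder();
      System.out.println("Namespace aware: " + builder.isNamespaceAware());
   }
}
```

Whether these settings are needed depends on where the XML comes from; for the small in-memory documents used in this chapter the default factory is sufficient.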

Step 2: Reading the XML

The FileReader class reads streams of characters from an input file. The following statement throws a FileNotFoundException if the file does not exist, is a directory rather than a regular file, or cannot be opened for reading for some other reason.

FileReader fileReader = new FileReader("src/input.txt");
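
DocumentBuilder.parse() has no overload that accepts a Reader directly, so a FileReader is usually wrapped in an org.xml.sax.InputSource before parsing. The sketch below shows one way to do this; the class name ReaderInputSketch, the helper rootNameOf(), and the temporary file are assumptions made for illustration.

```java
import java.io.FileReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReaderInputSketch {

   // Hypothetical helper: parses the file through a FileReader and returns the root element name
   static String rootNameOf(Path xmlFile) throws Exception {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // try-with-resources closes the reader even if parsing fails
      try (FileReader fileReader = new FileReader(xmlFile.toFile())) {
         // Wrap the character stream so that parse() can consume it
         Document doc = builder.parse(new InputSource(fileReader));
         return doc.getDocumentElement().getNodeName();
      }
   }

   public static void main(String[] args) throws Exception {
      // A temporary file stands in for a real input file
      Path tempFile = Files.createTempFile("input", ".xml");
      try {
         Files.write(tempFile, "<class>xyz</class>".getBytes(StandardCharsets.UTF_8));
         System.out.println(rootNameOf(tempFile));
      } finally {
         Files.deleteIfExists(tempFile);
      }
   }
}
```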

Instead of reading the XML content from a file, we can also take the content as a string and convert it into bytes as follows:

StringBuilder xmlBuilder = new StringBuilder();
xmlBuilder.append("<class>xyz</class>");
ByteArrayInputStream input = new ByteArrayInputStream(xmlBuilder.toString().getBytes("UTF-8"));

Step 3: Create a Document from a file or stream

The DocumentBuilder object created in the first step is used to create a document from the input file or input stream. The parse() method takes a file or a stream as an argument and returns a Document object as follows:

Document doc = builder.parse(input);
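
As a small illustration of what the returned Document holds, the sketch below parses an in-memory stream and reads the document element back. It uses StandardCharsets.UTF_8 so that getBytes() cannot throw UnsupportedEncodingException. The class name ParseStreamSketch and the helper parse() are made up for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ParseStreamSketch {

   // Hypothetical helper: parses an XML string through an InputStream
   static Document parse(String xml) throws Exception {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      try (InputStream input = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
         return builder.parse(input);
      }
   }

   public static void main(String[] args) throws Exception {
      Document doc = parse("<class>xyz</class>");
      // getDocumentElement() returns the root element of the parsed document
      Element root = doc.getDocumentElement();
      System.out.println("Root: " + root.getNodeName());
      System.out.println("Text: " + root.getTextContent());
   }
}
```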

Step 4: Building XPath

To query an XML document using XPath, we need to create an XPath object using the newXPath() method of XPathFactory. This method returns a new XPath object as follows:

XPath xPath =  XPathFactory.newInstance().newXPath();

Step 5: Preparing and Evaluating XPath expression

As discussed in the previous chapter, XPath provides expressions that help us retrieve information from XML documents. Here we create one such expression based on the requirement and evaluate it. When XPathConstants.NODESET is passed as the return type, the evaluate() method returns the result as an Object that can be cast to a NodeList, as follows:

String expression = "/class/student";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(
   doc, XPathConstants.NODESET);
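
The second argument of evaluate() decides the Java type of the result. The sketch below, which assumes a small made-up document, evaluates expressions as NODESET, NUMBER, STRING and BOOLEAN; the class name ReturnTypesSketch and the helper evaluate() are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ReturnTypesSketch {

   // Sample data made up for this sketch
   static final String XML =
      "<class>"
      + "<student rollno='393'><marks>85</marks></student>"
      + "<student rollno='493'><marks>95</marks></student>"
      + "</class>";

   // Hypothetical helper: evaluates the expression against the sample document
   static Object evaluate(String expression, QName returnType) throws Exception {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
      XPath xPath = XPathFactory.newInstance().newXPath();
      return xPath.compile(expression).evaluate(doc, returnType);
   }

   public static void main(String[] args) throws Exception {
      // NODESET: the result is cast to NodeList
      NodeList students = (NodeList) evaluate("/class/student", XPathConstants.NODESET);
      // NUMBER: the result is cast to Double
      Double count = (Double) evaluate("count(/class/student)", XPathConstants.NUMBER);
      // STRING: the result is cast to String
      String firstMarks = (String) evaluate("/class/student[1]/marks", XPathConstants.STRING);
      // BOOLEAN: the result is cast to Boolean
      Boolean has493 = (Boolean) evaluate("boolean(/class/student[@rollno='493'])", XPathConstants.BOOLEAN);

      System.out.println("Students in NodeList: " + students.getLength());
      System.out.println("Student count: " + count);
      System.out.println("Marks of first student: " + firstMarks);
      System.out.println("Has roll no 493: " + has493);
   }
}
```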

Step 6: Iterating over NodeList

The NodeList obtained in step 5 is now iterated to examine each node and retrieve information accordingly. Here we use a for loop to iterate over the NodeList; any other loop works as well.

for (int i = 0; i < nodeList.getLength(); i++) {
   Node nNode = nodeList.item(i);
   ...
}
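
NodeList does not implement Iterable, so it cannot be used in an enhanced for loop directly. One possible workaround, sketched below, is to copy the nodes into a java.util.List first. The class name NodeListIterationSketch and the helpers asList() and select() are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NodeListIterationSketch {

   // Hypothetical helper: copies the nodes of a NodeList into a List
   static List<Node> asList(NodeList nodeList) {
      List<Node> nodes = new ArrayList<>(nodeList.getLength());
      for (int i = 0; i < nodeList.getLength(); i++) {
         nodes.add(nodeList.item(i));
      }
      return nodes;
   }

   // Hypothetical helper: parses the XML and returns the nodes matched by the expression
   static List<Node> select(String xml, String expression) throws Exception {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList nodeList = (NodeList) XPathFactory.newInstance().newXPath()
         .compile(expression).evaluate(doc, XPathConstants.NODESET);
      return asList(nodeList);
   }

   public static void main(String[] args) throws Exception {
      String xml = "<class><student>dinkar</student><student>Vaneet</student></class>";
      // The copied list works with the enhanced for loop
      for (Node node : select(xml, "/class/student")) {
         System.out.println(node.getNodeName() + ": " + node.getTextContent());
      }
   }
}
```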

Step 7: Retrieving Elements

After following the above six steps, we have the elements in the form of nodes. Using the methods of the interfaces available in DOM, we can retrieve the necessary elements and attributes.

Retrieve Root Element

To reach the root element of the XML document, we can start from the XPath expression '/'. This expression selects the document node itself; the root element is the single element child of that node, so for a document without top-level comments or processing instructions the document node's child list is a NodeList with just a single Node.

The getNodeName() method of the Node interface returns the name of the node as a String, and the getTextContent() method returns the text content of the node as a String.

Example

In the following example, we build the XML content in a StringBuilder, convert it to a byte stream, and parse it using the parse() method. The document has only a single element, which is therefore the root, with the text content 'xyz class'. Using the methods discussed above, we retrieve the information of the root element.

import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;

public class RetrieveRoot {
   public static void main(String[] args) {
      try {

         //Creating a DocumentBuilder
         DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
         DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();

         //Reading the XML
         StringBuilder xmlBuilder = new StringBuilder();
         xmlBuilder.append("<class>xyz class</class>");
         ByteArrayInputStream input = new ByteArrayInputStream(xmlBuilder.toString().getBytes("UTF-8"));

         //Creating Document from file or Stream
         Document doc = dBuilder.parse(input);

         //Building XPath
         XPath xPath =  XPathFactory.newInstance().newXPath();

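         //Preparing and Evaluating XPath expression
         //"/" selects the document node; its child nodes contain the root element
         String expression = "/";
         Node documentNode = (Node) xPath.compile(expression).evaluate(
            doc, XPathConstants.NODE);
         NodeList nodeList = documentNode.getChildNodes();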

         //Iterating over NodeList
         for (int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);
            //Retrieving Root Element
            System.out.println("Root Element Name: " + node.getNodeName());
            System.out.println("Text Content: " + node.getTextContent());
         }
      } catch(Exception e) {
         e.printStackTrace();
      }
   }
}

Output

The root element name and text content are displayed on the console.

Root Element Name: class
Text Content: xyz class
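
In XPath, '/' selects the document node, while '/*' selects the root element directly. The sketch below compares the two by evaluating each as a single NODE; the class name RootNodeSketch and the helper nodeNameFor() are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class RootNodeSketch {

   // Hypothetical helper: returns the name of the single node selected by the expression
   static String nodeNameFor(String xml, String expression) throws Exception {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      XPath xPath = XPathFactory.newInstance().newXPath();
      Node node = (Node) xPath.compile(expression).evaluate(doc, XPathConstants.NODE);
      return node.getNodeName();
   }

   public static void main(String[] args) throws Exception {
      String xml = "<class>xyz class</class>";
      // "/" selects the document node, whose node name is "#document"
      System.out.println("/  selects: " + nodeNameFor(xml, "/"));
      // "/*" selects the root element
      System.out.println("/* selects: " + nodeNameFor(xml, "/*"));
   }
}
```

The root element is also available without XPath through the getDocumentElement() method of Document.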

Retrieving Attributes

The NodeList returned by evaluating an XPath expression can contain nodes of different types. If a node's type is 'ELEMENT_NODE', we can cast it to Element. The getAttribute("attribute_name") method of the Element interface returns the value of the attribute as a String.
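
As an alternative to casting each node to Element, attributes can also be selected directly with an XPath attribute step. The sketch below assumes a small made-up document and collects every rollno value; the class name AttributeSelectionSketch and the helper rollNumbers() are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class AttributeSelectionSketch {

   // Hypothetical helper: returns the rollno attribute value of every student element
   static List<String> rollNumbers(String xml) throws Exception {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      // "@rollno" selects attribute nodes instead of element nodes
      NodeList attributes = (NodeList) XPathFactory.newInstance().newXPath()
         .compile("/class/student/@rollno").evaluate(doc, XPathConstants.NODESET);
      List<String> values = new ArrayList<>();
      for (int i = 0; i < attributes.getLength(); i++) {
         Node attribute = attributes.item(i);
         if (attribute.getNodeType() == Node.ATTRIBUTE_NODE) {
            values.add(attribute.getNodeValue());
         }
      }
      return values;
   }

   public static void main(String[] args) throws Exception {
      String xml = "<class><student rollno='393'/><student rollno='493'/></class>";
      System.out.println(rollNumbers(xml));
   }
}
```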

Example

The following studentData.xml file has information about three students. We are going to retrieve this information using the XPath parser library in Java. The class element is the root element, with three student child elements. Let us see how to use the XPath library in Java to retrieve the information of the three students.

<?xml version = "1.0"?>
<class>
   <student rollno = "393">
      <firstname>dinkar</firstname>
      <lastname>kad</lastname>
      <nickname>dinkar</nickname>
      <marks>85</marks>
   </student>

   <student rollno = "493">
      <firstname>Vaneet</firstname>
      <lastname>Gupta</lastname>
      <nickname>vinni</nickname>
      <marks>95</marks>
   </student>

   <student rollno = "593">
      <firstname>jasvir</firstname>
      <lastname>singh</lastname>
      <nickname>jazz</nickname>
      <marks>90</marks>
   </student>
</class>

In the following RetrieveAttributes.java program, we parse the studentData.xml file and build a document. The expression '/class/student' is used to get all the 'student' nodes inside the 'class' node into a NodeList. The NodeList is then iterated to get the information of each student.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;

public class RetrieveAttributes {
   public static void main(String[] args) {
      try {

         //Creating a DocumentBuilder
         DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
         DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();

         //Reading the XML
         File inputFile = new File("studentData.xml");

         //Creating Document from file or Stream
         Document doc = dBuilder.parse(inputFile);

         //Building XPath
         XPath xPath =  XPathFactory.newInstance().newXPath();

         //Preparing and Evaluating XPath expression
         String expression = "/class/student";
         NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(
            doc, XPathConstants.NODESET);

         //Iterating over NodeList
         for (int i = 0; i < nodeList.getLength(); i++) {
            Node nNode = nodeList.item(i);
            System.out.println("\nCurrent Element :" + nNode.getNodeName());
            //Retrieving Elements
            if (nNode.getNodeType() == Node.ELEMENT_NODE) {
               Element eElement = (Element) nNode;
               System.out.println("Student roll no : " + eElement.getAttribute("rollno"));
               System.out.println("First Name : "
                  + eElement
                  .getElementsByTagName("firstname")
                  .item(0)
                  .getTextContent());
               System.out.println("Last Name : "
                  + eElement
                  .getElementsByTagName("lastname")
                  .item(0)
                  .getTextContent());
               System.out.println("Nick Name : "
                  + eElement
                  .getElementsByTagName("nickname")
                  .item(0)
                  .getTextContent());
               System.out.println("Marks : "
                  + eElement
                  .getElementsByTagName("marks")
                  .item(0)
                  .getTextContent());
            }
         }
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

Output

All the information of the three students is displayed on the console.

Current Element :student
Student roll no : 393
First Name : dinkar
Last Name : kad
Nick Name : dinkar
Marks : 85

Current Element :student
Student roll no : 493
First Name : Vaneet
Last Name : Gupta
Nick Name : vinni
Marks : 95

Current Element :student
Student roll no : 593
First Name : jasvir
Last Name : singh
Nick Name : jazz
Marks : 90
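
An XPath predicate can narrow the selection before any Java code iterates over the result. The sketch below is only an illustration: it embeds a shortened copy of the student data as a string and selects the first names of students whose marks exceed a threshold. The class name PredicateSketch and the helper firstNamesWithMarksAbove() are made up for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class PredicateSketch {

   // Shortened copy of the student data, embedded here so the sketch runs without a file
   static final String XML =
      "<class>"
      + "<student rollno='393'><firstname>dinkar</firstname><marks>85</marks></student>"
      + "<student rollno='493'><firstname>Vaneet</firstname><marks>95</marks></student>"
      + "<student rollno='593'><firstname>jasvir</firstname><marks>90</marks></student>"
      + "</class>";

   // Hypothetical helper: first names of the students whose marks are above the threshold
   static List<String> firstNamesWithMarksAbove(int threshold) throws Exception {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
      // The predicate in square brackets keeps only the matching student elements
      String expression = "/class/student[marks > " + threshold + "]/firstname";
      NodeList names = (NodeList) XPathFactory.newInstance().newXPath()
         .compile(expression).evaluate(doc, XPathConstants.NODESET);
      List<String> result = new ArrayList<>();
      for (int i = 0; i < names.getLength(); i++) {
         result.add(names.item(i).getTextContent());
      }
      return result;
   }

   public static void main(String[] args) throws Exception {
      System.out.println(firstNamesWithMarksAbove(85));
   }
}
```

The threshold is an int, so concatenating it into the expression is safe here; text supplied by users should not be concatenated into an XPath expression in this way.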