Java XML Quick Tutorial

Java SAX Parser - Parse XML Document

The Java SAX (Simple API for XML) parser is an API for parsing XML documents. It is an event-based parser that reports events to a Handler class. Callback methods such as startElement(), characters(), and endElement() are implemented in the Handler class to obtain the details of elements and their attributes, and the parser calls them as it identifies the corresponding events.
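To make the event-based model concrete, here is a minimal sketch that records the order in which the parser fires callbacks. The class name SaxEventOrderSketch, the RecordingHandler class, and the recordEvents() helper are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventOrderSketch {

   // Hypothetical handler: stores the name of each callback as it fires.
   static class RecordingHandler extends DefaultHandler {
      final List<String> events = new ArrayList<>();

      @Override
      public void startDocument() {
         events.add("startDocument");
      }

      @Override
      public void startElement(String uri, String localName, String qName, Attributes attributes) {
         events.add("startElement:" + qName);
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
         events.add("endElement:" + qName);
      }

      @Override
      public void endDocument() {
         events.add("endDocument");
      }
   }

   // Hypothetical helper: parses the given XML text and returns the recorded events.
   static List<String> recordEvents(String xml) {
      RecordingHandler handler = new RecordingHandler();
      try {
         SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
         saxParser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
      return handler.events;
   }

   public static void main(String[] args) {
      // Prints: [startDocument, startElement:a, startElement:b, endElement:b, endElement:a, endDocument]
      System.out.println(recordEvents("<a><b>hi</b></a>"));
   }
}
```

The handler never sees the document as a whole. It only receives one event at a time, in document order, which is why SAX handlers usually keep their own state between callbacks.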

Parse XML Using Java SAX parser

To parse an XML document in Java using the SAX parser, follow these steps:

  1. *Step 1:* Implementing a Handler class

  2. *Step 2:* Creating a SAXParser Object

  3. *Step 3:* Reading the XML

  4. *Step 4:* Creating object for Handler class

  5. *Step 5:* Parsing the XML Document

  6. *Step 6:* Retrieving the Elements

Step 1: Implementing a Handler class

The application must implement a handler class to handle the events raised while the XML document is parsed. Once implemented, the handler must be registered with the SAX parser, which here is done by passing it to the parse() method in Step 5.

As discussed in the previous chapter, the DefaultHandler class implements the ContentHandler interface. It provides the methods startDocument(), endDocument(), startElement(), endElement(), and characters(), which help us parse XML documents. We override these methods and write our code inside them as required.

class UserHandler extends DefaultHandler {

   public void startDocument() {
      ...
   }

   public void startElement(String uri, String localName, String qName, Attributes attributes) {
      ...
   }

   public void characters(char[] ch, int start, int length) {
      ...
   }

   public void endElement(String uri, String localName, String qName) {
      ...
   }

   public void endDocument() {
      ...
   }
}

Step 2: Creating a SAXParser Object

The SAXParserFactory class creates a new factory instance, which in turn creates the SAXParser object as follows:

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();

Step 3: Reading the XML

Read the XML file by specifying its file path as follows:

File xmlFile = new File("input.xml");

Instead of reading a file, we can create an InputStream over the XML content as follows:

StringBuilder xmlBuilder = new StringBuilder();
xmlBuilder.append("<?xml version=\"1.0\"?><rootElement></rootElement>");
ByteArrayInputStream inputStream = new ByteArrayInputStream( xmlBuilder.toString().getBytes("UTF-8"));

Step 4: Creating object for Handler class

Create an object of the UserHandler class implemented in the first step as follows:

UserHandler userHandler = new UserHandler();

Step 5: Parsing the XML Document

The SAXParser class has a parse() method that takes two arguments: the file and the DefaultHandler object. It parses the given file as an XML document, invoking the callback methods implemented in the handler.

saxParser.parse(xmlFile, userHandler);

The SAXParser class also has a parse() method that accepts the content as an InputStream:

saxParser.parse(inputStream, userHandler);
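Besides the File and InputStream overloads, SAXParser also offers parse(InputSource, DefaultHandler). When the XML is already held in a String, wrapping it in a StringReader avoids the byte conversion altogether. The sketch below is only an illustration. The class ParseFromStringSketch and the helper rootName() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParseFromStringSketch {

   // Hypothetical helper: returns the qualified name of the first element seen.
   static String rootName(String xml) {
      final String[] root = new String[1];
      DefaultHandler handler = new DefaultHandler() {
         @Override
         public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (root[0] == null) {
               root[0] = qName; // the first startElement event belongs to the root
            }
         }
      };
      try {
         SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
         // InputSource wraps a character stream, so no explicit encoding is needed
         saxParser.parse(new InputSource(new StringReader(xml)), handler);
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
      return root[0];
   }

   public static void main(String[] args) {
      // Prints: rootElement
      System.out.println(rootName("<?xml version=\"1.0\"?><rootElement></rootElement>"));
   }
}
```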

Step 6: Retrieving the Elements

After following the above five steps, we can retrieve the required information about the elements. The code for this goes inside the methods of the Handler class written in the first step. The methods of the ContentHandler interface were discussed in the previous chapter. In this chapter, we implement them to retrieve basic information about elements: the element name, the text content, and the attributes.

Retrieving Element Name

The element name can be obtained from the startElement() method of the ContentHandler interface. The third argument of this method, qName, is the qualified name of the element as a String. We can implement this method in our Handler class to get the name of an element.
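The examples in this chapter read the third argument, qName, because a factory obtained from SAXParserFactory.newInstance() is not namespace-aware by default. As a hedged sketch of how the first three arguments relate, the code below parses an element with a namespace prefix in both modes. The class QNameVersusLocalName, the helper firstElementNames(), and the namespace URI urn:example are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class QNameVersusLocalName {

   // Hypothetical helper: returns {uri, localName, qName} of the first element.
   static String[] firstElementNames(String xml, boolean namespaceAware) {
      final String[] names = new String[3];
      DefaultHandler handler = new DefaultHandler() {
         private boolean seen;

         @Override
         public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (!seen) {
               names[0] = uri;
               names[1] = localName;
               names[2] = qName;
               seen = true;
            }
         }
      };
      try {
         SAXParserFactory factory = SAXParserFactory.newInstance();
         factory.setNamespaceAware(namespaceAware);
         factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
      return names;
   }

   public static void main(String[] args) {
      String xml = "<c:college xmlns:c=\"urn:example\">XYZ College</c:college>";

      // Namespace-aware: uri and localName are filled in
      String[] aware = firstElementNames(xml, true);
      System.out.println("uri=" + aware[0] + ", localName=" + aware[1]);

      // Not namespace-aware (the default): rely on qName only
      String[] plain = firstElementNames(xml, false);
      System.out.println("qName=" + plain[2]);
   }
}
```

Without namespace awareness, uri and localName are typically empty strings, so qName is the value to rely on in that mode.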

Example

In the following example, we hold the XML content in a String built with the StringBuilder class and wrap its bytes in a ByteArrayInputStream.

In the UserHandler class, we have implemented the startElement() method to print the name of the element. Since the XML content has only a single element, it is the root element of the document.

import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Implementing the UserHandler class
class UserHandler extends DefaultHandler {
   @Override
   public void startElement(String uri, String localName, String qName, Attributes attributes)
         throws SAXException {
      System.out.println("Root element is " + qName);
   }
}

public class RetrieveElementName {
   public static void main(String args[]) {
      try {

         // Creating a SAXParser object
         SAXParserFactory factory = SAXParserFactory.newInstance();
         SAXParser saxParser = factory.newSAXParser();

         // Reading the XML
         StringBuilder xmlBuilder = new StringBuilder();
         xmlBuilder.append("<college>XYZ College</college>");
         ByteArrayInputStream input = new ByteArrayInputStream(xmlBuilder.toString().getBytes("UTF-8"));

         // Creating the UserHandler object
         UserHandler userhandler = new UserHandler();

         // Parsing the XML document
         saxParser.parse(input, userhandler);

      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

The root element name, 'college', is printed on the output screen.

Root element is college

Retrieving TextContent

To retrieve the text content of an element, we use the characters() method of the ContentHandler interface. It takes a character array, a start index, and a length. The parser calls this method when it encounters character data between tags, that is, after a ">" symbol and before the next "<" symbol. The start argument is the index within the array at which the data begins, and length is the number of characters to read from that index. A parser is allowed to deliver one run of text in several characters() calls, so the array should only be read within the range given by start and length.

Example

The following college.xml file has a single sub-element, "department", with the text content "Computer Science". Let us write a Java program that retrieves this text content along with the element names using the SAX API.

<college>
   <department>Computer Science</department>
</college>

The UserHandler class extends DefaultHandler and implements the startElement(), endElement(), and characters() methods. When the parser sees the text content inside the department element, it calls characters(), and we print the text on the console.

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Implementing the UserHandler class
class UserHandler extends DefaultHandler {
   @Override
   public void startElement(String uri, String localName, String qName, Attributes attributes)
         throws SAXException {
      System.out.println("Start Element : " + qName);
   }

   @Override
   public void endElement(String uri, String localName, String qName) {
      System.out.println("End Element : " + qName);
   }

   @Override
   public void characters(char[] ch, int start, int length) throws SAXException {
      System.out.println("Text Content : " + new String(ch, start, length));
   }
}

public class RetrieveTextContent {
   public static void main(String args[]) {
      try {

         // Creating a SAXParser object
         SAXParserFactory factory = SAXParserFactory.newInstance();
         SAXParser saxParser = factory.newSAXParser();

         // Reading the XML
         File xmlFile = new File("college.xml");

         // Creating the UserHandler object
         UserHandler userHandler = new UserHandler();

         // Parsing the XML document
         saxParser.parse(xmlFile, userHandler);

      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

The text content of the department element is displayed. The "college" element has no text of its own, so the characters() calls around the department element carry only whitespace (the line breaks and indentation in the file) and appear blank.

Start Element : college
Text Content :

Start Element : department
Text Content : Computer Science
End Element : department
Text Content :

End Element : college
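The blank "Text Content" lines come from the whitespace between tags. One common way to avoid them is to collect the text in a buffer and emit it only when the element ends, skipping values that are empty after trimming. The following is a minimal sketch of that idea, not the tutorial's own program. The names BufferedTextSketch, TextHandler, and textByElement() are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BufferedTextSketch {

   // Hypothetical handler: buffers character data until the element closes.
   static class TextHandler extends DefaultHandler {
      final List<String> lines = new ArrayList<>();
      private final StringBuilder text = new StringBuilder();

      @Override
      public void startElement(String uri, String localName, String qName, Attributes attributes) {
         text.setLength(0); // discard whitespace seen before this start tag
      }

      @Override
      public void characters(char[] ch, int start, int length) {
         // Append rather than print: one text run may arrive in several calls
         text.append(ch, start, length);
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
         String value = text.toString().trim();
         if (!value.isEmpty()) {
            lines.add(qName + " = " + value);
         }
         text.setLength(0);
      }
   }

   // Hypothetical helper: returns "element = text" for each element with non-blank text.
   static List<String> textByElement(String xml) {
      TextHandler handler = new TextHandler();
      try {
         SAXParserFactory.newInstance().newSAXParser()
               .parse(new InputSource(new StringReader(xml)), handler);
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
      return handler.lines;
   }

   public static void main(String[] args) {
      String xml = "<college>\n   <department>Computer Science</department>\n</college>";
      // Prints: [department = Computer Science]
      System.out.println(textByElement(xml));
   }
}
```

This sketch assumes elements hold either text or child elements, not both. Mixed content would need a buffer per nesting level.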

Retrieving Attributes

The last argument of the startElement() method is an Attributes object, which holds the list of attributes of the current element. The getValue("attr_name") method of the Attributes interface returns the value of the specified attribute.
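When the attribute names are not known in advance, the Attributes interface can also be walked by index using getLength(), getQName(int), and getValue(int). The sketch below copies the attributes of the first element into a sorted map. The class AttributeListingSketch, the helper firstElementAttributes(), and the "floor" attribute are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttributeListingSketch {

   // Hypothetical helper: returns name -> value for the first element's attributes.
   static Map<String, String> firstElementAttributes(String xml) {
      // TreeMap keeps the names sorted, since SAX does not promise attribute order
      final Map<String, String> result = new TreeMap<>();
      DefaultHandler handler = new DefaultHandler() {
         private boolean seen;

         @Override
         public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (seen) {
               return;
            }
            seen = true;
            for (int i = 0; i < attributes.getLength(); i++) {
               result.put(attributes.getQName(i), attributes.getValue(i));
            }
         }
      };
      try {
         SAXParserFactory.newInstance().newSAXParser()
               .parse(new InputSource(new StringReader(xml)), handler);
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
      return result;
   }

   public static void main(String[] args) {
      // Prints: {deptcode=DEP_CS23, floor=2}
      System.out.println(firstElementAttributes("<department deptcode=\"DEP_CS23\" floor=\"2\"/>"));
   }
}
```

Note that getValue("attr_name") returns null when the element has no attribute with that name, so a null check is advisable before using the result.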

Example

We have added a few more department elements to our "college.xml" file and given each department a "deptcode" attribute. Let us write a Java program to retrieve all the elements along with their attributes.

<?xml version = "1.0"?>
<college>
   <department deptcode = "DEP_CS23">
      <name>Computer Science</name>
      <staffCount>20</staffCount>
   </department>
   <department deptcode = "DEP_EC34">
      <name>Electrical and Electronics</name>
      <staffCount>23</staffCount>
   </department>
   <department deptcode = "DEP_MC89">
      <name>Mechanical</name>
      <staffCount>15</staffCount>
   </department>
</college>

The following Java program implements the startElement() and characters() methods in the UserHandler class. The deptcode attribute is read directly in startElement(). We also keep two boolean flags that record when the parser has entered a name or staffCount element, so that characters() knows which value the text it receives belongs to and can print it with the right label.

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Implementing the UserHandler class
class UserHandler extends DefaultHandler {

   // Flags set in startElement() and consumed in characters()
   boolean hasDeptName = false;
   boolean hasStaffCount = false;

   @Override
   public void startElement(String uri, String localName, String qName, Attributes attributes)
         throws SAXException {

      if (qName.equals("college")) {
         System.out.println("Root Element : " + qName + "\n");
      }
      if (qName.equals("department")) {
         System.out.println("Current Element : " + qName);
         System.out.println("Department code : " + attributes.getValue("deptcode"));
      }
      if (qName.equals("name")) {
         hasDeptName = true;
      }
      if (qName.equals("staffCount")) {
         hasStaffCount = true;
      }
   }

   @Override
   public void characters(char[] ch, int start, int length) throws SAXException {

      if (hasDeptName) {
         System.out.println("Department Name : " + new String(ch, start, length));
         hasDeptName = false;
      }
      if (hasStaffCount) {
         System.out.println("Staff Count : " + new String(ch, start, length) + "\n");
         hasStaffCount = false;
      }
   }
}

public class ParseAttributesSAX {
   public static void main(String args[]) {
      try {

         // Creating a SAXParser object
         SAXParserFactory factory = SAXParserFactory.newInstance();
         SAXParser saxParser = factory.newSAXParser();

         // Reading the XML
         File xmlFile = new File("college.xml");

         // Creating the UserHandler object
         UserHandler userHandler = new UserHandler();

         // Parsing the XML document
         saxParser.parse(xmlFile, userHandler);

      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

The output window displays each department element along with its attribute and the values of its child elements.

Root Element : college

Current Element : department
Department code : DEP_CS23
Department Name : Computer Science
Staff Count : 20

Current Element : department
Department code : DEP_EC34
Department Name : Electrical and Electronics
Staff Count : 23

Current Element : department
Department code : DEP_MC89
Department Name : Mechanical
Staff Count : 15
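Printing inside the callbacks is fine for a demonstration, but a program usually wants the parsed values back as data. As a hedged sketch of one way to do that, the handler below collects each department into a map instead of printing it. The names DepartmentCollectorSketch, CollectingHandler, and parseDepartments() are hypothetical, and the sample XML in main() is made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DepartmentCollectorSketch {

   // Hypothetical handler: builds one map per department element.
   static class CollectingHandler extends DefaultHandler {
      final List<Map<String, String>> departments = new ArrayList<>();
      private Map<String, String> current;
      private final StringBuilder text = new StringBuilder();

      @Override
      public void startElement(String uri, String localName, String qName, Attributes attributes) {
         text.setLength(0);
         if (qName.equals("department")) {
            current = new LinkedHashMap<>();
            current.put("deptcode", attributes.getValue("deptcode"));
         }
      }

      @Override
      public void characters(char[] ch, int start, int length) {
         text.append(ch, start, length);
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
         if (current != null && (qName.equals("name") || qName.equals("staffCount"))) {
            current.put(qName, text.toString().trim()); // store the child's text
         } else if (qName.equals("department")) {
            departments.add(current); // department is complete
            current = null;
         }
         text.setLength(0);
      }
   }

   // Hypothetical helper: parses XML text and returns the collected departments.
   static List<Map<String, String>> parseDepartments(String xml) {
      CollectingHandler handler = new CollectingHandler();
      try {
         SAXParserFactory.newInstance().newSAXParser()
               .parse(new InputSource(new StringReader(xml)), handler);
      } catch (Exception e) {
         throw new IllegalStateException("Could not parse XML", e);
      }
      return handler.departments;
   }

   public static void main(String[] args) {
      String xml = "<college><department deptcode=\"D1\"><name>Physics</name>"
            + "<staffCount>7</staffCount></department></college>";
      // Prints: [{deptcode=D1, name=Physics, staffCount=7}]
      System.out.println(parseDepartments(xml));
   }
}
```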