📜  从 XML 文档中提取内容的Java程序

📅  最后修改于: 2022-05-13 01:54:33.403000             🧑  作者: Mango

从 XML 文档中提取内容的Java程序

XML 文件在标签之间包含数据,因此与 docx 和 txt 等其他文件格式相比,读取数据比较复杂。有两种类型的解析器可以解析 XML 文件:

  • 基于对象(例如 DOM)
  • 基于事件(例如 SAX、StAX)

XML 解析器的类型

在本文中,我们将讨论如何使用Java DOM 解析器和Java SAX 解析器解析 XML。

Java DOM 解析器: DOM 代表文档对象模型。 DOM API 提供了读取和写入 XML 文件的类。 DOM 读取整个文档。它在读取中小型 XML 文件时很有用。它是一个基于树的解析器,与 SAX 相比有点慢,并且在加载到内存时占用更多空间。我们可以使用 DOM API 插入和删除节点。

我们必须按照以下过程从Java中的 XML 文件中提取数据。

  • 实例化 XML 文件:
  • 获取根节点:我们可以使用 getDocumentElement() 来获取根节点和 XML 文件的元素。
  • 获取所有节点:使用 getElementByTagName() 返回文档顺序中具有给定标签名称并包含在文档中的所有元素的 NodeList。
  • 通过文本值获取节点:我们可以使用 getElementByTextValue() 方法来通过其值搜索节点。
  • 通过属性值获取节点:我们可以将 getElementByTagName() 方法与 getAttribute() 方法一起使用。

现在让我们看一个使用Java DOM Parser 从 XML 中提取数据的示例。

创建一个.xml文件,在本例中,我们创建了 Gfg.xml

XML
  
  
      
        1  
        geek1   
        D.S.A
        online self paced
        Lifetime  
      
        
      
        2  
        geek2  
        System Design  
        online live course
        10 Lectures  
      
    
      
        3  
        geek3  
        Compitative Programming 
        online live course 
        8 weeks  
      
    
      
        4  
        geek4  
        Complete Interview Preparation 
        online self paced 
        Lifetime  
      
    


Java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class GfgXmlExtractor {
    public static void main(String argv[])
    {
        try {
            // creating a constructor of file class and
            // parsing an XML file
            File file = new File(
                "F:\\geeksforgeeks_contributions\\gfg.xml");
            
            // Defines a factory API that enables
            // applications to obtain a parser that produces
            // DOM object trees from XML documents.
            DocumentBuilderFactory dbf
                = DocumentBuilderFactory.newInstance();
            
            // we are creating an object of builder to parse
            // the  xml file.
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(file);
  
            /*here normalize method Puts all Text nodes in
            the full depth of the sub-tree underneath this
            Node, including attribute nodes, into a "normal"
            form where only structure separates
            Text nodes, i.e., there are neither adjacent
            Text nodes nor empty Text nodes. */
            doc.getDocumentElement().normalize();
            System.out.println(
                "Root element: "
                + doc.getDocumentElement().getNodeName());
            
            // Here nodeList contains all the nodes with
            // name geek.
            NodeList nodeList
                = doc.getElementsByTagName("geek");
            
            // Iterate through all the nodes in NodeList
            // using for loop.
            for (int i = 0; i < nodeList.getLength(); ++i) {
                Node node = nodeList.item(i);
                System.out.println("\nNode Name :"
                                   + node.getNodeName());
                if (node.getNodeType()
                    == Node.ELEMENT_NODE) {
                    Element tElement = (Element)node;
                    System.out.println(
                        "User id: "
                        + tElement
                              .getElementsByTagName("id")
                              .item(0)
                              .getTextContent());
                    System.out.println(
                        "User Name: "
                        + tElement
                              .getElementsByTagName(
                                  "username")
                              .item(0)
                              .getTextContent());
                    System.out.println(
                        "Enrolled Course: "
                        + tElement
                              .getElementsByTagName(
                                  "EnrolledCourse")
                              .item(0)
                              .getTextContent());
                    System.out.println(
                        "Mode: "
                        + tElement
                              .getElementsByTagName("mode")
                              .item(0)
                              .getTextContent());
                    System.out.println(
                        "Duration: "
                        + tElement
                              .getElementsByTagName(
                                  "duration")
                              .item(0)
                              .getTextContent());
                }
            }
        }
        
        // This exception block catches all the exception
        // raised.
        // For example if we try to access a element by a
        // TagName that is not there in the XML etc.
        catch (Exception e) {
            System.out.println(e);
        }
    }
}


Java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class GfgSaxXmlParser {
    public static void main(String args[])
    {
        try {
            /*SAXParserFactory is  a factory API that
            enables applications to configure and obtain a
            SAX based parser to parse XML documents. */
            SAXParserFactory factory
                = SAXParserFactory.newInstance();
            
            // Creating a new instance of a SAXParser using
            // the currently configured factory parameters.
            SAXParser saxParser = factory.newSAXParser();
            
            // DefaultHandler is Default base class for SAX2
            // event handlers.
            DefaultHandler handler = new DefaultHandler() {
                boolean id = false;
                boolean username = false;
                boolean EnrolledCourse = false;
                boolean mode = false;
                boolean duration = false;
                
                // Receive notification of the start of an
                // element. parser starts parsing a element
                // inside the document
                public void startElement(
                    String uri, String localName,
                    String qName, Attributes attributes)
                    throws SAXException
                {
  
                    if (qName.equalsIgnoreCase("Id")) {
                        id = true;
                    }
                    if (qName.equalsIgnoreCase(
                            "username")) {
                        username = true;
                    }
                    if (qName.equalsIgnoreCase(
                            "EnrolledCourse")) {
                        EnrolledCourse = true;
                    }
                    if (qName.equalsIgnoreCase("mode")) {
                        mode = true;
                    }
                    if (qName.equalsIgnoreCase(
                            "duration")) {
                        duration = true;
                    }
                }
                
                // Receive notification of character data
                // inside an element, reads the text value of
                // the currently parsed element
                public void characters(char ch[], int start,
                                       int length)
                    throws SAXException
                {
                    if (id) {
                        System.out.println(
                            "ID : "
                            + new String(ch, start,
                                         length));
                        id = false;
                    }
                    if (username) {
                        System.out.println(
                            "User Name: "
                            + new String(ch, start,
                                         length));
                        username = false;
                    }
                    if (EnrolledCourse) {
                        System.out.println(
                            "Enrolled Course: "
                            + new String(ch, start,
                                         length));
                        EnrolledCourse = false;
                    }
                    if (mode) {
                        System.out.println(
                            "mode: "
                            + new String(ch, start,
                                         length));
                        mode = false;
                    }
                    if (duration) {
                        System.out.println(
                            "duration : "
                            + new String(ch, start,
                                         length));
                        duration = false;
                    }
                }
            };
            
            /*Parse the content described by the giving
             Uniform Resource
             Identifier (URI) as XML using the specified
             DefaultHandler. */
            saxParser.parse(
                "F:\\geeksforgeeks_contributions\\gfg.xml",
                handler);
        }
        catch (Exception e) {
            System.out.println(e);
        }
    }
}


现在为Java DOM 解析器创建一个Java文件。在这种情况下,GfgXmlExtractor。Java

Java

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class GfgXmlExtractor {
    public static void main(String argv[])
    {
        try {
            // creating a constructor of file class and
            // parsing an XML file
            File file = new File(
                "F:\\geeksforgeeks_contributions\\gfg.xml");
            
            // Defines a factory API that enables
            // applications to obtain a parser that produces
            // DOM object trees from XML documents.
            DocumentBuilderFactory dbf
                = DocumentBuilderFactory.newInstance();
            
            // we are creating an object of builder to parse
            // the  xml file.
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(file);
  
            /*here normalize method Puts all Text nodes in
            the full depth of the sub-tree underneath this
            Node, including attribute nodes, into a "normal"
            form where only structure separates
            Text nodes, i.e., there are neither adjacent
            Text nodes nor empty Text nodes. */
            doc.getDocumentElement().normalize();
            System.out.println(
                "Root element: "
                + doc.getDocumentElement().getNodeName());
            
            // Here nodeList contains all the nodes with
            // name geek.
            NodeList nodeList
                = doc.getElementsByTagName("geek");
            
            // Iterate through all the nodes in NodeList
            // using for loop.
            for (int i = 0; i < nodeList.getLength(); ++i) {
                Node node = nodeList.item(i);
                System.out.println("\nNode Name :"
                                   + node.getNodeName());
                if (node.getNodeType()
                    == Node.ELEMENT_NODE) {
                    Element tElement = (Element)node;
                    System.out.println(
                        "User id: "
                        + tElement
                              .getElementsByTagName("id")
                              .item(0)
                              .getTextContent());
                    System.out.println(
                        "User Name: "
                        + tElement
                              .getElementsByTagName(
                                  "username")
                              .item(0)
                              .getTextContent());
                    System.out.println(
                        "Enrolled Course: "
                        + tElement
                              .getElementsByTagName(
                                  "EnrolledCourse")
                              .item(0)
                              .getTextContent());
                    System.out.println(
                        "Mode: "
                        + tElement
                              .getElementsByTagName("mode")
                              .item(0)
                              .getTextContent());
                    System.out.println(
                        "Duration: "
                        + tElement
                              .getElementsByTagName(
                                  "duration")
                              .item(0)
                              .getTextContent());
                }
            }
        }
        
        // This exception block catches all the exception
        // raised.
        // For example if we try to access a element by a
        // TagName that is not there in the XML etc.
        catch (Exception e) {
            System.out.println(e);
        }
    }
}

输出

Root element: class

Node Name :geek
User id: 1
User Name: geek1
Enrolled Course: D.S.A
Mode: online self paced
Duration: Lifetime

Node Name :geek
User id: 2
User Name: geek2
Enrolled Course: System Design
Mode: online live course
Duration: 10 Lectures

Node Name :geek
User id: 3
User Name: geek3
Enrolled Course: Compitative Programming
Mode: online live course
Duration: 8 weeks

Node Name :geek
User id: 4
User Name: geek4
Enrolled Course: Complete Interview Preparation
Mode: online self paced
Duration: Lifetime




方法 2 : Java SAX 解析器

Java中的 SAX Parser 提供 API 来解析 XML 文档。 SAX 解析器与 DOM 解析器有很大不同,因为它不会将完整的 XML 加载到内存中并按顺序读取 XML 文档。在 SAX 中,解析由ContentHandler接口完成,该接口由DefaultHandler类实现。

现在让我们看一个使用Java SAX Parser 从 XML 中提取数据的示例。

为 SAX 解析器创建一个Java文件。在本例中,我们创建了 GfgSaxXmlExtractor。Java

Java

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class GfgSaxXmlParser {
    public static void main(String args[])
    {
        try {
            /*SAXParserFactory is  a factory API that
            enables applications to configure and obtain a
            SAX based parser to parse XML documents. */
            SAXParserFactory factory
                = SAXParserFactory.newInstance();
            
            // Creating a new instance of a SAXParser using
            // the currently configured factory parameters.
            SAXParser saxParser = factory.newSAXParser();
            
            // DefaultHandler is Default base class for SAX2
            // event handlers.
            DefaultHandler handler = new DefaultHandler() {
                boolean id = false;
                boolean username = false;
                boolean EnrolledCourse = false;
                boolean mode = false;
                boolean duration = false;
                
                // Receive notification of the start of an
                // element. parser starts parsing a element
                // inside the document
                public void startElement(
                    String uri, String localName,
                    String qName, Attributes attributes)
                    throws SAXException
                {
  
                    if (qName.equalsIgnoreCase("Id")) {
                        id = true;
                    }
                    if (qName.equalsIgnoreCase(
                            "username")) {
                        username = true;
                    }
                    if (qName.equalsIgnoreCase(
                            "EnrolledCourse")) {
                        EnrolledCourse = true;
                    }
                    if (qName.equalsIgnoreCase("mode")) {
                        mode = true;
                    }
                    if (qName.equalsIgnoreCase(
                            "duration")) {
                        duration = true;
                    }
                }
                
                // Receive notification of character data
                // inside an element, reads the text value of
                // the currently parsed element
                public void characters(char ch[], int start,
                                       int length)
                    throws SAXException
                {
                    if (id) {
                        System.out.println(
                            "ID : "
                            + new String(ch, start,
                                         length));
                        id = false;
                    }
                    if (username) {
                        System.out.println(
                            "User Name: "
                            + new String(ch, start,
                                         length));
                        username = false;
                    }
                    if (EnrolledCourse) {
                        System.out.println(
                            "Enrolled Course: "
                            + new String(ch, start,
                                         length));
                        EnrolledCourse = false;
                    }
                    if (mode) {
                        System.out.println(
                            "mode: "
                            + new String(ch, start,
                                         length));
                        mode = false;
                    }
                    if (duration) {
                        System.out.println(
                            "duration : "
                            + new String(ch, start,
                                         length));
                        duration = false;
                    }
                }
            };
            
            /*Parse the content described by the giving
             Uniform Resource
             Identifier (URI) as XML using the specified
             DefaultHandler. */
            saxParser.parse(
                "F:\\geeksforgeeks_contributions\\gfg.xml",
                handler);
        }
        catch (Exception e) {
            System.out.println(e);
        }
    }
}

输出

ID : 1
User Name: geek1
Enrolled Course: D.S.A
mode: online self paced
duration : Lifetime
ID : 2
User Name: geek2
Enrolled Course: System Design
mode: online live course
duration : 10 Lectures
ID : 3
User Name: geek3
Enrolled Course: Compitative Programming
mode: online live course
duration : 8 weeks
ID : 4
User Name: geek4
Enrolled Course: Complete Interview Preparation
mode: online self paced
duration : Lifetime