Tuesday, March 4, 2014

Java XML parsing. DOM and SAX parsers.


The two most common approaches to parsing XML files are the following:

- DOM (the Document Object Model) is a cross-platform, language-independent convention for representing and interacting with objects in HTML, XHTML and XML documents. Objects in the DOM tree can be addressed and manipulated by calling methods on them. The public interface of a DOM is specified in its application programming interface (API).
Wiki: http://en.wikipedia.org/wiki/Document_Object_Model

- SAX (Simple API for XML) is an event-based, sequential-access parser API for XML documents, developed by the XML-DEV mailing list. SAX provides a mechanism for reading data from an XML document that is an alternative to the one provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially.
Wiki: http://en.wikipedia.org/wiki/SAX

A detailed description can be found on Wikipedia, but in short the differences are the following:

- A DOM parser reads the whole XML file and builds a hierarchical tree from it. After that, the tree can be queried, for example with XPath expressions.
- A SAX parser processes the XML file step by step: it moves from one node to the next and produces events such as "start element found".
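
To make the difference concrete, here is a minimal sketch (it is not the code from this post). The class name DomVsEventSketch and the tiny sample document are made up for illustration, and the event-based half uses the JDK's javax.xml.stream cursor API as one possible event-style reader.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomVsEventSketch {
    // Hypothetical sample document, much smaller than the weather feed.
    static final String XML =
            "<REPORT><TOWN index=\"27612\">"
          + "<FORECAST hour=\"04\"/><FORECAST hour=\"10\"/>"
          + "</TOWN></REPORT>";

    // Tree style: parse the whole document first, then ask for one value by path.
    static String townIndexViaDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate("/REPORT/TOWN/@index", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Event style: walk the document once and react to each start tag.
    static int countForecastsViaEvents(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "FORECAST".equals(reader.getLocalName())) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(townIndexViaDom(XML));         // value of the index attribute
        System.out.println(countForecastsViaEvents(XML)); // number of FORECAST start tags
    }
}
```

The tree version keeps the whole document in memory but lets you jump straight to a value; the event version keeps nothing and only sees what is currently passing by.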

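This post shows an example of parsing an XML file with weather data using both SAX and DOM parsers. The source data is here: http://informer.gismeteo.ru/xml/27612_1.xml. Judging by the TOWN attributes in the sample below (index 27612, latitude 55, longitude 37, and an sname that URL-decodes from windows-1251 to "Москва"), it is the weather forecast for Moscow.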

So, the XML data:

<?xml version="1.0" encoding="utf-8"?>
<MMWEATHER>
 <REPORT type="frc3">
  <TOWN index="27612" sname="%CC%EE%F1%EA%E2%E0" latitude="55" longitude="37">
   <FORECAST day="05" month="03" year="2014" hour="04" tod="0" predict="0" weekday="4">
    <PHENOMENA cloudiness="3" precipitation="10" rpower="0" spower="0"/>
    <PRESSURE max="753" min="751"/>
    <TEMPERATURE max="-3" min="-1"/>
    <WIND min="0" max="2" direction="1"/>
    <RELWET max="91" min="89"/>
    <HEAT min="-3" max="-1"/>
   </FORECAST>
   <FORECAST day="05" month="03" year="2014" hour="10" tod="1" predict="0" weekday="4">
    <PHENOMENA cloudiness="3" precipitation="10" rpower="0" spower="0"/>
    <PRESSURE max="754" min="752"/>
    <TEMPERATURE max="-2" min="0"/>
    <WIND min="1" max="3" direction="1"/>
    <RELWET max="87" min="85"/>
    <HEAT min="-3" max="-1"/>
   </FORECAST>
   <FORECAST day="05" month="03" year="2014" hour="16" tod="2" predict="0" weekday="4">
    <PHENOMENA cloudiness="3" precipitation="10" rpower="0" spower="0"/>
    <PRESSURE max="756" min="754"/>
    <TEMPERATURE max="5" min="3"/>
    <WIND min="0" max="2" direction="1"/>
    <RELWET max="76" min="74"/>
    <HEAT min="3" max="5"/>
   </FORECAST>
   <FORECAST day="05" month="03" year="2014" hour="22" tod="3" predict="0" weekday="4">
    <PHENOMENA cloudiness="2" precipitation="10" rpower="0" spower="0"/>
    <PRESSURE max="757" min="755"/>
    <TEMPERATURE max="1" min="-1"/>
    <WIND min="1" max="3" direction="1"/>
    <RELWET max="92" min="90"/>
    <HEAT min="-2" max="0"/>
   </FORECAST>
  </TOWN>
 </REPORT>
</MMWEATHER>

As you can see, it's ordinary data from a weather informer. Let's parse it and convert it to a more readable form.

1. SAX parser

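In a SAX-style parser the core concept is the event. Note that the code below uses the StAX event API (javax.xml.stream), where the application pulls events from a reader, whereas classic SAX (org.xml.sax) pushes events to callback methods. The sequential, event-based idea is the same.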

Creating the event reader:
eventReader = inputFactory.createXMLEventReader(is);

Getting the next event:
event = eventReader.nextEvent();

Checking whether the event is the start of a new element, and processing the element's attributes:
if (event.isStartElement()) {
    StartElement startElement = event.asStartElement();
    // Compare strings with equals(); == only compares object references
    if (TOWN.equals(startElement.getName().getLocalPart())) {
        // ... process the TOWN element
    }
    if (FORECAST.equals(startElement.getName().getLocalPart())) {
        // ... process the FORECAST element
    }
}
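
The attribute handling can also be tried in isolation. The following is only a sketch under my own assumptions: the class StartElementAttributes and the method attributesOf are hypothetical names that do not exist in the project, and the sample XML is a shortened version of the feed. It collects all attributes of the first matching start element into a map.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StartElementAttributes {
    // Shortened, hypothetical sample based on the weather feed.
    static final String SAMPLE =
            "<TOWN index=\"27612\" latitude=\"55\" longitude=\"37\">"
          + "<FORECAST day=\"05\" hour=\"04\"/>"
          + "</TOWN>";

    // Returns the attributes of the first element named elementName,
    // or an empty map if there is no such element.
    static Map<String, String> attributesOf(String xml, String elementName) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue; // only start tags carry attributes
                }
                StartElement start = event.asStartElement();
                if (!elementName.equals(start.getName().getLocalPart())) {
                    continue;
                }
                // Iterator<?> plus a cast compiles on both old and new JDKs
                Iterator<?> it = start.getAttributes();
                while (it.hasNext()) {
                    Attribute attribute = (Attribute) it.next();
                    result.put(attribute.getName().getLocalPart(), attribute.getValue());
                }
                break; // the first match is enough for this sketch
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(attributesOf(SAMPLE, "TOWN"));
        System.out.println(attributesOf(SAMPLE, "FORECAST"));
    }
}
```

With a map like this, each element could be handled by a few get(...) calls instead of a chain of if statements per attribute.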


Full code of the SAX parser:
package com.demien.xpath.parser;

import com.demien.xpath.model.DayReportVO;
import com.demien.xpath.model.TownReportVO;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Iterator;


public class SaxParser implements IParser {
    // Element names of the weather feed
    private static final String TOWN = "TOWN";
    private static final String FORECAST = "FORECAST";
    private static final String PHENOMENA = "PHENOMENA";
    private static final String PRESSURE = "PRESSURE";
    private static final String TEMPERATURE = "TEMPERATURE";
    private static final String WIND = "WIND";
    private static final String RELWET = "RELWET";
    private static final String HEAT = "HEAT";


    private TownReportVO townReport = null;
    private DayReportVO dayReport = null;

    private DayReportVO DAY_TEMPLATE;
    private TownReportVO TOWN_TEMPLATE;

    public SaxParser() {
        DAY_TEMPLATE = DayReportVO.getTemplate();
        TOWN_TEMPLATE = TownReportVO.getTemplate();
    }

    public TownReportVO parse(String data) {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
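        // The document declares encoding="utf-8", so do not rely on the platform default charset
        InputStream is = new ByteArrayInputStream(data.getBytes(java.nio.charset.StandardCharsets.UTF_8));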
        XMLEventReader eventReader = null;
        XMLEvent event;

        try {
            eventReader = inputFactory.createXMLEventReader(is);
        } catch (Exception e) {
            e.printStackTrace();
        }

        if (eventReader != null) {
            while (eventReader.hasNext()) {
                event = null;
                    try {
                        event = eventReader.nextEvent();
                    } catch (Exception e) {
                        // Stop on malformed input instead of looping over a broken reader
                        e.printStackTrace();
                        break;
                    }
                if (event != null) {
                    if (event.isStartElement()) {
                        StartElement startElement = event.asStartElement();
                        //----------------------------------------------------
                        //-----    TOWN --------------------------------------
                        //-----------------------------------------------------
                        if (TOWN.equals(startElement.getName().getLocalPart())) {
                            if (townReport == null) townReport = new TownReportVO();
                            Iterator<Attribute> attributes = startElement
                                    .getAttributes();
                            while (attributes.hasNext()) {
                                Attribute attribute = attributes.next();
                                if (attribute.getName().toString().equals(TOWN_TEMPLATE.getIndex())) {
                                    townReport.setIndex(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(TOWN_TEMPLATE.getSname())) {
                                    townReport.setSname(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(TOWN_TEMPLATE.getLatitude())) {
                                    townReport.setLatitude(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(TOWN_TEMPLATE.getLongitude())) {
                                    townReport.setLongitude(attribute.getValue());
                                }
                            } // while (attributes.hasNext()) {
                        } // end of TOWN handling
                        //--------------------------------------------------------
                        //-----    FORECAST --------------------------------------
                        //--------------------------------------------------------
                        if (FORECAST.equals(startElement.getName().getLocalPart())) {
                            if (dayReport == null) {
                                dayReport = new DayReportVO();
                            } else { // a new FORECAST starts: store the previous day report first
                                if (townReport == null) townReport = new TownReportVO();
                                townReport.addDayReport(dayReport);
                                dayReport = new DayReportVO();
                            }
                            Iterator<Attribute> attributes = startElement
                                    .getAttributes();
                            while (attributes.hasNext()) {
                                Attribute attribute = attributes.next();
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getForecastDay())) {
                                    dayReport.setForecastDay(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getForecastMonth())) {
                                    dayReport.setForecastMonth(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getForecastYear())) {
                                    dayReport.setForecastYear(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getForecastHour())) {
                                    dayReport.setForecastHour(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getForecastTod())) {
                                    dayReport.setForecastTod(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getForecastWeekday())) {
                                    dayReport.setForecastWeekday(attribute.getValue());
                                }
                            } // while (attributes.hasNext()) {
                        } // end of FORECAST handling

                        //-----------------------------------------------------------------------------------
                        //---- PHENOMENA --------------------------------------------------------------------
                        //-----------------------------------------------------------------------------------
                        if (PHENOMENA.equals(startElement.getName().getLocalPart())) {
                            if (dayReport == null) {
                                dayReport = new DayReportVO();
                            }
                            Iterator<Attribute> attributes = startElement
                                    .getAttributes();
                            while (attributes.hasNext()) {
                                Attribute attribute = attributes.next();
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getPhenomenaCloudiness())) {
                                    dayReport.setPhenomenaCloudiness(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getPhenomenaPrecipitation())) {
                                    dayReport.setPhenomenaPrecipitation(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getPhenomenaRpower())) {
                                    dayReport.setPhenomenaRpower(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getPhenomenaSpower())) {
                                    dayReport.setPhenomenaSpower(attribute.getValue());
                                }
                            } // while (attributes.hasNext()) {
                        } // end of PHENOMENA handling

                        //-----------------------------------------------------------------------------------
                        //---- PRESSURE --------------------------------------------------------------------
                        //-----------------------------------------------------------------------------------
                        if (PRESSURE.equals(startElement.getName().getLocalPart())) {
                            if (dayReport == null) {
                                dayReport = new DayReportVO();
                            }
                            Iterator<Attribute> attributes = startElement
                                    .getAttributes();
                            while (attributes.hasNext()) {
                                Attribute attribute = attributes.next();
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getPressureMax())) {
                                    dayReport.setPressureMax(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getPressureMin())) {
                                    dayReport.setPressureMin(attribute.getValue());
                                }
                            } // while (attributes.hasNext()) {
                        } // end of PRESSURE handling

                        //-----------------------------------------------------------------------------------
                        //---- TEMPERATURE --------------------------------------------------------------------
                        //-----------------------------------------------------------------------------------
                        if (TEMPERATURE.equals(startElement.getName().getLocalPart())) {
                            if (dayReport == null) {
                                dayReport = new DayReportVO();
                            }
                            Iterator<Attribute> attributes = startElement
                                    .getAttributes();
                            while (attributes.hasNext()) {
                                Attribute attribute = attributes.next();
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getTemperatureMax())) {
                                    dayReport.setTemperatureMax(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getTemperatureMin())) {
                                    dayReport.setTemperatureMin(attribute.getValue());
                                }
                            } // while (attributes.hasNext()) {
                        } // end of TEMPERATURE handling

                        //-----------------------------------------------------------------------------------
                        //---- WIND --------------------------------------------------------------------
                        //-----------------------------------------------------------------------------------
                        if (WIND.equals(startElement.getName().getLocalPart())) {
                            if (dayReport == null) {
                                dayReport = new DayReportVO();
                            }
                            Iterator<Attribute> attributes = startElement
                                    .getAttributes();
                            while (attributes.hasNext()) {
                                Attribute attribute = attributes.next();
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getWindMax())) {
                                    dayReport.setWindMax(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getWindMin())) {
                                    dayReport.setWindMin(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getWindDirection())) {
                                    dayReport.setWindDirection(attribute.getValue());
                                }
                            } // while (attributes.hasNext()) {
                        } // end of WIND handling

                        //-----------------------------------------------------------------------------------
                        //---- RELWET --------------------------------------------------------------------
                        //-----------------------------------------------------------------------------------
                        if (RELWET.equals(startElement.getName().getLocalPart())) {
                            if (dayReport == null) {
                                dayReport = new DayReportVO();
                            }
                            Iterator<Attribute> attributes = startElement
                                    .getAttributes();
                            while (attributes.hasNext()) {
                                Attribute attribute = attributes.next();
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getRelwetMax())) {
                                    dayReport.setRelwetMax(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getRelwetMin())) {
                                    dayReport.setRelwetMin(attribute.getValue());
                                }
                            } // while (attributes.hasNext()) {
                        } // end of RELWET handling

                        //-----------------------------------------------------------------------------------
                        //---- HEAT --------------------------------------------------------------------
                        //-----------------------------------------------------------------------------------
                        if (HEAT.equals(startElement.getName().getLocalPart())) {
                            if (dayReport == null) {
                                dayReport = new DayReportVO();
                            }

                            Iterator<Attribute> attributes = startElement
                                    .getAttributes();
                            while (attributes.hasNext()) {
                                Attribute attribute = attributes.next();
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getHeatMax())) {
                                    dayReport.setHeatMax(attribute.getValue());
                                }
                                if (attribute.getName().toString().equals(DAY_TEMPLATE.getHeatMin())) {
                                    dayReport.setHeatMin(attribute.getValue());
                                }
                            } // while (attributes.hasNext()) {
                        } // end of HEAT handling


                    } // if (event.isStartElement()) {
                } //if (event!=null) {

            } // while
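            // add the final day report, if at least one FORECAST was read
            if (dayReport != null) {
                if (townReport == null) townReport = new TownReportVO();
                townReport.addDayReport(dayReport);
            }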
        } // if
        //System.out.println(townReport.toString());
        return townReport;
    }

}
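
For comparison, here is roughly what the same idea looks like with the classic SAX API (org.xml.sax), where the parser calls your handler instead of you asking the reader for the next event. This is a sketch only: ClassicSaxSketch, temperaturesByHour and the shortened sample are names and data I made up for illustration, not part of the project.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ClassicSaxSketch {
    // Shortened, hypothetical sample based on the weather feed.
    static final String SAMPLE =
            "<TOWN>"
          + "<FORECAST hour=\"04\"><TEMPERATURE max=\"-3\" min=\"-1\"/></FORECAST>"
          + "<FORECAST hour=\"10\"><TEMPERATURE max=\"-2\" min=\"0\"/></FORECAST>"
          + "</TOWN>";

    // Returns one "hour: min..max" line per TEMPERATURE element.
    static List<String> temperaturesByHour(String xml) {
        final List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentHour; // remembered from the enclosing FORECAST

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The default factory is not namespace aware, so qName holds the tag name
                if ("FORECAST".equals(qName)) {
                    currentHour = attrs.getValue("hour");
                } else if ("TEMPERATURE".equals(qName)) {
                    lines.add(currentHour + ": "
                            + attrs.getValue("min") + ".." + attrs.getValue("max"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : temperaturesByHour(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

The handler has to remember the current FORECAST hour itself, because in an event-based parser nothing about earlier elements is kept for you.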


2. DOM parser

The DOM parser works in a different way. To get data with this parser you don't need to iterate through the entire file yourself, because the parser has already done that and built a node tree. So you can get just the information you need by its path (using XPath expressions).

Getting the town node:
Node townNode = (Node) xpath.evaluate("/MMWEATHER/REPORT/TOWN", document, XPathConstants.NODE);

You can also get elements by tag name alone:
NodeList nlForecast = document.getElementsByTagName(FORECAST);

Processing the matched nodes:
for (int i = 0; i < nlForecast.getLength(); i++) {
    Node forecastNode = nlForecast.item(i);
    // ... read the attributes of each FORECAST node
}
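
XPath can also select attribute values directly, without walking the nodes by hand. The sketch below assumes a shortened copy of the feed; XPathSelectSketch and select are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSelectSketch {
    // Shortened, hypothetical sample based on the weather feed.
    static final String SAMPLE =
            "<MMWEATHER><REPORT><TOWN index=\"27612\">"
          + "<FORECAST hour=\"04\"><WIND min=\"0\" max=\"2\"/></FORECAST>"
          + "<FORECAST hour=\"10\"><WIND min=\"1\" max=\"3\"/></FORECAST>"
          + "</TOWN></REPORT></MMWEATHER>";

    // Evaluates an XPath expression and returns the value of every matched node.
    static List<String> select(String xml, String expression) {
        List<String> values = new ArrayList<>();
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                // For attribute nodes getNodeValue() is the attribute value
                values.add(nodes.item(i).getNodeValue());
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, "/MMWEATHER/REPORT/TOWN/FORECAST/WIND/@max"));
        System.out.println(select(SAMPLE, "//FORECAST/@hour"));
    }
}
```

Selecting by path like this does not depend on the order of child elements or on whitespace between them.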

Full code of the DOM parser:
package com.demien.xpath.parser;

import com.demien.xpath.model.DayReportVO;
import com.demien.xpath.model.TownReportVO;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;

public class DomParser implements IParser {

    private static final String TOWN_PATH = "/MMWEATHER/REPORT/TOWN";
    private static final String FORECAST = "FORECAST";

    private TownReportVO townReport = null;
    private DayReportVO dayReport = null;

    private DayReportVO DAY_TEMPLATE;
    private TownReportVO TOWN_TEMPLATE;

    // constructor
    public DomParser() {
        DAY_TEMPLATE = DayReportVO.getTemplate();
        TOWN_TEMPLATE = TownReportVO.getTemplate();
    }

    // implementation
    private String getNodeAttribute(NamedNodeMap attributes, String attribute) throws Exception{
        if (attributes==null) throw new Exception("Attribute map is null(key="+attribute+")");
        if (attribute==null) throw new Exception("Attribute key is null");
        Node node = attributes.getNamedItem(attribute);
        // getNamedItem returns null when the attribute is absent
        if (node==null) throw new Exception("Attribute not found(key="+attribute+")");
        return node.getNodeValue();
    }

    public TownReportVO parse(String data) {
        try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            InputSource source = new InputSource(new StringReader(data));
            Document document=builder.parse(source);

            townReport = new TownReportVO();

            XPath xpath = XPathFactory.newInstance().newXPath();

            //-----------------------
            //--  get town data
            //------------------------
            String expression = TOWN_PATH;
            Node townNode = (Node) xpath.evaluate(expression, document, XPathConstants.NODE);
            NamedNodeMap attributes = townNode.getAttributes();
            townReport.setIndex(getNodeAttribute(attributes, TOWN_TEMPLATE.getIndex()));
            townReport.setSname(getNodeAttribute(attributes, TOWN_TEMPLATE.getSname()));


            // ------------------------
            //-- get days data
            //-------------------------
            NodeList nlForecast = document.getElementsByTagName(FORECAST);
            for (int i = 0; i < nlForecast.getLength(); i++) {
                dayReport = new DayReportVO();
                Node forecastNode = nlForecast.item(i);
                NamedNodeMap forecastAttr = forecastNode.getAttributes();
                dayReport.setForecastDay(getNodeAttribute(forecastAttr, DAY_TEMPLATE.getForecastDay()));
                dayReport.setForecastHour(getNodeAttribute(forecastAttr, DAY_TEMPLATE.getForecastHour()));
                dayReport.setForecastMonth(getNodeAttribute(forecastAttr, DAY_TEMPLATE.getForecastMonth()));
                dayReport.setForecastYear(getNodeAttribute(forecastAttr, DAY_TEMPLATE.getForecastYear()));
                dayReport.setForecastWeekday(getNodeAttribute(forecastAttr, DAY_TEMPLATE.getForecastWeekday()));

                NodeList dayItems= forecastNode.getChildNodes();

                // phenomena
                Node phenomenaNode=dayItems.item(1);
                NamedNodeMap phenomenaAttr = phenomenaNode.getAttributes();
                dayReport.setPhenomenaCloudiness(getNodeAttribute(phenomenaAttr,DAY_TEMPLATE.getPhenomenaCloudiness()));
                dayReport.setPhenomenaPrecipitation(getNodeAttribute(phenomenaAttr,DAY_TEMPLATE.getPhenomenaPrecipitation()));
                dayReport.setPhenomenaRpower(getNodeAttribute(phenomenaAttr,DAY_TEMPLATE.getPhenomenaRpower()));
                dayReport.setPhenomenaSpower(getNodeAttribute(phenomenaAttr,DAY_TEMPLATE.getPhenomenaSpower()));

                // pressure
                Node pressureNode=phenomenaNode.getNextSibling();
                pressureNode=pressureNode.getNextSibling();
                NamedNodeMap pressureAttr = pressureNode.getAttributes();
                dayReport.setPressureMin(getNodeAttribute(pressureAttr,DAY_TEMPLATE.getPressureMin()));
                dayReport.setPressureMax(getNodeAttribute(pressureAttr,DAY_TEMPLATE.getPressureMax()));

                //temperature
                Node temperatureNode=pressureNode.getNextSibling();
                temperatureNode=temperatureNode.getNextSibling();
                NamedNodeMap temperatureAttr=temperatureNode.getAttributes();
                dayReport.setTemperatureMax(getNodeAttribute(temperatureAttr,DAY_TEMPLATE.getTemperatureMax()));
                dayReport.setTemperatureMin(getNodeAttribute(temperatureAttr,DAY_TEMPLATE.getTemperatureMin()));

                // wind (values come from windAttr, not temperatureAttr)
                Node windNode=temperatureNode.getNextSibling();
                windNode=windNode.getNextSibling();
                NamedNodeMap windAttr=windNode.getAttributes();
                dayReport.setWindMax(getNodeAttribute(windAttr,DAY_TEMPLATE.getWindMax()));
                dayReport.setWindMin(getNodeAttribute(windAttr,DAY_TEMPLATE.getWindMin()));
                dayReport.setWindDirection(getNodeAttribute(windAttr,DAY_TEMPLATE.getWindDirection()));

                //relwet (values come from relwetAttr)
                Node relwetNode=windNode.getNextSibling();
                relwetNode=relwetNode.getNextSibling();
                NamedNodeMap relwetAttr=relwetNode.getAttributes();
                dayReport.setRelwetMax(getNodeAttribute(relwetAttr,DAY_TEMPLATE.getRelwetMax()));
                dayReport.setRelwetMin(getNodeAttribute(relwetAttr,DAY_TEMPLATE.getRelwetMin()));

                //heat (values come from heatAttr)
                Node heatNode=relwetNode.getNextSibling();
                heatNode=heatNode.getNextSibling();
                NamedNodeMap heatAttr=heatNode.getAttributes();
                dayReport.setHeatMax(getNodeAttribute(heatAttr,DAY_TEMPLATE.getHeatMax()));
                dayReport.setHeatMin(getNodeAttribute(heatAttr,DAY_TEMPLATE.getHeatMin()));

                townReport.addDayReport(dayReport);

            }  // for

        } catch (Exception e) {
            e.printStackTrace();
        }

        return townReport;

    }
}
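
One detail of the DOM listing is worth a note: it reaches each child with item(1) and two getNextSibling() calls, which works only while there is exactly one whitespace text node between the elements. A possible alternative, sketched here with hypothetical names (ChildElementSketch, childElement, windMax) and a shortened sample, is to look a child up by its element name and skip everything that is not an element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildElementSketch {
    // The same hypothetical fragment, once indented and once on a single line.
    static final String PRETTY =
            "<FORECAST hour=\"04\">\n"
          + "  <TEMPERATURE max=\"-3\" min=\"-1\"/>\n"
          + "  <WIND min=\"0\" max=\"2\" direction=\"1\"/>\n"
          + "</FORECAST>";
    static final String COMPACT =
            "<FORECAST hour=\"04\"><TEMPERATURE max=\"-3\" min=\"-1\"/>"
          + "<WIND min=\"0\" max=\"2\" direction=\"1\"/></FORECAST>";

    // Returns the first direct child element with the given tag name, or null.
    static Element childElement(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Whitespace between tags shows up as TEXT_NODE children; skip those
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Reads WIND/@max from a document whose root element is FORECAST.
    static String windMax(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element wind = childElement(document.getDocumentElement(), "WIND");
            return wind == null ? null : wind.getAttribute("max");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(windMax(PRETTY));
        System.out.println(windMax(COMPACT));
    }
}
```

Both variants of the fragment give the same answer, because the lookup no longer depends on how the file is indented.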

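3. Result of parsing (the output format is the same for both parsers):

Note that in the listing below the WIND, RELWET and HEAT values repeat the TEMPERATURE values, the wind direction is null, and the date shows January 5 instead of March 5. None of this matches the source XML (for example, the first forecast has WIND min="0" max="2"), so read the listing as an illustration of the output format, not as correct values.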

Sun Jan 05 04:03:00 EET 2014 PHENOMENA:3 10 0 0 PRESSURE:751 753 TEMPERATURE:-1 -3 WIND:-1 -3 null RELWET:-1 -3 HEAT:-1 -3
Sun Jan 05 10:03:00 EET 2014 PHENOMENA:3 10 0 0 PRESSURE:752 754 TEMPERATURE:0 -2 WIND:0 -2 null RELWET:0 -2 HEAT:0 -2
Sun Jan 05 16:03:00 EET 2014 PHENOMENA:3 10 0 0 PRESSURE:754 756 TEMPERATURE:3 5 WIND:3 5 null RELWET:3 5 HEAT:3 5
Sun Jan 05 22:03:00 EET 2014 PHENOMENA:2 10 0 0 PRESSURE:755 757 TEMPERATURE:-1 1 WIND:-1 1 null RELWET:-1 1 HEAT:-1 1

Full code can be downloaded from here.