Extracting Pertinent Information from an XML Tree
Problem
You want to access individual pieces of data from an XML document.
Solution
Use the same DOM methods you use to query your web page elements to query the
XML document. For example, the following will get all elements that have a resource
tag name:
var resources = xmlHttpObj.responseXML.getElementsByTagName("resource");
Explanation
When you have a reference to an XML document, you can use the DOM methods to
query any of the data in the document. It’s not as simple as accessing data from a JSON
object, but it’s vastly superior to extracting data from a large piece of just plain text.
To demonstrate working with an XML document, the first listing below contains a Node.js
(commonly referred to simply as Node) application that returns XML containing three
resources. Each resource contains a title and a url.
It’s not a complicated application or a complex XML result, but it’s sufficient to generate
an XML document. Notice that a MIME type of application/xml is given in the header, and
the Access-Control-Allow-Origin header value is set to accept queries from all
domains (*). Because the Node application is running at a different port than the web page
querying it, we have to set this value in order to allow cross-domain requests.
Node.js server application that returns an XML result
var http = require('http'),
    url = require('url');
var XMLWriter = require('xml-writer');

// start server, listen for requests
var server = http.createServer().listen(8080);

server.on('request', function(req, res) {

   var xw = new XMLWriter;

   // start doc and root element
   xw.startDocument().startElement("resources");

   // resource
   xw.startElement("resource");
   xw.writeElement("title","Ecma-262 Edition 6");
   xw.writeElement("url",
      "http://wiki.ecmascript.org/doku.php?id=harmony:specification_drafts");
   xw.endElement();

   // resource
   xw.startElement("resource");
   xw.writeElement("title","ECMA-262 Edition 5.1");
   xw.writeElement("url",
      "http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf");
   xw.endElement();

   // resource
   xw.startElement("resource");
   xw.writeElement("title", "ECMA-402");
   xw.writeElement("url",
      "http://ecma-international.org/ecma-402/1.0/ECMA-402.pdf");
   xw.endElement();

   // end resources
   xw.endElement();

   // send status, MIME type, and the cross-domain header, then the XML
   res.writeHead(200, {"Content-Type": "application/xml",
                       "Access-Control-Allow-Origin": "*"});
   res.end(xw.toString(),"utf8");
});
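To make the shape of the returned document easier to see, here is a minimal sketch that assembles the same kind of resources document by hand, without the xml-writer module. The function names escapeXml and buildResourcesXml are hypothetical, not part of the original application, and the exact declaration and whitespace that xml-writer emits may differ from this output.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: build a <resources> document from an array of
// {title, url} objects, with no whitespace between elements.
function buildResourcesXml(resources) {
  var xml = '<?xml version="1.0"?><resources>';
  resources.forEach(function (r) {
    xml += '<resource>' +
      '<title>' + escapeXml(r.title) + '</title>' +
      '<url>' + escapeXml(r.url) + '</url>' +
      '</resource>';
  });
  return xml + '</resources>';
}
```

In the server, the resulting string could be passed to res.end() in place of xw.toString(). Escaping matters here because URLs with query strings often contain an ampersand, which is not legal as a bare character in XML text.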
Most Ajax calls process plain text or JSON, but there’s still a need for processing XML. SVG is still XML, as are MathML, XHTML, and other markup languages.
In the solution, a new XMLHttpRequest object is created to handle the client-server communication. If you’ve not used Ajax previously, the XMLHttpRequest object’s methods are:
• open: Initializes a request. Parameters include the method (GET, POST, DELETE, or PUT), the request URL, whether the request is asynchronous, and a possible username and password. By default, all requests are sent asynchronously.
• setRequestHeader: Sets the value of an HTTP request header, such as Content-Type. It must be called after open() and before send().
• send: Sends the request.
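• sendAsBinary: Sends binary data. This was a nonstandard, Firefox-only method that has since been removed; pass a Blob or ArrayBuffer to send() instead.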
• abort: Aborts an already sent request.
• getResponseHeader: Retrieves the value of the named response header, or null if the response hasn’t been returned yet or there is no such header.
• getAllResponseHeaders: Retrieves all of the response headers as a single string.
The communication is opened using the object’s open() method, passing in the HTTP method (GET), the request URL (the Node application), as well as a value of true, signaling that the communication is asynchronous (the application doesn’t block waiting on the return request). If the application is password protected, the fourth and fifth optional parameters are the username and password, respectively.
I know that the application I’m calling is returning an XML-formatted response, so it’s not necessary to override the MIME type. In the application, the XMLHttpRequest’s onreadystatechange event handler is assigned a callback function, getData(), and then the request is sent with send(). If the HTTP method had been POST, the prepared data would have been sent as a parameter of send().
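The order of calls described above (open, assign the handler, optionally set a header, then send) can be sketched as a small helper. The name sendRequest and its parameters are hypothetical, and the form-encoded Content-Type for POST is only one plausible choice. The request object is passed in so the same function works with a real XMLHttpRequest in the browser or with a stand-in object elsewhere.

```javascript
// Hypothetical helper: wire up and send a request on an
// XMLHttpRequest-like object that the caller supplies.
function sendRequest(xhr, method, url, body, onChange) {
  // third argument true = asynchronous
  xhr.open(method, url, true);

  // called each time readyState changes
  xhr.onreadystatechange = function () {
    onChange(xhr);
  };

  if (method === 'POST') {
    // headers can only be set after open() and before send()
    xhr.setRequestHeader('Content-Type',
      'application/x-www-form-urlencoded');
    xhr.send(body);   // prepared data goes in as the send() parameter
  } else {
    xhr.send();       // GET carries no body
  }
}
```

In a browser this might be called as sendRequest(new XMLHttpRequest(), 'GET', url, null, callback), where callback receives the request object on every state change.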
In the callback function getData(), the XMLHttpRequest object’s readyState and status properties are checked. Only when the readyState is 4 and status is 200 is the result processed.
The readyState indicates what state the Ajax call is in, and the value of 200 is the HTTP OK response code. Because we know the result is XML, the application accesses the XML document via the XMLHttpRequest object’s responseXML property.
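The numeric check reads more clearly with the state values named. This is a minimal sketch: the READY_STATE object and the isResponseReady function are hypothetical names introduced for illustration, while the five numeric values are the ones XMLHttpRequest defines.

```javascript
// The five readyState values defined for XMLHttpRequest.
var READY_STATE = {
  UNSENT: 0,            // open() not called yet
  OPENED: 1,            // open() called
  HEADERS_RECEIVED: 2,  // send() called, headers and status available
  LOADING: 3,           // response body is downloading
  DONE: 4               // operation complete
};

// Hypothetical helper: true only when the call has finished
// and the server answered with HTTP 200 (OK).
function isResponseReady(xhr) {
  return xhr.readyState === READY_STATE.DONE && xhr.status === 200;
}
```

Inside getData(), the test could then be written as if (isResponseReady(xmlRequest)) { ... }.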
For other data types, the data is accessed via the response property, and responseType provides the data type (arraybuffer, blob, document, json, text). Not all browsers support all data types, but all modern browsers do support XML and at least arraybuffer, json, and text.
Application to process resources from returned XML
<!DOCTYPE html>
<html>
<head>
<title>Stories</title>
<meta charset="utf-8" />
</head>
<body>
<div id="result">
</div>
<script type="text/javascript">

var xmlRequest;

// ajax object
if (window.XMLHttpRequest) {
   xmlRequest = new XMLHttpRequest();
}

// build request
var url = "http://shelleystoybox.com:8080";
xmlRequest.open('GET', url, true);
xmlRequest.onreadystatechange = getData;
xmlRequest.send();

function getData() {
   if (xmlRequest.readyState == 4 && xmlRequest.status == 200) {
      try {
         var result = document.getElementById("result");
         var str = "<p>";

         // can use DOM methods on XML document
         var resources =
            xmlRequest.responseXML.getElementsByTagName("resource");

         // process resources
         for (var i = 0; i < resources.length; i++) {
            var resource = resources[i];

            // get title and url, generate HTML
            var title = resource.childNodes[0].firstChild.nodeValue;
            var url = resource.childNodes[1].firstChild.nodeValue;
            str += "<a href='" + url + "'>" + title + "</a><br />";
         }

         // finish HTML and insert
         str+="</p>";
         result.innerHTML=str;
      } catch (e) {
         console.log(e.message);
      }
   }
}
</script>
</body>
</html>
When processing the XML code, the application first queries for all resource elements, returned as a live, array-like collection. The application cycles through the collection, accessing each resource element in order to access the title and url, both of which are child nodes. Each is accessed via the childNodes collection, and their data, contained in the nodeValue property, is extracted.
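The indexed childNodes lookups work because the server writes the elements with no whitespace between them. If the XML were indented, childNodes[0] would be a whitespace text node rather than the title element. A minimal sketch of a lookup that does not depend on position follows; childText is a hypothetical helper name, and it relies only on the standard nodeType, nodeName, and textContent properties.

```javascript
var ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE

// Hypothetical helper: return the text of the first child element
// with the given tag name, or null if there is no such child.
function childText(parent, tagName) {
  var children = parent.childNodes;
  for (var i = 0; i < children.length; i++) {
    var node = children[i];
    // skip text nodes (including whitespace) and comments
    if (node.nodeType === ELEMENT_NODE && node.nodeName === tagName) {
      return node.textContent;
    }
  }
  return null;
}
```

Within the loop, the two lookups would become childText(resource, "title") and childText(resource, "url"). A null return signals a missing element without throwing.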
The resource data is used to build a string of linked resources, which is output to the page using innerHTML. Instead of using a succession of childNodes element collections to walk the trees, I could have used the Selectors API to access all URLs and titles, and then traversed both collections at one time, pulling the paired values from each, in sequence:
var urls = xmlRequest.responseXML.querySelectorAll("resource url");
var titles = xmlRequest.responseXML.querySelectorAll("resource title");

for (var i = 0; i < urls.length; i++) {
   var url = urls[i].firstChild.nodeValue;
   var title = titles[i].firstChild.nodeValue;
   str += "<a href='" + url + "'>" + title + "</a><br />";
}
I could have also used getElementsByTagName against each returned resource element; any XML DOM method that works with the web page works with the returned XML. The try…catch error handling should catch any query that fails because the XML is incomplete.