PHP XML
Introduction
XML is a popular data-exchange, configuration, and message-passing format. Although JSON has displaced XML for many basic situations, XML still plays an important role in a developer’s life. With the help of a few extensions, PHP lets you read and write XML for every occasion.
XML provides developers with a structured way to mark up data with tags arranged in a tree-like hierarchy. One perspective on XML is to treat it as CSV on steroids. You can use XML to store records broken into a series of fields. But instead of merely separating each field with a comma, you can include a field name, a type, and attributes alongside the data.
Another view of XML is as a document representation language. For instance, this book was written using XML. The book is divided into chapters; each chapter into recipes; and each recipe into Problem, Solution, and Discussion sections. Within any individual section, we further subdivide the text into paragraphs, tables, figures, and examples. An article on a web page can similarly be divided into the page title and headline, the authors of the piece, the story itself, and any sidebars, related links, and additional content.
XML content looks similar to HTML. Both use tags bracketed by < and > for marking up text. But XML is both stricter and looser than HTML. It’s stricter because all container tags must be properly closed. No opening elements are allowed without a corresponding closing tag. It’s looser because you’re not forced to use a set list of tags, such as <a>, <img>, and <h1>. Instead, you have the freedom to choose a set of tag names that best describe your data.
Other key differences between XML and HTML are case sensitivity, attribute quoting, and whitespace. In HTML, <B> and <b> are the same bold tag; in XML, they’re two different tags. In HTML, you can often omit quotation marks around attributes; XML, however, requires them. So you must always write:
<element attribute="value">
Additionally, HTML parsers generally ignore whitespace, so a run of 20 consecutive spaces is treated the same as one space. XML parsers preserve whitespace unless explicitly instructed otherwise. Because all elements must be closed, empty elements must end with />. For instance, in HTML the line break is <br>, whereas in XHTML, which is HTML that validates as XML, it’s written as <br />.
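To see these rules enforced, you can hand a few snippets to libxml2 through SimpleXML. This is a minimal sketch, not one of the chapter's recipes; `isWellFormed()` is a hypothetical helper name. It relies on `simplexml_load_string()` returning `false` for input that isn't well-formed, and on `libxml_use_internal_errors(true)` to keep the parser's complaints out of the output:

```php
<?php
// Collect libxml parse errors quietly instead of printing PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: true when $xml parses as a well-formed XML document.
function isWellFormed(string $xml): bool
{
    $doc = simplexml_load_string($xml);
    libxml_clear_errors(); // discard any errors from this attempt
    return $doc !== false;
}

$samples = [
    '<element attribute="value"/>', // quoted attribute, self-closed element
    '<element attribute=value/>',   // unquoted attribute
    '<B>bold</b>',                  // <B> and </b> are different tags
    '<br>',                         // empty element never closed
    '<br />',                       // XHTML-style empty element
];

foreach ($samples as $xml) {
    printf("%-30s %s\n", $xml, isWellFormed($xml) ? 'well-formed' : 'not well-formed');
}
```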
There is another restriction on XML documents. When XML documents are parsed into a tree of elements, the outermost element is known as the root element. Just as a tree has only one trunk, an XML document must have exactly one root element. In the previous book example, this means chapters must be bundled inside a book tag. If you want to place multiple books inside a document, you need to package them inside a bookcase or another container. This limitation applies only to the document root. Again, just like trees can have multiple branches off of the trunk, it’s legal to store multiple books inside a bookcase.
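The single-root rule can be checked the same way. In this sketch, which reuses the element names from the example above, two top-level `<book>` elements are rejected, while the same books inside a `<bookcase>` parse fine and can be counted as siblings:

```php
<?php
libxml_use_internal_errors(true); // collect parse errors instead of printing warnings

// Two <book> elements at the top level: more than one root, so parsing fails.
$twoRoots = simplexml_load_string('<book/><book/>');
var_dump($twoRoots === false);
libxml_clear_errors();

// Wrapping the books in a single <bookcase> root makes the document legal.
$bookcase = simplexml_load_string('<bookcase><book/><book/></bookcase>');
printf("%s holds %d books\n", $bookcase->getName(), count($bookcase->book));
```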
This chapter doesn’t aim to teach you XML; for an introduction to XML, see Learning XML by Erik T. Ray (O’Reilly). A solid nuts-and-bolts guide to all aspects of XML is XML in a Nutshell by Elliotte Rusty Harold and W. Scott Means (O’Reilly).
Now that we’ve covered the rules, here’s an example. If you are a librarian and want to convert your card catalog to XML, start with this basic set of XML tags:
<book>
  <title>PHP Cookbook</title>
  <author>Sklar, David and Trachtenberg, Adam</author>
  <subject>PHP</subject>
</book>
From there, you can add new elements or modify existing ones. For example, <author> can be divided into first and last name, or you can allow for multiple records so two authors aren’t placed in one field.
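As a sketch of the second change, here is one possible revision of the record, assuming each author gets a separate `<author>` element with hypothetical `<first>` and `<last>` children. SimpleXML then lets you loop over the repeated elements:

```php
<?php
// Hypothetical revised record: one <author> per person, split into first and last names.
$xml = <<<XML
<book>
  <title>PHP Cookbook</title>
  <author><first>David</first><last>Sklar</last></author>
  <author><first>Adam</first><last>Trachtenberg</last></author>
  <subject>PHP</subject>
</book>
XML;

$book = simplexml_load_string($xml);

// Repeated elements can be iterated with foreach.
$names = [];
foreach ($book->author as $author) {
    $names[] = "{$author->last}, {$author->first}";
}
echo implode("\n", $names), "\n";
```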
PHP has a set of XML extensions that:
• Work together as a unified whole
• Are standardized on a single XML library: libxml2
• Fully comply with W3C specifications
• Efficiently process data
• Provide you with the right XML tool for your job
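A quick way to see which of these extensions your PHP build includes, and that they share one libxml2 tree, is sketched below. It assumes the dom and SimpleXML extensions are enabled (they usually are). `dom_import_simplexml()` turns a SimpleXML object into a DOM node without reparsing, so an edit made through DOM shows up through SimpleXML:

```php
<?php
// Which of PHP's XML extensions are available in this build?
foreach (['libxml', 'dom', 'SimpleXML', 'xmlreader', 'xmlwriter', 'xsl'] as $ext) {
    printf("%-10s %s\n", $ext, extension_loaded($ext) ? 'loaded' : 'not loaded');
}
echo 'libxml2 version: ', LIBXML_DOTTED_VERSION, "\n";

// SimpleXML and DOM wrap the same libxml2 tree, so one can hand off to the other.
$book = simplexml_load_string('<book><title>PHP Cookbook</title></book>');
$element = dom_import_simplexml($book); // DOMElement for the same <book> node
$element->appendChild($element->ownerDocument->createElement('subject', 'PHP'));

echo $book->subject, "\n"; // the DOM edit is visible through SimpleXML
```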
Additionally, following the PHP tenet that creating web applications should be easy, there’s an XML extension that makes it simple to read and alter XML documents. The aptly named SimpleXML extension allows you to interact with the information in an XML document as though these pieces of information are arrays and objects, iterating through them with foreach loops and editing them in place merely by assigning new values to variables.
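Here is a minimal sketch of that style of access, using a trimmed version of the catalog record from earlier. The `lang` attribute and the `<shelf>` element are hypothetical additions for illustration:

```php
<?php
$book = simplexml_load_string(
    '<book><title>PHP Cookbook</title><subject>PHP</subject></book>'
);

// Child elements read like object properties.
echo $book->title, "\n";

// Editing in place is plain assignment.
$book->subject = 'XML';

// Attributes read and write like array keys; new children come from addChild().
$book->title['lang'] = 'en';      // hypothetical attribute
$book->addChild('shelf', 'A-12'); // hypothetical new field

echo $book->asXML();
```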
The first two recipes in this chapter cover writing XML. The first shows how to write XML without additional tools; the second uses the DOM extension to write XML in a standardized fashion.
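For a rough idea of the difference between the two approaches, the sketch below writes the same hypothetical record twice: once by concatenating strings, where escaping is your job, and once with DOM, which escapes text nodes for you. It assumes the tag names are trusted, since the string version escapes only the values:

```php
<?php
$fields = ['title' => 'PHP Cookbook', 'subject' => 'PHP & XML']; // hypothetical record

// 1. Without extra tools: build the string yourself and escape every value.
$xml = "<book>\n";
foreach ($fields as $tag => $value) {
    $xml .= "  <$tag>" . htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</$tag>\n";
}
$xml .= "</book>\n";
echo $xml;

// 2. With DOM: the extension handles structure and escaping.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;
$book = $dom->appendChild($dom->createElement('book'));
foreach ($fields as $tag => $value) {
    $element = $book->appendChild($dom->createElement($tag));
    $element->appendChild($dom->createTextNode($value)); // '&' becomes '&amp;' on output
}
echo $dom->saveXML();
```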
The complement to writing XML is parsing XML. That’s the subject of the next three recipes. They’re divided based upon the complexity and size of the XML document you’re trying to parse. The first covers how to parse basic XML documents. If you need more sophisticated XML parsing tools, move on to the second. When your XML documents are extremely large and memory intensive, turn to the third. If this is your first time using XML and you’re unsure which recipe is right for you, try them in order, because the code becomes increasingly complex as your requirements go up.
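For the large-document case, one tool PHP provides is the XMLReader extension, a pull parser that walks the document one node at a time instead of building the whole tree in memory. A minimal sketch, using a small inline string as a stand-in for a large file (with a real file you would call `open()` with a path instead of `XML()`):

```php
<?php
// Stand-in for a large document.
$source = '<bookcase>'
    . '<book><title>PHP Cookbook</title></book>'
    . '<book><title>Learning XML</title></book>'
    . '</bookcase>';

$reader = new XMLReader();
$reader->XML($source);

// Pull nodes one at a time; the full tree is never built.
$titles = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
        $titles[] = $reader->readString(); // text inside the current <title>
    }
}
$reader->close();

print_r($titles);
```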
XPath is the topic of the next recipe. It’s a W3C standard for extracting specific information from XML documents. We like to think of it as regular expressions for XML. XPath is one of the most useful, yet underused, parts of the XML family of specifications. If you process XML on a regular basis, you should be familiar with XPath.
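As a taste of XPath from PHP, the sketch below uses DOM's `DOMXPath` class on a small made-up catalog to select titles by a condition on a sibling element, and to compute a count directly:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<bookcase>'
    . '<book><title>PHP Cookbook</title><subject>PHP</subject></book>'
    . '<book><title>Learning XML</title><subject>XML</subject></book>'
    . '</bookcase>'
);

$xpath = new DOMXPath($dom);

// "The <title> of every <book> whose <subject> is XML."
$titles = [];
foreach ($xpath->query('/bookcase/book[subject="XML"]/title') as $node) {
    $titles[] = $node->textContent;
}
print_r($titles);

// XPath expressions can also return values, not just nodes.
echo $xpath->evaluate('count(//book)'), "\n";
```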
With XSLT, you can take an XSL stylesheet and turn XML into viewable output. By separating content from presentation, you can make one stylesheet for web browsers, another for mobile phones, and a third for print, all without changing the content itself.
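A minimal transformation sketch follows. It assumes PHP's xsl extension, which provides `XSLTProcessor`, is installed; it is often packaged separately and not enabled by default. The stylesheet is a hypothetical one that turns the catalog record into an HTML heading and paragraph:

```php
<?php
// Requires the xsl extension (XSLTProcessor).
$xml = new DOMDocument();
$xml->loadXML('<book><title>PHP Cookbook</title><subject>PHP</subject></book>');

$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/book">
    <h1><xsl:value-of select="title"/></h1>
    <p>Subject: <xsl:value-of select="subject"/></p>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

$processor = new XSLTProcessor();
$processor->importStylesheet($xsl);
$html = $processor->transformToXML($xml);
echo $html;
```

Swapping in a different stylesheet, for phones or print, changes the output without touching `$xml`.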
After introducing XSLT, the two recipes that follow show how to pass information back and forth between PHP and XSLT. The first tells how to send data from PHP to an XSLT stylesheet; the second shows how to call out to PHP from within an XSLT stylesheet.
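Both directions can be sketched in one transformation, again assuming the xsl extension. `setParameter()` fills a top-level `<xsl:param>` from PHP, and `registerPHPFunctions()` lets the stylesheet call PHP functions through the `http://php.net/xsl` namespace; restricting it to a list of names, as here, is safer than allowing every function. The `library` parameter and its value are hypothetical:

```php
<?php
// Requires the xsl extension.
$xml = new DOMDocument();
$xml->loadXML('<book><title>PHP Cookbook</title></book>');

// Nowdoc (<<<'XSL') so PHP does not try to interpolate $library.
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl">
  <xsl:output method="text"/>
  <xsl:param name="library"/>
  <xsl:template match="/book">
    <xsl:value-of select="$library"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="php:function('strtoupper', string(title))"/>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

$processor = new XSLTProcessor();
$processor->registerPHPFunctions(['strtoupper']); // XSLT -> PHP: allow only this function
$processor->importStylesheet($xsl);
$processor->setParameter('', 'library', 'Main Branch'); // PHP -> XSLT

$result = $processor->transformToXML($xml);
echo $result, "\n";
```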
As long as your XML document abides by the structural rules of XML, it is known as well-formed. However, unlike HTML, which has a specific set of elements and attributes that must appear in set places, XML has no such restrictions.
Yet, in some cases, it’s useful to make sure your XML documents abide by a specification. This allows tools, such as web browsers, RSS readers, or your own scripts, to easily process the input. When an XML document follows all the rules set out by a specification, it is known as valid. A later recipe covers how to validate an XML document.
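One way to check validity in PHP is to load the document with DOM and validate it against an XML Schema. In this sketch, the schema and the `isValidBook()` helper are hypothetical; the second document is well-formed but fails validation because it lacks a `<title>`:

```php
<?php
libxml_use_internal_errors(true); // collect errors instead of printing warnings

// Hypothetical schema: <book> holds a title, one or more authors, then a subject.
$schema = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
        <xs:element name="subject" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Hypothetical helper: well-formed AND valid against $schema.
function isValidBook(string $xml, string $schema): bool
{
    $dom = new DOMDocument();
    $valid = $dom->loadXML($xml) && $dom->schemaValidateSource($schema);
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n"; // explain why the check failed
    }
    libxml_clear_errors();
    return $valid;
}

$good = '<book><title>PHP Cookbook</title><author>Sklar, David</author><subject>PHP</subject></book>';
$bad  = '<book><author>Sklar, David</author><subject>PHP</subject></book>'; // no <title>

var_dump(isValidBook($good, $schema));
var_dump(isValidBook($bad, $schema));
```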
One of PHP’s major limitations is its handling of character sets and document encodings. PHP strings are not associated with a particular encoding, but all the XML extensions require UTF-8 input and emit UTF-8 output. Therefore, if you use a character set incompatible with UTF-8, you must manually convert your data both before sending it into an XML extension and after you receive it back. A dedicated recipe explores the best ways to handle this process.
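A minimal sketch of that round trip, assuming the mbstring extension and data stored as ISO-8859-1 (Latin-1); the title string is made up:

```php
<?php
// Assumes mbstring. In ISO-8859-1, "é" is the single byte 0xE9.
$latin1Title = "Caf\xE9 Recipes";

// Convert to UTF-8 before handing the string to an XML extension...
$book = new SimpleXMLElement('<book/>');
$book->addChild('title', mb_convert_encoding($latin1Title, 'UTF-8', 'ISO-8859-1'));

// ...and back to Latin-1 after reading it out, because the extension returns UTF-8.
$title = mb_convert_encoding((string) $book->title, 'ISO-8859-1', 'UTF-8');

var_dump($title === $latin1Title);
```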
The chapter concludes with recipes dedicated to reading and writing common types of XML documents, specifically RSS and Atom. These are the two most popular data syndication formats and are useful for exchanging many types of data, including blog posts, podcasts, and even mapping information.
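Reading RSS with SimpleXML can be as short as the sketch below, which uses a made-up RSS 2.0 feed. Atom puts its elements in an XML namespace, so it needs a little more code (for example, SimpleXML's `children()` with the Atom namespace URI); this sketch covers RSS only:

```php
<?php
// A minimal, made-up RSS 2.0 feed.
$rss = <<<'RSS'
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First post</title>
      <link>https://example.com/posts/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/posts/2</link>
    </item>
  </channel>
</rss>
RSS;

$feed = simplexml_load_string($rss);

// Map each item's title to its link.
$entries = [];
foreach ($feed->channel->item as $item) {
    $entries[(string) $item->title] = (string) $item->link;
}
print_r($entries);
```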
PHP Cookbook also covers RESTful APIs. This topic is so important that it gets two dedicated chapters of its own: the first describes how to consume RESTful APIs, and the second tells how to implement RESTful APIs of your very own.