What is XML?

Get to know XML, a strict yet flexible markup language used for everything from documentation to graphics.

Opensource.com

XML is a hierarchical markup language. It uses opening and closing tags to define data. It's used to store and exchange data, and because of its extreme flexibility, it's used for everything from documentation to graphics.

Here's a sample XML document:

<xml>
  <os>
    <linux>
      <distribution>
        <name>Fedora</name>
        <release>8</release>
        <codename>Werewolf</codename>
      </distribution>

      <distribution>
        <name>Slackware</name>
        <release>12.1</release>
        <mascot>
          <official>Tux</official>
          <unofficial>Bob Dobbs</unofficial>
        </mascot>
      </distribution>
    </linux>
  </os>
</xml>

Reading the sample XML, you might find there's an intuitive quality to the format. You can probably understand the data in this document whether you're familiar with the subject matter or not. This is partly because XML is considered verbose. It uses lots of tags, the tags can have long and descriptive names, and the data is ordered in a hierarchical manner that helps explain the data's relationships. You probably understand from this sample that the Fedora distribution and the Slackware distribution are two different and unrelated instances of Linux because each one is "contained" inside its own independent <distribution> tag.

XML is also extremely flexible. Unlike HTML, there's no predefined list of tags. You are free to create whatever data structure you need to represent.

Components of XML

Data exists to be read, and when a computer "reads" data, the process is called parsing. Using the sample XML data again, here are the terms that most XML parsers consider significant.

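  • Document: In this sample, the <xml> tag opens the document, and the </xml> tag closes it. The name is this sample's choice, not a requirement; whatever tag sits outermost plays this role.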
  • Node: The <os>, <distribution>, and <mascot> are nodes. In parsing terminology, a node is a tag that contains other tags.
  • Element: An entity such as <name>Fedora</name> and <official>Tux</official>, from the first < to the last > is an element.
  • Content: The data between two element tags is considered content. In the first <name> element, the string Fedora is the content.
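As a rough illustration of these terms, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The variable names (SAMPLE, first_name, containers) are chosen for this example only, and other parsers may use slightly different vocabulary for the same parts.

```python
import xml.etree.ElementTree as ET

# The sample document from this article.
SAMPLE = """\
<xml>
  <os>
    <linux>
      <distribution>
        <name>Fedora</name>
        <release>8</release>
        <codename>Werewolf</codename>
      </distribution>
      <distribution>
        <name>Slackware</name>
        <release>12.1</release>
        <mascot>
          <official>Tux</official>
          <unofficial>Bob Dobbs</unofficial>
        </mascot>
      </distribution>
    </linux>
  </os>
</xml>
"""

# fromstring() parses the text and returns the outermost tag.
root = ET.fromstring(SAMPLE)
print(root.tag)         # xml

# iter("name") walks the document in order and yields every <name> element.
first_name = next(root.iter("name"))
print(first_name.tag)   # name (the element)
print(first_name.text)  # Fedora (the content)

# Tags that contain other tags, which this article calls nodes.
containers = [el.tag for el in root.iter() if len(el) > 0]
print(containers)
```

The last line lists xml, os, linux, both distribution tags, and mascot, which matches the nodes named in the list above.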

XML schema

The tags an XML document uses, and the rules for which tags may appear inside which, are known as its schema.

Some schemas are made up as you go (for example, the sample XML code in this article was purely improvised), while others are strictly defined by a standards group. For example, the Scalable Vector Graphics (SVG) schema is defined by the W3C, while the DocBook schema, originally developed in large part by Norman Walsh, is maintained by the DocBook Technical Committee at OASIS.

A schema enforces consistency. The most basic schemas are usually also the most restrictive. In my example XML code, it wouldn't make sense to place a distribution name within the <mascot> node because the implied schema of the document makes it clear that a mascot must be a "child" element of a distribution.
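To make the idea of an implied schema concrete, here is a small sketch that checks parent and child relationships against a hand-written rule table. Both ALLOWED_CHILDREN and schema_violations are hypothetical names invented for this example. Real projects normally use a dedicated schema language such as XSD or RELAX NG rather than a Python dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table for this article's improvised sample:
# each parent tag maps to the child tags it is allowed to contain.
ALLOWED_CHILDREN = {
    "xml": {"os"},
    "os": {"linux"},
    "linux": {"distribution"},
    "distribution": {"name", "release", "codename", "mascot"},
    "mascot": {"official", "unofficial"},
}


def schema_violations(element):
    """Return 'parent/child' strings for every pairing the table forbids."""
    allowed = ALLOWED_CHILDREN.get(element.tag, set())
    problems = []
    for child in element:
        if child.tag not in allowed:
            problems.append(f"{element.tag}/{child.tag}")
        # Keep descending so nested mistakes are reported too.
        problems.extend(schema_violations(child))
    return problems


good = ET.fromstring(
    "<xml><os><linux><distribution><name>Fedora</name>"
    "</distribution></linux></os></xml>"
)
# A distribution name misplaced inside <mascot>.
bad = ET.fromstring(
    "<xml><os><linux><distribution><mascot><name>Fedora</name>"
    "</mascot></distribution></linux></os></xml>"
)

print(schema_violations(good))  # []
print(schema_violations(bad))   # ['mascot/name']
```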

Document Object Model (DOM)

Talking about XML would get confusing if you had to constantly describe tags and positions (e.g., "the name tag of the second distribution tag in the Linux part of the OS section"), so parsers use the concept of a Document Object Model (DOM) to represent XML data. The DOM places XML data into a sort of "family tree" structure, starting from the root element (in my sample XML, that's the xml tag, which contains the os tag) and including each tag beneath it.
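The "family tree" can be printed with Python's standard xml.dom.minidom module. This is only a sketch: the outline function is a helper made up for this example, and the sample is shortened to a single distribution to keep the output small.

```python
from xml.dom import minidom

# A shortened version of the article's sample.
SAMPLE = """\
<xml>
  <os>
    <linux>
      <distribution>
        <name>Slackware</name>
        <mascot>
          <official>Tux</official>
        </mascot>
      </distribution>
    </linux>
  </os>
</xml>
"""


def outline(node, depth=0):
    """Return one indented line per tag, parents listed before children."""
    lines = ["  " * depth + node.tagName]
    for child in node.childNodes:
        # The DOM also stores whitespace and text as child nodes; skip those.
        if child.nodeType == minidom.Node.ELEMENT_NODE:
            lines.extend(outline(child, depth + 1))
    return lines


document = minidom.parseString(SAMPLE)
tree_lines = outline(document.documentElement)
print("\n".join(tree_lines))
```

Each level of indentation in the output is one generation of the tree, with xml at the top and official at the bottom.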

This same XML data structure can be expressed as paths, just like files in a Linux system or the location of web pages on the internet. For instance, the path to the <mascot> tag can be represented as //os/linux/distribution/mascot. Note that slackware isn't part of the path, because Slackware is the content of a <name> element, not a tag.

The path to both <distribution> tags can be represented as //os/linux/distribution. Because there are two distribution nodes, a parser loads both nodes (and the contents of each) into an array that can be queried.
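Here is a short sketch of path queries using Python's standard xml.etree.ElementTree module. ElementTree supports only a subset of XPath and expects paths relative to the element you search from, so the paths below start with ./ instead of //. The variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The sample document from this article.
SAMPLE = """\
<xml>
  <os>
    <linux>
      <distribution>
        <name>Fedora</name>
        <release>8</release>
        <codename>Werewolf</codename>
      </distribution>
      <distribution>
        <name>Slackware</name>
        <release>12.1</release>
        <mascot>
          <official>Tux</official>
          <unofficial>Bob Dobbs</unofficial>
        </mascot>
      </distribution>
    </linux>
  </os>
</xml>
"""

root = ET.fromstring(SAMPLE)

# findall() returns every match as a list, one entry per <distribution>.
distributions = root.findall("./os/linux/distribution")
names = [dist.findtext("name") for dist in distributions]
print(names)  # ['Fedora', 'Slackware']

# A predicate narrows the match to the distribution named Slackware.
mascot = root.find("./os/linux/distribution[name='Slackware']/mascot")
print(mascot.findtext("official"))  # Tux
```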

Strict XML

XML is also known for being strict. This means that most applications are designed to intentionally fail when they encounter errors in XML. That may sound problematic, but it's one of the things developers appreciate most about XML because unpredictable things can happen when applications try to guess how to resolve an error. For example, back before HTML was well defined, most web browsers included a "quirks mode" so that when people tried to view poor HTML code, the web browser could load what the author probably intended. The results were wildly unpredictable, especially when one browser guessed differently than another.

XML disallows this by intentionally failing when there's an error. This lets the author fix errors until they produce valid XML. Because XML is well-defined, there are validator plugins for many applications and standalone commands like xmllint and xmlstarlet to help you locate errors early.
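The strictness is easy to see from code. In this sketch, Python's standard xml.etree.ElementTree parser refuses a document whose closing tag is misspelled. The check_well_formed function is a hypothetical helper written for this example, and it only tests whether the markup is well formed, not whether it follows a particular schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return 'ok' if the parser accepts the text, else the parser's complaint."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The parser stops at the first error instead of guessing a repair.
        return f"rejected: {err}"
    return "ok"


print(check_well_formed("<name>Fedora</name>"))
print(check_well_formed("<name>Fedora</nmae>"))  # closing tag is misspelled
```

The first call reports ok. The second reports a rejection along with the position of the mismatched tag, which is the same kind of message a command like xmllint gives you.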

Transforming XML

Because XML is often used as an interchange format, it's common to transform XML into some other data format or into some other XML schema. Classic tools for this include XSLTProc, xmlto, and pandoc, and many other applications are designed, at least in part, to convert XML.

In fact, LibreOffice uses XML to lay out its word processor and spreadsheet documents, so any time you export or convert a file from LibreOffice, you're transforming XML.

Ebooks in the open source EPUB format use XML, so any time you convert a document into an EPUB or from an EPUB, you're transforming XML.

Inkscape, the vector-based illustration application, saves its files in SVG, which is an XML schema designed for graphics. Any time you export an image from Inkscape as a PNG file, you're transforming XML.

The list could go on and on. XML is a data storage format, and it's designed to ensure that your data, whether it's points and lines on a canvas, nodes on a chart, or just words in a document, can be easily and accurately extracted, updated, and converted. 
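As a tiny example of a transformation, this sketch converts part of the article's sample into JSON using only Python's standard library. It is not how XSLTProc, xmlto, or pandoc work internally. It just shows the general idea of reading XML and writing another format, and the records variable is a name chosen for this example.

```python
import json
import xml.etree.ElementTree as ET

# A shortened version of the article's sample.
SAMPLE = """\
<xml>
  <os>
    <linux>
      <distribution>
        <name>Fedora</name>
        <release>8</release>
        <codename>Werewolf</codename>
      </distribution>
      <distribution>
        <name>Slackware</name>
        <release>12.1</release>
        <mascot>
          <official>Tux</official>
        </mascot>
      </distribution>
    </linux>
  </os>
</xml>
"""

root = ET.fromstring(SAMPLE)

records = []
for dist in root.iter("distribution"):
    # Copy only simple children (tags holding text, not other tags).
    # Nested tags such as <mascot> are skipped to keep the sketch short.
    records.append({child.tag: child.text for child in dist if len(child) == 0})

print(json.dumps(records, indent=2))
```

Every value arrives as a string, because XML content is plain text until a schema or your own code says otherwise.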

Learning XML

Writing XML is a lot like writing HTML. Thanks to the hard work of Jay Nick, free and fun XML lessons are available online that teach you how to create graphics with XML.

In general, very few special tools are required to explore XML. Thanks to the close relationship between HTML and XML, you can view XML using a web browser. In addition, open source text editors like QXmlEdit, NetBeans, and Kate make typing and reading XML easy with helpful prompts, autocompletion, syntax verification, and more.

Choose XML

XML may look like a lot of data at first, but it's not that much different from HTML (in fact, HTML has been reimplemented as XML in the form of XHTML). XML has the unique benefit that the components forming its structure also happen to be metadata providing information about what it's storing. A well-designed XML schema contains and describes your data, allowing a user to understand it at a glance and enabling developers to parse it efficiently with convenient programming libraries.

Seth Kenlon
Seth Kenlon is a UNIX geek, free culture advocate, independent multimedia artist, and D&D nerd. He has worked in the film and computing industry, often at the same time.

4 Comments

One might argue that HTML is a subset of XML. Trying to read an XML file with a browser and have a pleasing result requires creation of a XSL file, to tell the browser what to do with various XML tags.
Having said this, I've noticed that Firefox now will not use an XSL file if you direct it to an XML file in your home directory on your computer. What you need to do is to set up a localhost by running httpd, then put these files into /var/www/html/, and then direct your browser to localhost/

Officially, xhtml is HTML implemented in XML.

There's definitely an argument for HTML5, but I don't think it would necessarily be a purely technical one.

Firefox does still parse XSL style sheets for XML. I use this for some personal projects. It works even on mobile, but you do have to navigate precisely to the page (in other words, setting the XML target as the htaccess default index page fails to render on mobile. It's on my todo list to file a bug about that..)

Here's a quick proof-of-concept. It's working on both mobile and the latest Firefox for me:

http://linuxinfoshop.tk/xslonline/index.xml

In reply to by Greg P

I didn't say that it doesn't work on Firefox. What I said was that you can't direct Firefox to a directory on your computer where there is an XML file and its XSL file, and expect it to work. Thus, you need to put it on a website or create a localhost in /var/www/.

In reply to by sethkenlon

Understood. Good to know, thanks!

That is odd...but I'm not a Firefox maintainer, so maybe parsing XML and XSL isn't worth maintaining...?

In reply to by Greg P

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.