XML (eXtensible Markup Language)

HTML's stricter cousin, used for config files and enterprise software.

What is XML?

XML is a markup language that uses tags to define elements and structure data hierarchically. It looks like HTML but with custom tag names and stricter rules.

xml
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price currency="USD">10.99</price>
  </book>
  <book category="non-fiction">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price currency="USD">39.99</price>
  </book>
</bookstore>
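
To see the hierarchy a parser builds from this document, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It parses a trimmed copy of the bookstore (one book) and returns one line per element, indented by nesting depth. The `outline` helper is hypothetical and exists only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore example above (one book only)
BOOKSTORE = """<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price currency="USD">10.99</price>
  </book>
</bookstore>"""

def outline(elem, depth=0):
    """Return one line per element; indentation shows nesting depth."""
    text = (elem.text or "").strip()  # ignore pure-whitespace text between tags
    attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
    line = "  " * depth + elem.tag
    if attrs:
        line += f" [{attrs}]"
    if text:
        line += f": {text}"
    lines = [line]
    for child in elem:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(BOOKSTORE))))
```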

XML was the king of data interchange in the early 2000s. Then JSON came along and took over most of the web. But XML still dominates in enterprise software, document formats, and configuration files.

XML vs HTML

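| | XML | HTML |
|---|---|---|
| Tags | You define them | Predefined (<div>, <p>, etc.) |
| Closing tags | Always required | Optional for some elements (</p>, </li>); void elements like <br> have none |
| Case sensitivity | Yes (<Book> != <book>) | No |
| Attribute quotes | Required | Sometimes optional |
| Self-closing | <empty/> | <br> or <br/> |
| Purpose | Data storage/transport | Document display |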

Basic Syntax

xml
<?xml version="1.0" encoding="UTF-8"?>  <!-- Declaration (optional) -->
<root>                                    <!-- Root element (required) -->
  <element attribute="value">             <!-- Element with attribute -->
    Text content                          <!-- Text node -->
  </element>
  <empty/>                                <!-- Self-closing element -->
  <!-- This is a comment -->
</root>

Rules

  1. One root element - Every other element must be nested inside a single root
  2. Tags must close - <tag>content</tag> or <tag/>
  3. Proper nesting - <a><b></b></a> not <a><b></a></b>
  4. Attribute values in quotes - id="123" or id='123', not id=123
  5. Case sensitive - <Tag> and </tag> don't match
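
Any conforming XML parser enforces these rules. The minimal Python sketch below uses ElementTree as a well-formedness checker and runs it over small samples that each break one rule. `is_well_formed` and the sample strings are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "one root":           "<root><a/></root>",
    "single quotes":      "<a id='123'/>",
    "two roots":          "<a/><b/>",
    "unclosed tag":       "<root><a></root>",
    "bad nesting":        "<a><b></a></b>",
    "unquoted attribute": "<a id=123/>",
    "case mismatch":      "<Tag></tag>",
}
for name, text in samples.items():
    print(f"{name:20} {is_well_formed(text)}")
```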

Where You'll See This

  • Office documents - DOCX, XLSX, PPTX are ZIP files containing XML
  • SVG graphics - Scalable Vector Graphics are XML
  • RSS/Atom feeds - Blog and podcast syndication
  • SOAP APIs - Enterprise web services
  • Android layouts - UI definitions
  • Maven - Java build configuration (pom.xml)
  • Spring configs - XML bean definitions for Java applications (e.g. applicationContext.xml)
  • XHTML - Strict HTML that follows XML rules
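
You can check the claim that a .docx is zipped XML with the standard library alone. This is a minimal sketch that assumes a standard .docx, where the main text lives in word/document.xml under the WordprocessingML namespace. The `docx_text` helper is hypothetical. It collects the <w:t> text runs and ignores formatting, tables, and headers. The demo builds a tiny in-memory ZIP instead of reading a real file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main body part of a .docx file
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_text(file):
    """Collect the text runs (<w:t>) from word/document.xml inside a .docx ZIP."""
    with zipfile.ZipFile(file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [t.text or "" for t in root.iter(f"{{{W_NS}}}t")]

# Demo: an in-memory ZIP holding only the part this function reads
# (a real .docx contains more parts, e.g. [Content_Types].xml)
body = (f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        '<w:t>Hello, XML</w:t></w:r></w:p></w:body></w:document>')
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", body)
buf.seek(0)
print(docx_text(buf))  # ['Hello, XML']
```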

Namespaces

When combining XML from different sources, namespaces prevent tag collisions:

xml
<root xmlns:book="http://example.com/books"
      xmlns:order="http://example.com/orders">
  <book:title>The Great Gatsby</book:title>
  <order:title>Order #12345</order:title>
</root>

Same tag name (title), different meanings. The URLs are only unique identifiers: parsers never fetch them, so they don't need to be real web pages.
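
Namespace-aware parsers resolve each prefix to its URI. ElementTree, for example, stores tags as "{uri}localname". The minimal Python sketch below parses the example above and queries it with a prefix map. The prefixes `b` and `o` are arbitrary choices for illustration. Only the URIs have to match the document.

```python
import xml.etree.ElementTree as ET

doc = """<root xmlns:book="http://example.com/books"
      xmlns:order="http://example.com/orders">
  <book:title>The Great Gatsby</book:title>
  <order:title>Order #12345</order:title>
</root>"""

root = ET.fromstring(doc)
# ElementTree replaces each prefix with its URI in "{uri}local" form
print([child.tag for child in root])
# ['{http://example.com/books}title', '{http://example.com/orders}title']

# Prefixes in queries are your own choice; only the URI must match
ns = {"b": "http://example.com/books", "o": "http://example.com/orders"}
print(root.find("b:title", ns).text)  # The Great Gatsby
print(root.find("o:title", ns).text)  # Order #12345
```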

Common Gotchas

⚠️Entity Escaping Required

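XML predefines five entities for characters with special meaning:

  • < becomes &lt;
  • > becomes &gt;
  • & becomes &amp;
  • " becomes &quot;
  • ' becomes &apos;

In text content, < and & must always be escaped (> only when it appears in the sequence ]]>). Inside an attribute value, you also have to escape the quote character that delimits the value.

Miss a single & and the document is no longer well-formed: a conforming parser rejects the whole thing.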
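
If you build XML by string concatenation, escape values first. The minimal Python sketch below uses the standard library helpers xml.sax.saxutils.escape (for text) and quoteattr (for attribute values; it adds the surrounding quotes for you). It then parses the result back with ElementTree and shows that an unescaped & is rejected. The menu document is a made-up example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = "Fish & Chips <Special>"
safe_text = escape(raw)              # escapes &, < and >
attr = quoteattr('He said "hi"')     # picks single quotes since the value has double quotes

doc = f"<menu><item note={attr}>{safe_text}</item></menu>"
item = ET.fromstring(doc).find("item")
print(item.text)         # Fish & Chips <Special>
print(item.get("note"))  # He said "hi"

# An unescaped & makes the document malformed
try:
    ET.fromstring("<menu>Fish & Chips</menu>")
except ET.ParseError as err:
    print("Rejected:", err)
```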

ℹ️CDATA Sections

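For text with lots of special characters, wrap it in CDATA to avoid escaping: <![CDATA[Use <tags> & symbols freely!]]>. A CDATA section cannot contain its own terminator ]]>, and it only works in element content, not in attribute values.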
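
A CDATA section changes only how text is written in the source. Once parsed, it is ordinary text. The minimal Python sketch below shows this with ElementTree. Note that ElementTree does not keep the CDATA wrapper when it serializes, so it escapes the characters instead.

```python
import xml.etree.ElementTree as ET

snippet = "<code><![CDATA[if (a < b && c > 0) { return; }]]></code>"
elem = ET.fromstring(snippet)
print(elem.text)  # if (a < b && c > 0) { return; }

# On output, ElementTree escapes the text rather than re-creating CDATA
print(ET.tostring(elem, encoding="unicode"))
# <code>if (a &lt; b &amp;&amp; c &gt; 0) { return; }</code>
```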

  • Verbosity - XML is usually larger than equivalent JSON because every element needs an opening AND a closing tag.
  • Parsing complexity - DOM parsers load the whole document into memory. Use SAX/streaming parsers for large files.
  • Attributes vs elements - No clear rule on when to use <price currency="USD">10</price> vs nested elements.
  • Whitespace handling - Is that newline significant or not? It depends on the consuming application, with hints from the schema or an xml:space attribute.
  • Encoding issues - Always declare the encoding. UTF-8 (the default when no declaration or byte-order mark is present) is safest.
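
Python's standard library has both xml.sax and ElementTree.iterparse for streaming. The minimal sketch below uses iterparse, which yields each element as soon as its closing tag is read. The `iter_titles` helper is hypothetical. Calling clear() on each finished <book> frees its children. The emptied <book> shells stay attached to the root, though, so for truly flat memory on huge files you would also remove them from their parent.

```python
import io
import xml.etree.ElementTree as ET

def iter_titles(source):
    """Stream <book> elements one at a time instead of building the whole tree first."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":  # "end" fires once the whole <book> subtree is parsed
            yield elem.findtext("title")
            elem.clear()        # drop the finished subtree's children and text

data = b"""<bookstore>
  <book><title>The Great Gatsby</title></book>
  <book><title>Clean Code</title></book>
</bookstore>"""
print(list(iter_titles(io.BytesIO(data))))  # ['The Great Gatsby', 'Clean Code']
```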

XML vs JSON

| | XML | JSON |
|---|---|---|
| Readability | Verbose but clear | Compact |
| Data types | All strings (without schema) | Strings, numbers, booleans, null |
| Attributes | Supported | No concept (use objects) |
| Comments | Supported | Not supported |
| Namespaces | Supported | Not supported |
| Schemas | XSD, DTD, RELAX NG | JSON Schema |
| Best for | Documents, configs | APIs, data interchange |
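
The "all strings" row is easy to see in practice. This minimal Python sketch parses the same price from XML and from JSON. The values are a made-up example based on the bookstore data.

```python
import json
import xml.etree.ElementTree as ET

xml_price = ET.fromstring('<price currency="USD">10.99</price>')
json_price = json.loads('{"price": 10.99, "currency": "USD"}')

print(type(xml_price.text).__name__)       # str: XML text is always a string
print(type(json_price["price"]).__name__)  # float: JSON has a number type

# Without a schema, converting XML values is the application's job
price = float(xml_price.text)
```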

XPath: Querying XML

XPath lets you select nodes from XML documents:

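  • //book/title - all book titles
  • //book[price > 20] - books with a price greater than 20
  • (//author)[1] - the first author in the document

Mind the parentheses in the last one: //author[1] selects every <author> that is the first <author> child of its own parent. In the bookstore example, that matches both authors.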
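
Python's built-in ElementTree understands only a limited subset of XPath, evaluated relative to an element. Numeric comparisons such as [price > 20] are not part of that subset. The third-party lxml library provides an xpath() method if you need full XPath 1.0. This minimal sketch runs the queries above against the bookstore using only the standard library and does the price filter in plain Python.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title><author>F. Scott Fitzgerald</author><price>10.99</price>
  </book>
  <book category="non-fiction">
    <title>Clean Code</title><author>Robert C. Martin</author><price>39.99</price>
  </book>
</bookstore>""")

# Supported subset: descendant search, child steps, attribute predicates
titles = [t.text for t in root.findall(".//book/title")]
fiction = root.findall(".//book[@category='fiction']")

# [price > 20] is not supported, so filter in Python instead
pricey = [b.findtext("title") for b in root.iter("book") if float(b.findtext("price")) > 20]

# find() returns the first match in document order, like (//author)[1]
first_author = root.find(".//author").text

print(titles)        # ['The Great Gatsby', 'Clean Code']
print(pricey)        # ['Clean Code']
print(first_author)  # F. Scott Fitzgerald
```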

In Code

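javascript
// Browser (DOMParser); xmlString holds the XML text to parse
const domParser = new DOMParser();
const doc = domParser.parseFromString(xmlString, "text/xml");
// Malformed XML does not throw: the result contains a <parsererror> element instead
if (doc.getElementsByTagName("parsererror").length > 0) {
  throw new Error("XML is not well-formed");
}
const titles = Array.from(doc.querySelectorAll("title"), (el) => el.textContent);

javascript
// Node.js (fast-xml-parser), in a separate ES module; xmlString holds the XML text
import { XMLParser, XMLBuilder } from "fast-xml-parser";

const parser = new XMLParser({
  ignoreAttributes: false,   // attributes are dropped unless this is false
  attributeNamePrefix: "@_"  // attributes appear as "@_name" keys
});
const data = parser.parse(xmlString);

// Build XML: pass the same attribute options so "@_id" is written as an attribute
const builder = new XMLBuilder({
  ignoreAttributes: false,
  attributeNamePrefix: "@_"
});
const xml = builder.build({
  book: {
    "@_id": "123",
    title: "Clean Code"
  }
});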
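
python
# Python (ElementTree, standard library)
import xml.etree.ElementTree as ET

xml_string = """<bookstore>
  <book><title>The Great Gatsby</title></book>
  <book><title>Clean Code</title></book>
</bookstore>"""

# Parse: fromstring() returns the root element
root = ET.fromstring(xml_string)
for book in root.findall('book'):   # direct <book> children of the root
    title = book.findtext('title')  # None if the book has no <title>
    print(title)

# Build
root = ET.Element('books')
book = ET.SubElement(root, 'book')
ET.SubElement(book, 'title').text = 'Clean Code'
xml_output = ET.tostring(root, encoding='unicode')
print(xml_output)  # <books><book><title>Clean Code</title></book></books>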

"XML is like violence: if it doesn't solve your problem, you're not using enough of it." - Unknown Enterprise Architect