What is XML?

XML (Extensible Markup Language) is a markup language that uses tags to define elements and structure data hierarchically. It looks like HTML, but you choose the tag names yourself and the syntax rules are stricter.

xml

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price currency="USD">10.99</price>
  </book>
  <book category="non-fiction">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price currency="USD">39.99</price>
  </book>
</bookstore>

XML was the king of data interchange in the early 2000s. Then JSON came along and took over most of the web. But XML still dominates in enterprise software, document formats, and configuration files.

XML vs HTML

	XML	HTML
Tags	You define them	Predefined (`<div>`, `<p>`, etc.)
Closing tags	Required	Sometimes optional (`</p>`, `</li>`)
Case sensitivity	Yes (`<Book>` != `<book>`)	No
Attribute quotes	Required	Sometimes optional
Self-closing	`<empty/>`	`<br>` or `<br/>`
Purpose	Data storage/transport	Document display

Basic Syntax

xml

<?xml version="1.0" encoding="UTF-8"?>  <!-- Declaration (optional) -->
<root>                                    <!-- Root element (required) -->
  <element attribute="value">             <!-- Element with attribute -->
    Text content                          <!-- Text node -->
  </element>
  <empty/>                                <!-- Self-closing element -->
  <!-- This is a comment -->
</root>
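To see how a parser interprets these pieces, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The sample string mirrors the syntax block above with the optional declaration left out; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Sample document mirroring the syntax block above (declaration omitted).
xml_string = """<root>
  <element attribute="value">Text content</element>
  <empty/>
  <!-- This is a comment -->
</root>"""

root = ET.fromstring(xml_string)
element = root.find("element")
empty = root.find("empty")

print(root.tag)        # name of the root element
print(element.attrib)  # attributes as a dict
print(element.text)    # text node inside <element>
print(empty.text)      # a self-closing element has no text

# The default ElementTree parser drops comments, so only elements remain.
child_tags = [child.tag for child in root]
print(child_tags)
```

Other parsers may keep comments as nodes, so treat the last line as ElementTree's default behavior rather than a property of XML itself.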

Rules

One root element - Everything must be inside a single root
Tags must close - <tag>content</tag> or <tag/>
Proper nesting - <a><b></b></a> not <a><b></a></b>
Attributes in quotes - id="123" not id=123
Case sensitive - <Tag> and </tag> don't match
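Each rule above can be checked by handing a deliberately broken snippet to a parser. The sketch below uses Python's standard-library `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_string):
    """Return True if the parser accepts the string, False otherwise."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


# One hypothetical snippet per rule violation listed above.
broken = {
    "two roots": "<a></a><b></b>",
    "unclosed tag": "<tag>content",
    "bad nesting": "<a><b></a></b>",
    "unquoted attribute": "<item id=123/>",
    "case mismatch": "<Tag>content</tag>",
}

for rule, snippet in broken.items():
    print(f"{rule}: well-formed = {is_well_formed(snippet)}")

# A snippet that follows all five rules is accepted.
print(is_well_formed('<root><item id="123"/></root>'))
```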

Where You'll See This

Office documents - DOCX, XLSX, PPTX are ZIP files containing XML
SVG graphics - Scalable Vector Graphics are XML
RSS/Atom feeds - Blog and podcast syndication
SOAP APIs - Enterprise web services
Android layouts - UI definitions
Maven - Java build configuration (pom.xml)
Spring configs - Java application configuration
XHTML - Strict HTML that follows XML rules

Namespaces

When combining XML from different sources, namespaces prevent tag collisions:

xml

<root xmlns:book="http://example.com/books"
      xmlns:order="http://example.com/orders">
  <book:title>The Great Gatsby</book:title>
  <order:title>Order #12345</order:title>
</root>

Same tag name (title), different meanings. The URLs are identifiers, not actual web pages.
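As a rough sketch of how this looks from code, Python's standard-library `xml.etree.ElementTree` resolves prefixes through a dictionary you supply. The short prefixes `b` and `o` below are arbitrary choices for this script; only the URIs need to match the document.

```python
import xml.etree.ElementTree as ET

xml_string = """<root xmlns:book="http://example.com/books"
      xmlns:order="http://example.com/orders">
  <book:title>The Great Gatsby</book:title>
  <order:title>Order #12345</order:title>
</root>"""

root = ET.fromstring(xml_string)

# Prefixes are local to this script; the URIs identify the namespaces.
ns = {
    "b": "http://example.com/books",
    "o": "http://example.com/orders",
}

book_title = root.find("b:title", ns).text
order_title = root.find("o:title", ns).text
print(book_title)
print(order_title)

# Internally, ElementTree stores each tag as "{namespace-uri}localname".
first_tag = root[0].tag
print(first_tag)
```

Because lookups go through the URI, the document could rename its prefixes without breaking this code.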

Common Gotchas

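⚠️ Entity Escaping Required

XML predefines five entities for characters that have special meaning in markup:

< becomes &lt;
> becomes &gt;
& becomes &amp;
" becomes &quot;
' becomes &apos;

In text content, < and & must always be escaped. Quotes only need escaping inside an attribute value delimited by the same quote character, and > only in the sequence ]]>, though escaping all five is always safe.

Miss one & and the whole document is no longer well-formed, so a conforming parser will reject it.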
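Rather than escaping by hand, it is usually safer to let a library do it. The sketch below uses `escape`, `unescape`, and `quoteattr` from Python's standard-library `xml.sax.saxutils`; the sample strings and the `<item>` element are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr, unescape

raw_text = 'Tom & Jerry <3 "cartoons"'

# escape() handles &, < and > by default; extra entities can be passed in.
escaped_text = escape(raw_text)
fully_escaped = escape(raw_text, {'"': "&quot;", "'": "&apos;"})
print(escaped_text)
print(fully_escaped)

# unescape() reverses the default replacements.
round_trip = unescape(escaped_text)

# quoteattr() escapes a value and wraps it in quotes suitable for an attribute.
raw_label = '5" floppy & more'
attr = quoteattr(raw_label)
item_xml = f"<item label={attr}/>"
print(item_xml)

# Parsing the result gives back the original attribute value.
parsed_label = ET.fromstring(item_xml).get("label")
```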

ℹ️ CDATA Sections

For text with lots of special characters, wrap it in a CDATA section to avoid escaping: <![CDATA[Use <tags> & symbols freely!]]> The one sequence a CDATA section cannot contain is ]]>, because that ends the section.
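A CDATA section is only a different way of writing text, not a different kind of data. As a small sketch with Python's standard-library `xml.etree.ElementTree` (the `<note>` element is a made-up wrapper), the CDATA form and the entity-escaped form parse to the same string:

```python
import xml.etree.ElementTree as ET

cdata_doc = "<note><![CDATA[Use <tags> & symbols freely!]]></note>"
escaped_doc = "<note>Use &lt;tags&gt; &amp; symbols freely!</note>"

cdata_text = ET.fromstring(cdata_doc).text
escaped_text = ET.fromstring(escaped_doc).text

print(cdata_text)
print(cdata_text == escaped_text)
```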

Verbosity - XML is usually noticeably larger than the equivalent JSON, because every non-empty element needs both an opening AND a closing tag.
Parsing complexity - DOM parsers load everything into memory. Use SAX/streaming for large files.
Attributes vs elements - No clear rule on when to use <price currency="USD">10</price> vs nested elements.
Whitespace handling - Is that newline significant or not? Depends on the schema.
Encoding issues - Always specify encoding. UTF-8 is safest.
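For the parsing-complexity point, here is a minimal streaming sketch using `iterparse` from Python's standard-library `xml.etree.ElementTree`. The in-memory bytes stand in for a large file, and the element names follow the bookstore example; in practice you would pass a filename or an open file instead.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file on disk.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book><title>The Great Gatsby</title><price>10.99</price></book>
  <book><title>Clean Code</title><price>39.99</price></book>
</bookstore>"""

titles = []
total = 0.0

# iterparse yields each element once its closing tag has been read.
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if elem.tag == "book":
        titles.append(elem.findtext("title"))
        total += float(elem.findtext("price"))
        elem.clear()  # discard this book's children once it has been processed

print(titles, round(total, 2))
```

Note that the cleared `<book>` elements stay attached to the root as empty shells, so for very large inputs you may also want to remove them from their parent.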

XML vs JSON

	XML	JSON
Readability	Verbose but clear	Compact
Data types	All strings (without schema)	Strings, numbers, booleans, null
Attributes	Supported	No concept (use objects)
Comments	Supported	Not supported
Namespaces	Supported	Not supported
Schemas	XSD, DTD, RelaxNG	JSON Schema
Best for	Documents, configs	APIs, data interchange
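The data-types row is the one that tends to bite in practice: without a schema, everything read from XML is a string, so conversions have to be explicit. The sketch below turns a trimmed-down bookstore document into JSON with Python's standard library; the output key names (`books`, `currency`, and so on) are one possible mapping, not a standard one.

```python
import json
import xml.etree.ElementTree as ET

xml_string = """<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <price currency="USD">10.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(xml_string)

books = []
for book in root.findall("book"):
    price = book.find("price")
    books.append({
        "category": book.get("category"),   # attribute becomes a plain key
        "title": book.findtext("title"),
        "price": float(price.text),         # XML text is a string; convert explicitly
        "currency": price.get("currency"),
    })

json_string = json.dumps({"books": books}, indent=2)
print(json_string)
```

Since JSON has no attribute concept, the mapping has to decide where `category` and `currency` go; here they simply become sibling keys.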

XPath: Querying XML

XPath lets you select nodes from XML documents:

xpath

(: Select all book titles :)
//book/title

(: Select books over $20 :)
//book[price > 20]

(: Select the first author in the document :)
(//author)[1]
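To try queries like these from code, Python's standard-library `xml.etree.ElementTree` accepts a limited subset of XPath in `find` and `findall`. The sketch below runs against the bookstore document from the top of the page; numeric comparisons are outside the supported subset, so that filter is done in Python. For full XPath 1.0, a third-party library such as lxml would be needed.

```python
import xml.etree.ElementTree as ET

xml_string = """<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price currency="USD">10.99</price>
  </book>
  <book category="non-fiction">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price currency="USD">39.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(xml_string)

# All book titles (ElementTree paths are relative, hence the leading dot).
all_titles = [t.text for t in root.findall(".//book/title")]

# Attribute predicates are part of the supported subset.
fiction = [b.findtext("title") for b in root.findall(".//book[@category='fiction']")]

# Numeric comparisons such as price > 20 are not, so filter in Python.
over_20 = [b.findtext("title") for b in root.findall(".//book")
           if float(b.findtext("price")) > 20]

# find() returns the first match in document order.
first_author = root.find(".//author").text

print(all_titles, fiction, over_20, first_author)
```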

In Code

javascript

// Browser (DOMParser)
const domParser = new DOMParser();
const doc = domParser.parseFromString(xmlString, "text/xml");
const titles = doc.querySelectorAll("title");

// Node.js (fast-xml-parser)
import { XMLParser, XMLBuilder } from 'fast-xml-parser';

const options = {
  ignoreAttributes: false,    // keep attributes instead of dropping them
  attributeNamePrefix: "@_"   // attributes show up as "@_name" keys
};
const xmlParser = new XMLParser(options);
const data = xmlParser.parse(xmlString);

// Build XML (same options, so "@_id" is written as an attribute)
const builder = new XMLBuilder(options);
const xml = builder.build({
  book: {
    "@_id": "123",
    title: "Clean Code"
  }
});

python

# Python (ElementTree)
import xml.etree.ElementTree as ET

# Sample input so the snippet runs on its own
xml_string = "<books><book><title>Clean Code</title></book></books>"

# Parse
root = ET.fromstring(xml_string)
for book in root.findall('book'):
    title = book.findtext('title')  # None if <title> is missing
    print(title)

# Build
new_root = ET.Element('books')
book = ET.SubElement(new_root, 'book')
ET.SubElement(book, 'title').text = 'Clean Code'
built_xml = ET.tostring(new_root, encoding='unicode')

"XML is like violence: if it doesn't solve your problem, you're not using enough of it." - Unknown Enterprise Architect