What is XML?
XML (Extensible Markup Language) is a markup language that uses tags to define elements and structure data hierarchically. It looks like HTML, but you choose the tag names and the rules are stricter.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price currency="USD">10.99</price>
  </book>
  <book category="non-fiction">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price currency="USD">39.99</price>
  </book>
</bookstore>
```
XML was the king of data interchange in the early 2000s. Then JSON came along and took over most of the web. But XML remains deeply entrenched in enterprise software, document formats, and configuration files.
XML vs HTML
| | XML | HTML |
|---|---|---|
| Tags | You define them | Predefined (`<div>`, `<p>`, etc.) |
| Closing tags | Required | Sometimes optional (`</p>`, `</li>`); void elements like `<br>` have none |
| Case sensitivity | Yes (`<Book>` != `<book>`) | No (tag and attribute names) |
| Attribute quotes | Required (single or double) | Sometimes optional |
| Self-closing | `<empty/>` (any element) | `<br>` or `<br/>` (void elements only) |
| Purpose | Data storage/transport | Document display |
Basic Syntax
```xml
<?xml version="1.0" encoding="UTF-8"?>  <!-- Declaration (optional) -->
<root>                                  <!-- Root element (required) -->
  <element attribute="value">           <!-- Element with attribute -->
    Text content                        <!-- Text node -->
  </element>
  <empty/>                              <!-- Self-closing element -->
  <!-- This is a comment -->
</root>
```
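To see how a parser exposes these pieces, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` on the document above. The variable names are illustrative. Two details worth noticing: `.text` keeps the raw whitespace around the text, and the default parser silently drops comments.

```python
import xml.etree.ElementTree as ET

# The Basic Syntax document as a Python string (must start directly with "<?xml")
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element attribute="value">
    Text content
  </element>
  <empty/>
  <!-- This is a comment -->
</root>"""

root = ET.fromstring(DOC)

element = root.find("element")
print(root.tag)                  # root
print(element.get("attribute"))  # value
print(element.text.strip())      # Text content (.text itself includes the newlines and indent)

empty = root.find("empty")
print(empty.text)                # None: a self-closing element has no text

# The default parser discards comments, so only the two elements remain
print([child.tag for child in root])  # ['element', 'empty']
```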
Rules
- One root element - Everything must be inside a single root
- Tags must close - `<tag>content</tag>` or `<tag/>`
- Proper nesting - `<a><b></b></a>`, not `<a><b></a></b>`
- Attributes in quotes - `id="123"`, not `id=123`
- Case sensitive - `<Tag>` and `</tag>` don't match
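A quick way to check these rules is to hand the text to a parser and see whether it complains. This sketch uses Python's `xml.etree.ElementTree`; `is_well_formed` is a hypothetical helper, and it only checks well-formedness, not validity against a DTD or XSD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One violation per rule, then a document that follows all of them
print(is_well_formed("<a/><b/>"))              # False: two root elements
print(is_well_formed("<a>text"))               # False: <a> never closes
print(is_well_formed("<a><b></a></b>"))        # False: improper nesting
print(is_well_formed("<a id=123/>"))           # False: unquoted attribute
print(is_well_formed("<Tag></tag>"))           # False: case mismatch
print(is_well_formed('<a id="123"><b/></a>'))  # True
```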
Where You'll See This
- Office documents - DOCX, XLSX, PPTX are ZIP files containing XML
- SVG graphics - Scalable Vector Graphics are XML
- RSS/Atom feeds - Blog and podcast syndication
- SOAP APIs - Enterprise web services
- Android layouts - UI definitions
- Maven - Java build configuration (`pom.xml`)
- Spring configs - Java application configuration
- XHTML - Strict HTML that follows XML rules
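Because a DOCX file is a ZIP archive, the standard library can read it without any Office SDK. This sketch assumes the usual DOCX layout, where body text lives in `word/document.xml` inside `<w:t>` elements of the WordprocessingML namespace. `docx_text` is a hypothetical helper and deliberately naive: it ignores paragraph breaks, tables, headers, and everything else.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI bound to the "w:" prefix in DOCX documents
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_text(path) -> str:
    """Hypothetical helper: concatenate all text runs in a .docx file.

    `path` may be a filename or a binary file object.
    """
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    # Each <w:t> element holds one run of text
    return "".join(t.text or "" for t in root.iter(f"{{{W_NS}}}t"))
```

Usage would look like `docx_text("report.docx")`, where `report.docx` is a placeholder path.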
Namespaces
When combining XML from different sources, namespaces prevent tag collisions:
```xml
<root xmlns:book="http://example.com/books"
      xmlns:order="http://example.com/orders">
  <book:title>The Great Gatsby</book:title>
  <order:title>Order #12345</order:title>
</root>
```
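Same tag name (`title`), different meanings. The URIs are only identifiers: parsers never fetch them, and they don't need to point to a real page. What identifies a namespace is the URI; the prefixes (`book:`, `order:`) are just local aliases for it.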
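In Python's ElementTree, for example, prefixes disappear after parsing: each tag becomes `{namespace-uri}localname`, and searches take a prefix-to-URI map of your own choosing. A small sketch using the document above (the `b`/`o` prefixes are arbitrary):

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns:book="http://example.com/books"
      xmlns:order="http://example.com/orders">
  <book:title>The Great Gatsby</book:title>
  <order:title>Order #12345</order:title>
</root>"""

root = ET.fromstring(DOC)

# Prefixes are expanded into "{namespace-uri}localname"
print([child.tag for child in root])
# ['{http://example.com/books}title', '{http://example.com/orders}title']

# Searches use your own prefix-to-URI map; the prefixes need not match the document's
ns = {"b": "http://example.com/books", "o": "http://example.com/orders"}
print(root.find("b:title", ns).text)  # The Great Gatsby
print(root.find("o:title", ns).text)  # Order #12345
```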
Common Gotchas
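XML has five predefined entities for special characters:

- `<` becomes `&lt;`
- `>` becomes `&gt;`
- `&` becomes `&amp;`
- `"` becomes `&quot;`
- `'` becomes `&apos;`

In text content, `<` and `&` must always be escaped, and `>` only in the sequence `]]>`. Inside attribute values you must also escape whichever quote character delimits the value. Miss one `&` and the whole document is no longer well-formed, so parsers will reject it.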
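Rather than escaping by hand, let a library do it. A sketch with Python's `xml.sax.saxutils` helpers and with ElementTree, which escapes automatically when serializing; the sample strings are made up.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

text = 'Fish & Chips <"special"> \'deal\''

# escape() handles &, < and > (extra replacements can be passed as a dict)
print(escape(text))
# Fish &amp; Chips &lt;"special"&gt; 'deal'

# quoteattr() escapes and wraps the value in a quote character that is safe for it
print(quoteattr('say "hi"'))  # 'say "hi"'

# ElementTree escapes text and attribute values for you on output
note = ET.Element("note", title='5 > 3 & "true"')
note.text = text
print(ET.tostring(note, encoding="unicode"))
```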
For text with lots of special characters, wrap it in a CDATA section to avoid escaping (the one sequence it can't contain is `]]>`):

```xml
<![CDATA[Use <tags> & symbols freely!]]>
```
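CDATA changes only how the text is written, not what it means, so a parser hands back the same string either way. A small check with Python's ElementTree, which, as a caveat, cannot write CDATA sections itself and escapes the text instead:

```python
import xml.etree.ElementTree as ET

note = ET.fromstring("<note><![CDATA[Use <tags> & symbols freely!]]></note>")

# The parser unwraps the CDATA section: .text is the literal content
print(note.text)  # Use <tags> & symbols freely!

# On output, ElementTree escapes the special characters instead of using CDATA
print(ET.tostring(note, encoding="unicode"))
# <note>Use &lt;tags&gt; &amp; symbols freely!</note>
```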
- Verbosity - XML is often considerably larger than equivalent JSON, because every element needs opening AND closing tags.
- Parsing complexity - DOM parsers load everything into memory. Use SAX/streaming for large files.
- Attributes vs elements - No clear rule on when to use `<price currency="USD">10</price>` vs nested elements.
- Whitespace handling - Is that newline significant or not? Depends on the schema and the application reading the document.
- Encoding issues - Always specify encoding. UTF-8 is safest, and it's what parsers assume when there's no declaration or byte-order mark.
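For the parsing-complexity point, here is a streaming sketch in Python. It uses `xml.etree.ElementTree.iterparse`, a pull-style incremental parser, in place of a callback-based SAX handler. `iter_titles` is a hypothetical helper and the sample is a trimmed bookstore document. For truly huge files you would usually also detach the cleared `<book>` elements from the root, which this sketch skips.

```python
import io
import xml.etree.ElementTree as ET

def iter_titles(source):
    """Hypothetical helper: yield each <book>'s title without keeping whole books in memory."""
    # iterparse reports an element once its end tag has been read
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            yield elem.findtext("title")
            elem.clear()  # discard the finished <book>'s children and text

sample = b"""<bookstore>
  <book><title>The Great Gatsby</title></book>
  <book><title>Clean Code</title></book>
</bookstore>"""

# A filename or any binary file object works as the source
print(list(iter_titles(io.BytesIO(sample))))
# ['The Great Gatsby', 'Clean Code']
```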
XML vs JSON
| | XML | JSON |
|---|---|---|
| Readability | Verbose but clear | Compact |
| Data types | All strings (without schema) | Strings, numbers, booleans, null |
| Attributes | Supported | No concept (use object properties) |
| Comments | Supported | Not supported |
| Namespaces | Supported | Not supported |
| Schemas | XSD, DTD, RELAX NG | JSON Schema |
| Best for | Documents, configs | APIs, data interchange |
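The data-types row matters as soon as you convert between the two. In this Python sketch the parser returns the price as a string; the XML-to-JSON mapping (attributes as plain keys, price converted with `float`) is one arbitrary choice, since there is no single standard mapping.

```python
import json
import xml.etree.ElementTree as ET

XML_DOC = '<book category="fiction"><title>The Great Gatsby</title><price currency="USD">10.99</price></book>'
book = ET.fromstring(XML_DOC)

# Without a schema, every value comes back as a string
price_text = book.findtext("price")
print(repr(price_text))  # '10.99'

# An ad-hoc mapping: attributes become keys, the price becomes a number by hand
data = {
    "category": book.get("category"),
    "title": book.findtext("title"),
    "price": {
        "currency": book.find("price").get("currency"),
        "amount": float(price_text),
    },
}
print(json.dumps(data))
# {"category": "fiction", "title": "The Great Gatsby", "price": {"currency": "USD", "amount": 10.99}}
```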
XPath: Querying XML
XPath lets you select nodes from XML documents:
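- `//book/title` selects every `<title>` that is a child of a `<book>`
- `//book[price > 20]` selects books whose `<price>` is greater than 20
- `(//author)[1]` selects the first `<author>` in the document (the similar-looking `//author[1]` selects every `<author>` that is the first `<author>` child of its parent)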
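Python's ElementTree supports only a subset of XPath, which makes it a handy place to see both the syntax and the `[1]` trap. A sketch against the bookstore document; for full XPath 1.0, the third-party lxml library provides an `xpath()` method.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price currency="USD">10.99</price>
  </book>
  <book category="non-fiction">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
    <price currency="USD">39.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(DOC)

# //book/title (on an element, ElementTree paths must start with ".")
titles = [t.text for t in root.findall(".//book/title")]

# Attribute predicates are supported
fiction = [b.findtext("title") for b in root.findall(".//book[@category='fiction']")]

# //book[price > 20]: numeric comparisons are not, so filter in Python
pricey = [b.findtext("title") for b in root.findall(".//book")
          if float(b.findtext("price")) > 20]

# //author[1] matches the first <author> of *each* parent, so both authors here
first_of_each = [a.text for a in root.findall(".//author[1]")]

# (//author)[1], the first author in document order, is what find() returns
first_author = root.find(".//author").text

print(titles)         # ['The Great Gatsby', 'Clean Code']
print(fiction)        # ['The Great Gatsby']
print(pricey)         # ['Clean Code']
print(first_of_each)  # ['F. Scott Fitzgerald', 'Robert C. Martin']
print(first_author)   # F. Scott Fitzgerald
```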
In Code
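```javascript
// Browser (DOMParser); xmlString holds the document text
const parser = new DOMParser();
const doc = parser.parseFromString(xmlString, "text/xml");
// Malformed input doesn't throw: the result contains a <parsererror> element instead
if (doc.querySelector("parsererror")) {
  throw new Error("Invalid XML");
}
const titles = doc.querySelectorAll("title");
```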
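```javascript
// Node.js (fast-xml-parser)
import { XMLParser, XMLBuilder } from 'fast-xml-parser';

// Keep attributes and mark them with an "@_" prefix
const parser = new XMLParser({
  ignoreAttributes: false,
  attributeNamePrefix: "@_"
});
const data = parser.parse(xmlString);

// Build XML: the builder also ignores attributes by default,
// so opt in or "@_id" will not become an id attribute
const builder = new XMLBuilder({
  ignoreAttributes: false,
  attributeNamePrefix: "@_"
});
const xml = builder.build({
  book: {
    "@_id": "123",
    title: "Clean Code"
  }
});
```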
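```python
# Python (ElementTree, standard library)
import xml.etree.ElementTree as ET

# Parse (xml_string holds a document such as the bookstore example)
root = ET.fromstring(xml_string)
for book in root.findall('book'):   # direct <book> children of the root
    title = book.findtext('title')  # None if the book has no <title>
    print(title)

# Build
root = ET.Element('books')
book = ET.SubElement(root, 'book')
ET.SubElement(book, 'title').text = 'Clean Code'
xml_string = ET.tostring(root, encoding='unicode')  # a str rather than bytes
```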
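`ET.tostring` as used above produces a single line of XML. For readable output, here is a sketch assuming Python 3.9+, where `ET.indent` was added; the in-memory buffer stands in for a real file.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("books")
book = ET.SubElement(root, "book", id="123")
ET.SubElement(book, "title").text = "Clean Code"

# ET.indent rewrites whitespace in place, two spaces per level here
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
# <books>
#   <book id="123">
#     <title>Clean Code</title>
#   </book>
# </books>

# With an explicit encoding and xml_declaration=True, write() emits the <?xml ...?> line first
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
print(buf.getvalue().decode("utf-8"))
```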
"XML is like violence: if it doesn't solve your problem, you're not using enough of it." - Unknown Enterprise Architect