pod xml

XML Parser and Document Modeling



XAttr models an XML attribute in an element.


Models an XML document: it encapsulates the root element and document type.


XML document type declaration (but not the whole DTD).


Models an XML element: its name, attributes, and child nodes.


XNode is the base class for XElem and XText.


Models an XML namespace URI.


XParser is a simple, lightweight XML parser.


XML processing instruction node.


XText represents the character data inside an element.



Enumerates the type of XNode and current node of XParser.



XML exception.


Incomplete document exception indicates that the end of stream was reached before the end of the document was parsed.
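A minimal sketch of telling the two exception types apart when parsing untrusted input. `ParseCheck` and `classify` are hypothetical names. The sketch assumes that a truncated document raises XIncompleteErr, as described above, that other well-formedness errors raise plain XErr, and that XIncompleteErr is a subclass of XErr, which is why the more specific catch comes first.

```fantom
using xml

** Hypothetical helper: report how an XML string fails to parse, if at all.
class ParseCheck
{
  static Str classify(Str xml)
  {
    Str result := "ok"
    try
    {
      XParser(xml.in).parseDoc
    }
    catch (XIncompleteErr e)
    {
      // assumed: the stream ended before the document was complete
      result = "incomplete"
    }
    catch (XErr e)
    {
      // any other well-formedness error
      result = "malformed"
    }
    return result
  }

  static Void main()
  {
    echo(classify("<a/>"))
    echo(classify("<a><b>"))
    echo(classify("<a></b>"))
  }
}
```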


The xml API provides the core APIs for working with XML:

  1. XElem: provides a standard in-memory representation of an XML element tree. It is similar to the W3C DOM, but Fantom-centric.
  2. XParser: a non-validating XML parser. It may be used in two modes: to read an entire XML document into memory, or as a pull parser.

The features supported by XParser:

  • All element, attribute, processing instruction, and character data productions are supported
  • CDATA sections are supported
  • Namespaces are supported at both the element and attribute level
  • Doctype declarations provide access to the public and system identifiers
  • DTDs are ignored (as such you can't use internal or external entity declarations)
  • No access to comments is provided by the XML parser
  • Character data consisting only of whitespace is always ignored
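Several of the features above can be seen on one small input. This is a hedged sketch: `FeaturesDemo` is a hypothetical name, and the members it reads (`XDocType.rootElem`, `XDocType.systemId`, `XElem.elem`, `XElem.text`, `XText.cdata`) are recalled from the xml pod's API and should be checked against your version.

```fantom
using xml

** Hypothetical demo of a few XParser features listed above.
class FeaturesDemo
{
  static XDoc parse()
  {
    src := "<!DOCTYPE root SYSTEM \"root.dtd\">\n" +
           "<root>\n  <code><![CDATA[if (a < b) x = 1;]]></code>\n</root>"
    return XParser(src.in).parseDoc
  }

  static Void main()
  {
    doc := parse
    // the doctype exposes the root element name and system identifier
    echo(doc.docType.rootElem)
    echo(doc.docType.systemId)
    // CDATA content comes back unescaped and flagged as CDATA
    code := doc.root.elem("code")
    echo(code.text.val)
    echo(code.text.cdata)
    // whitespace-only text around <code> is dropped, leaving one child
    echo(doc.root.children.size)
  }
}
```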


XML documents are modeled in memory using the following classes:

  • XDoc: models the entire document providing access to the root element, doctype, and processing instructions declared before the root element
  • XElem: models an element's name, attributes, and child nodes
  • XText: models character data in an element
  • XPi: models a processing instruction
  • XAttr: models an attribute name/value pair
  • XNs: models the prefix and URI of an XML namespace
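To see how these classes fit together in a tree, the following sketch walks a parsed document and describes each node by its class. `TreeWalk` and `walk` are hypothetical names; the sketch assumes `XElem.children` returns the child XNodes and `XElem.attrs` returns the XAttr list.

```fantom
using xml

** Hypothetical sketch: describe every node of an in-memory tree by class.
class TreeWalk
{
  ** Append one line per node to 'lines', indented by nesting level.
  static Void walk(XNode node, Str[] lines, Int indent := 0)
  {
    pad := Str.spaces(indent)
    if (node is XElem)
    {
      elem := (XElem)node
      attrs := elem.attrs.join(" ") |XAttr a->Str| { "${a.name}=${a.val}" }
      lines.add("${pad}XElem ${elem.qname} ${attrs}".trimEnd)
      elem.children.each |XNode kid| { walk(kid, lines, indent + 2) }
    }
    else if (node is XText)
    {
      lines.add("${pad}XText \"${((XText)node).val}\"")
    }
    else
    {
      // e.g. XPi: just report the class name
      lines.add("${pad}${node.typeof.name}")
    }
  }

  static Void main()
  {
    lines := Str[,]
    walk(XParser("<a x='1'>hi<b/></a>".in).parseDoc.root, lines)
    lines.each |line| { echo(line) }
  }
}
```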

XML documents are structured as a tree of nodes using the classes listed above. Typically the tree is built by parsing an XML document from an input stream, but you can also construct a document in memory using the APIs and it-blocks:

doc := XDoc
{
  XElem("root")
  {
    XElem("a") { addAttr("flags", "0"); XText("blah"), },
    XElem("b") { XElem("c"), },
    XElem("m") { XText("I "), XElem("i") { XText("mean"), }, XText(" it!"), },
  },
}
doc.write(Env.cur.out)

// would print the following to the console
<?xml version='1.0' encoding='UTF-8'?>
<root>
 <a flags='0'>blah</a>
 <b>
  <c/>
 </b>
 <m>I <i>mean</i> it!</m>
</root>
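The comma-separated items in the it-blocks above are Fantom sugar for calls to `add`, so the same tree can be built with plain method calls. A minimal sketch of the `<a>` element, assuming `addAttr` and `add` both return the element itself (so they can be chained); `BuildDemo` is a hypothetical name.

```fantom
using xml

** Hypothetical sketch: build <a flags='0'>blah</a> without it-blocks.
class BuildDemo
{
  static XElem build()
  {
    // addAttr and add return the element, so the calls chain
    return XElem("a").addAttr("flags", "0").add(XText("blah"))
  }

  ** Write an element to a string via an in-memory OutStream.
  static Str render(XElem elem)
  {
    buf := StrBuf()
    elem.write(buf.out)
    return buf.toStr
  }

  static Void main() { echo(render(build)) }
}
```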


The XNs class is used to model XML namespaces. The default namespace is indicated with the prefix "". Both XElem and XAttr define a set of methods for working with qualified names:

ns := XNs("x", `urn:foo`)
e := XElem("name", ns)

e.ns      =>  ns
e.prefix  =>  "x"
e.uri     =>  `urn:foo`
e.name    =>  "name"
e.qname   =>  "x:name"
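When a document is parsed rather than built by hand, the parser resolves prefixes for you. A hedged sketch: `NsDemo` is a hypothetical name, and it assumes `XElem.attr` looks an attribute up by its local name and that an unprefixed attribute has a null `uri` (attributes do not inherit the default namespace in XML).

```fantom
using xml

** Hypothetical sketch: qualified names on a parsed element.
class NsDemo
{
  static XElem parse()
  {
    src := "<x:name xmlns:x='urn:foo' x:id='7' plain='p'/>"
    return XParser(src.in).parseDoc.root
  }

  static Void main()
  {
    e := parse
    echo("$e.prefix $e.uri $e.name $e.qname")
    // attributes carry their own namespace; unprefixed ones have none
    echo(e.attr("id").uri)
    echo(e.attr("plain").uri)
  }
}
```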

If you construct a document by hand, then you are responsible for creating each element with the correct XNs and ensuring that the appropriate namespace attributes are declared:

nsDef := XNs("", `http://foo/default`)
nsBar := XNs("b", `http://foo/bar`)
root := XElem("root", nsBar)
{
  XAttr.makeNs(nsDef),
  XAttr.makeNs(nsBar),
  XElem("elem", nsDef),
  XElem("elem", nsBar),
}
root.write(Env.cur.out)

// would print the following to the console
<b:root xmlns='http://foo/default' xmlns:b='http://foo/bar'>
 <elem/>
 <b:elem/>
</b:root>
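The same responsibility applies to namespaced attributes: passing an XNs places the attribute in that namespace, but the `xmlns:b` declaration must still be added explicitly. A minimal sketch, assuming the `XAttr(name, val, ns)` constructor and that `XElem.add` accepts XAttr children; `NsAttrDemo` is a hypothetical name.

```fantom
using xml

** Hypothetical sketch: an attribute placed in a namespace by hand.
class NsAttrDemo
{
  static Str render()
  {
    nsBar := XNs("b", `http://foo/bar`)
    root := XElem("root", nsBar)
    // the xmlns:b declaration is still our job, as it is for elements
    root.add(XAttr.makeNs(nsBar))
    root.add(XAttr("id", "7", nsBar))
    buf := StrBuf()
    root.write(buf.out)
    return buf.toStr
  }

  static Void main() { echo(render) }
}
```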


All the XML node classes support a write method that takes an OutStream. During debugging it is often convenient to write to standard out:

doc.write(Env.cur.out)

To write an XML document from memory to a file:

out := `test.xml`.toFile.out
try
  doc.write(out)
finally
  out.close
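A quick way to check that writing works is a round trip through a temporary file. This sketch uses hypothetical names (`FileRoundTrip`, `roundTrip`) and relies on parseDoc closing its input stream by default, as the parsing examples in this document note.

```fantom
using xml

** Hypothetical sketch: write a document to a file and parse it back.
class FileRoundTrip
{
  static XDoc roundTrip(XDoc doc, File file)
  {
    out := file.out
    try
      doc.write(out)
    finally
      out.close
    // parseDoc closes the input stream by default
    return XParser(file.in).parseDoc
  }

  static Void main()
  {
    file := File.createTemp("xml-demo", ".xml")
    try
    {
      doc := XDoc(XElem("root") { XElem("child") { XText("hi"), }, })
      back := roundTrip(doc, file)
      echo(back.root.elem("child").text.val)
    }
    finally
    {
      file.delete
    }
  }
}
```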

If you are generating an XML document on the fly you might want to use OutStream directly. It supports escaping XML control characters via the writeXml method:

// escape markup in text
out.writeXml(text)

// escape markup and quotes
out.writeChars("attr='").writeXml(attrVal, OutStream.xmlEscQuotes).writeChars("'")

You can use Str.toXml to generate a string with XML markup escaped. However, when streaming, prefer OutStream.writeXml, which is more efficient.
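The difference between the escaping modes is easiest to see side by side. A hedged sketch: `EscapeDemo` and `esc` are hypothetical names, and it assumes writeXml's default mode escapes markup characters but leaves apostrophes alone, while `OutStream.xmlEscQuotes` also escapes quotes.

```fantom
** Hypothetical sketch comparing OutStream.writeXml modes with Str.toXml.
class EscapeDemo
{
  ** Escape 's' into a string using writeXml with the given mode flags.
  static Str esc(Str s, Int mode := 0)
  {
    buf := StrBuf()
    buf.out.writeXml(s, mode)
    return buf.toStr
  }

  static Void main()
  {
    text := "a<b & it's"
    echo(esc(text))                          // markup escaped, apostrophe kept
    echo(esc(text, OutStream.xmlEscQuotes))  // apostrophe escaped as well
    echo(text.toXml)                         // convenience form returning a Str
  }
}
```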


The XParser class is used to parse XML input streams into XElems. The easiest way to do this is to parse the entire document into memory using the parseDoc method:

// parse and close input stream
doc := XParser(in).parseDoc

// parse a file
doc := XParser(`test.xml`.toFile.in).parseDoc

// parse a string
doc := XParser("<foo/>".in).parseDoc
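Once the document is in memory, mapping it into your own data structures is ordinary tree navigation. A minimal sketch with a hypothetical `<config>`/`<server>` format and a hypothetical `ServerConfig.ports` helper; it assumes `XElem.elems` returns the child elements and `XElem.get` returns an attribute value by name.

```fantom
using xml

** Hypothetical sketch: map a parsed document into a plain Fantom Map.
class ServerConfig
{
  static Str:Int ports(Str xml)
  {
    root := XParser(xml.in).parseDoc.root
    map := Str:Int[:]
    // only direct <server> children are used; other elements are ignored
    root.elems.each |e|
    {
      if (e.name == "server") map[e.get("host")] = e.get("port").toInt
    }
    return map
  }

  static Void main()
  {
    echo(ports("<config><server host='alpha' port='8080'/><server host='beta' port='9090'/><log/></config>"))
  }
}
```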

The code above parses the document entirely into memory. This tends to be the easiest way to work with XML documents. However, it can create efficiency problems when parsing large documents, especially when mapping the XElems into other data structures. To support more efficient parsing of XML streams, XParser may also be used to read elements off the input stream one at a time. This is similar to the SAX API, except that you pull events instead of having them pushed to you.

To perform pull parsing, use the next method to iterate through the document. This tokenizes the stream into XElem, XText, and XPi chunks. Each call to next advances to the next token and returns its node type. You may also check the type of the current token using nodeType. You may access the current token using elem, text, or pi.

XParser maintains a stack of XElems for you from the root element down to the current element. You may check the depth of the stack using the depth method. Use elemAt to get the element at any position in the stack.
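Because the stack is always available, the full path to the current element can be rebuilt at each elemStart. A minimal sketch; `PathDemo` and `paths` are hypothetical names, and it assumes depth 0 is the root element, as in the pull-parsing output shown in this document.

```fantom
using xml

** Hypothetical sketch: use depth and elemAt to build each element's path.
class PathDemo
{
  static Str[] paths(Str xml)
  {
    in := xml.in
    parser := XParser(in)
    acc := Str[,]
    while (true)
    {
      type := parser.next
      if (type == null) break
      if (type != XNodeType.elemStart) continue
      // walk the stack from the root (depth 0) down to the current element
      buf := StrBuf()
      for (d := 0; d <= parser.depth; ++d)
        buf.add("/").add(parser.elemAt(d).name)
      acc.add(buf.toStr)
    }
    in.close
    return acc
  }

  static Void main()
  {
    paths("<root><a><b/></a><c/></root>").each |p| { echo(p) }
  }
}
```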

It is very important to understand that the XElem at a given depth is only valid until the parser returns elemEnd for that depth; after that the element will be reused. An XText instance is only valid until the next call to next. You can make a safe copy of nodes using copy. You can also use parseElem to read the current element and its descendants into memory during pull parsing.
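If you want to hold on to elements after their elemEnd, store copies rather than the parser's own instances. A minimal sketch with hypothetical names (`CopyDemo`, `items`); at elemStart the copy carries the element's name and attributes.

```fantom
using xml

** Hypothetical sketch: keep elements beyond elemEnd by copying them.
class CopyDemo
{
  static XElem[] items(Str xml)
  {
    in := xml.in
    parser := XParser(in)
    kept := XElem[,]
    while (true)
    {
      type := parser.next
      if (type == null) break
      // parser.elem may be reused after its elemEnd, so keep a copy
      if (type == XNodeType.elemStart && parser.elem.name == "item")
        kept.add(parser.elem.copy)
    }
    in.close
    return kept
  }

  static Void main()
  {
    items("<list><item id='1'/><item id='2'/><item id='3'/></list>").each |e| { echo(e.get("id")) }
  }
}
```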

Example of pull parsing:

parser := XParser("<root><a>foo</a></root>".in)

while (parser.next != null)
  echo("$parser.nodeType $parser.depth $parser.elem")

// prints the following to the console
elemStart 0 <root>
elemStart 1 <a>
text      1 <a>
elemEnd   1 <a>
elemEnd   0 <root>
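In practice the loop usually dispatches on the node type. This sketch collects the text directly inside each element; `PullText` and `texts` are hypothetical names. Since an XText is only valid until the next call to next, the value is copied out immediately as a Str.

```fantom
using xml

** Hypothetical sketch: pull-parse and react only to text tokens.
class PullText
{
  ** Map each element name to the text it directly contains.
  static Str:Str texts(Str xml)
  {
    in := xml.in
    parser := XParser(in)
    map := Str:Str[:]
    while (true)
    {
      type := parser.next
      if (type == null) break
      // at a text token, parser.elem is the enclosing element;
      // elemStart, elemEnd and pi tokens are skipped
      if (type == XNodeType.text) map[parser.elem.name] = parser.text.val
    }
    in.close
    return map
  }

  static Void main() { echo(texts("<root><a>foo</a><b>bar</b></root>")) }
}
```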