Java : DocumentBuilderFactory (XML) with Examples

DocumentBuilderFactory (Java SE 17 & JDK 17) API Examples.
You will find code examples for most DocumentBuilderFactory methods.


Summary

Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents.


final var xml = """
        <root>
            <child-a>AAA</child-a>
            <child-b>BBB</child-b>
        </root>
        """;

final var factory = DocumentBuilderFactory.newInstance();
final var builder = factory.newDocumentBuilder();

final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

final var childA = document.getElementsByTagName("child-a").item(0);
System.out.println(childA); // [child-a: null]
System.out.println(childA.getTextContent()); // AAA

final var childB = document.getElementsByTagName("child-b").item(0);
System.out.println(childB); // [child-b: null]
System.out.println(childB.getTextContent()); // BBB

Please also note the following security considerations.

XML processing can expose applications to certain vulnerabilities. Among the most prominent and well-known attacks are the XML External Entity (XXE) injection attack and the exponential entity expansion attack, also known as the XML bomb or billion laughs attack.
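
As a minimal sketch of one way to harden a factory against these attacks, the example below rejects any document that contains a DOCTYPE declaration. The class name HardenedFactoryExample and the helpers newHardenedFactory and isRejected are made up for illustration. The disallow-doctype-decl feature is assumed to be available because the JDK's built-in parser recognizes it; other implementations may not. Whether rejecting every DOCTYPE is acceptable depends on the application.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

class HardenedFactoryExample {

    // Assumption: this feature is recognized by the JDK's built-in parser.
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    // Hypothetical helper: builds a factory with restrictive settings.
    static DocumentBuilderFactory newHardenedFactory() throws Exception {
        final var factory = DocumentBuilderFactory.newDefaultInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature(DISALLOW_DOCTYPE, true);
        // Deny all access to external DTDs and external schemas.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return factory;
    }

    // Hypothetical helper: returns true if the parser refuses the document.
    static boolean isRejected(String xml) throws Exception {
        final var builder = newHardenedFactory().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        final var plain = "<root>abc</root>";
        final var withDoctype = """
                <!DOCTYPE root [
                    <!ENTITY aaa "bbb">
                ]>
                <root>&aaa;</root>
                """;

        System.out.println(isRejected(plain)); // false
        System.out.println(isRejected(withDoctype)); // true
    }
}
```

The default error handler also reports the fatal error on the standard error stream; set an ErrorHandler on the builder if that output is not wanted.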


Constructors

DocumentBuilderFactory ()

Protected constructor to prevent instantiation.

Creating a subclass of DocumentBuilderFactory is rare, so the code example is omitted.

Methods

abstract Object getAttribute (String name)

Allows the user to retrieve specific attributes on the underlying implementation.

final var dtdFile = Path.of("R:", "java-work", "sample.dtd");
System.out.println(dtdFile); // R:\java-work\sample.dtd

Files.writeString(dtdFile, """
        <!ENTITY aaa "bbb">
        """);

final var xml = """
        <!DOCTYPE root SYSTEM "file:///R:/java-work/sample.dtd">
        <root>&aaa;</root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    // Grant permission to all protocols.
    factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "all");

    final var ret = factory.getAttribute(XMLConstants.ACCESS_EXTERNAL_DTD);
    System.out.println(ret); // "all"

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]
    System.out.println(root.getTextContent()); // bbb
}

{
    // Deny all access to external dtd.
    factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");

    final var ret = factory.getAttribute(XMLConstants.ACCESS_EXTERNAL_DTD);
    System.out.println(ret); // ""

    final var builder = factory.newDocumentBuilder();

    try {
        final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));
    } catch (SAXException e) {
        System.out.println(e);
    }

    // Result
    // ↓
    //org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 57; External DTD:
    // Failed to read external DTD 'sample.dtd', because 'file' access is not allowed due to
    // restriction set by the accessExternalDTD property.
}

abstract boolean getFeature (String name)

Get the state of the named feature.

// An example of the exponential entity expansion attack.
final var xml = """
        <!DOCTYPE root[
            <!ENTITY x100 "X">
            <!ENTITY x99 "&x100;&x100;">
            <!ENTITY x98 "&x99;&x99;">
            ...
            (omitted)
            ...
            <!ENTITY x3 "&x4;&x4;">
            <!ENTITY x2 "&x3;&x3;">
            <!ENTITY x1 "&x2;&x2;">
        ]>
        <root>&x1;</root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.getFeature(XMLConstants.FEATURE_SECURE_PROCESSING);
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();

    try {
        final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));
    } catch (SAXException e) {
        System.out.println(e);
    }

    // Result
    // ↓
    //org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; JAXP00010001:
    // The parser has encountered more than "64000" entity expansions in this document;
    // this is the limit imposed by the JDK.
}

factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);

{
    final var ret = factory.getFeature(XMLConstants.FEATURE_SECURE_PROCESSING);
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();

    // Warning! The entity grows exponentially, so parsing takes a very long time.
    //final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));
}

Schema getSchema ()

Gets the Schema object specified through the setSchema(Schema schema) method.

final var xsd = """
        <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
            <xsd:element name="root" type="xsd:string"/>
        </xsd:schema>
        """;

final var schemaFactory = SchemaFactory.newDefaultInstance();
final var schema = schemaFactory.newSchema(
        new StreamSource(new ByteArrayInputStream(xsd.getBytes())));

final var factory = DocumentBuilderFactory.newInstance();

System.out.println(factory.getSchema()); // null

factory.setSchema(schema);
System.out.println(factory.getSchema().equals(schema)); // true

final var errorHandler = new DefaultHandler() {
    @Override
    public void error(SAXParseException e) {
        System.out.println("-- ErrorHandler error --");
        System.out.println(e);
    }
};

{
    final var xml = """
            <root>abcd</root>
            """;

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]
    System.out.println(root.getTextContent()); // abcd
}

{
    final var xml = """
            <root><child>abcd</child></root>
            """;

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    // Result
    // ↓
    //-- ErrorHandler error --
    //org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 33; cvc-type.3.1.2:
    // Element 'root' is a simple type, so it must have no element information item [children].
}

boolean isCoalescing ()

Indicates whether or not the factory is configured to produce parsers which convert CDATA nodes to Text nodes and append them to the adjacent (if any) Text node.

final var xml = """
        <root>aaa<![CDATA[<&>]]></root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isCoalescing();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa]
    //[#cdata-section: <&>]
}

factory.setCoalescing(true);

{
    final var ret = factory.isCoalescing();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa<&>]
}

boolean isExpandEntityReferences ()

Indicates whether or not the factory is configured to produce parsers which expand entity reference nodes.

final var xml = """
        <!DOCTYPE root [
            <!ENTITY aaa "bbb">
        ]>
        <root>&aaa;</root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isExpandEntityReferences();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var child = root.getFirstChild();
    System.out.println(child); // [#text: bbb]
}

factory.setExpandEntityReferences(false);

{
    final var ret = factory.isExpandEntityReferences();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    if (root.getFirstChild() instanceof EntityReference entityReference) {
        System.out.println(entityReference); // [aaa: null]
    }
}

boolean isIgnoringComments ()

Indicates whether or not the factory is configured to produce parsers which ignore comments.

final var xml = """
        <root>aaa<!--bbb--></root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isIgnoringComments();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa]
    //[#comment: bbb]
}

factory.setIgnoringComments(true);

{
    final var ret = factory.isIgnoringComments();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa]
}

boolean isIgnoringElementContentWhitespace ()

Indicates whether or not the factory is configured to produce parsers which ignore ignorable whitespace in element content.

final var xml = """
        <!DOCTYPE root [
            <!ELEMENT child-a (dummy?)>
            <!ELEMENT child-b (#PCDATA)>
        ]>
        <root>
            <child-a>   </child-a>
            <child-b>   </child-b>
        </root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isIgnoringElementContentWhitespace();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var childA = document.getElementsByTagName("child-a").item(0);
    System.out.println(childA); // [child-a: null]

    final var childB = document.getElementsByTagName("child-b").item(0);
    System.out.println(childB); // [child-b: null]

    System.out.println(childA.getFirstChild()); // [#text:    ]
    System.out.println(childB.getFirstChild()); // [#text:    ]
}

factory.setIgnoringElementContentWhitespace(true);

{
    final var ret = factory.isIgnoringElementContentWhitespace();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var childA = document.getElementsByTagName("child-a").item(0);
    System.out.println(childA); // [child-a: null]

    final var childB = document.getElementsByTagName("child-b").item(0);
    System.out.println(childB); // [child-b: null]

    System.out.println(childA.getFirstChild()); // null
    System.out.println(childB.getFirstChild()); // [#text:    ]
}

boolean isNamespaceAware ()

Indicates whether or not the factory is configured to produce parsers which are namespace aware.

final var xml = """
        <ns:root xmlns:ns="sample">
            <ns:child/>
        </ns:root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isNamespaceAware();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var child = document.getElementsByTagNameNS("sample", "child").item(0);
    System.out.println(child); // null
}

factory.setNamespaceAware(true);

{
    final var ret = factory.isNamespaceAware();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var child = document.getElementsByTagNameNS("sample", "child").item(0);
    System.out.println(child); // [ns:child: null]
}

boolean isValidating ()

Indicates whether or not the factory is configured to produce parsers which validate the XML content during parse.

final var xml = """
        <!DOCTYPE root [
            <!ELEMENT root (child-a)>
        ]>
        <root><child-z/></root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

final var errorHandler = new DefaultHandler() {
    @Override
    public void error(SAXParseException e) {
        System.out.println("-- ErrorHandler error --");
        System.out.println(e);
    }
};

{
    final var ret = factory.isValidating();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var childZ = document.getElementsByTagName("child-z").item(0);
    System.out.println(childZ); // [child-z: null]
}

factory.setValidating(true);

{
    final var ret = factory.isValidating();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    // Result
    // ↓
    //-- ErrorHandler error --
    //org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 17;
    // Element type "child-z" must be declared.
    //-- ErrorHandler error --
    //org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 24;
    // The content of element type "root" must match "(child-a)".
}

boolean isXIncludeAware ()

Get state of XInclude processing.

final var sampleFile = Path.of("R:", "java-work", "sample.xml");
System.out.println(sampleFile); // R:\java-work\sample.xml

Files.writeString(sampleFile, """
        <child>abcd</child>
        """);

final var xml = """
        <root xmlns:xi="http://www.w3.org/2001/XInclude">
            <xi:include href="file:///R:/java-work/sample.xml" parse="xml" />
        </root>
        """;

final var factory = DocumentBuilderFactory.newNSInstance();

{
    final var ret = factory.isXIncludeAware();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var nodes = document.getElementsByTagName("child");
    System.out.println(nodes.getLength()); // 0
}

factory.setXIncludeAware(true);

{
    final var ret = factory.isXIncludeAware();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var child = document.getElementsByTagName("child").item(0);
    System.out.println(child); // [child: null]
    System.out.println(child.getTextContent()); // abcd
}

static DocumentBuilderFactory newDefaultInstance ()

Creates a new instance of the DocumentBuilderFactory builtin system-default implementation.

final var xml = """
        <root>
            <child-a>AAA</child-a>
            <child-b>BBB</child-b>
        </root>
        """;

final var factory = DocumentBuilderFactory.newDefaultInstance();
final var builder = factory.newDocumentBuilder();

final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

final var childA = document.getElementsByTagName("child-a").item(0);
System.out.println(childA); // [child-a: null]
System.out.println(childA.getTextContent()); // AAA

final var childB = document.getElementsByTagName("child-b").item(0);
System.out.println(childB); // [child-b: null]
System.out.println(childB.getTextContent()); // BBB

static DocumentBuilderFactory newDefaultNSInstance ()

Creates a new NamespaceAware instance of the DocumentBuilderFactory builtin system-default implementation.

Please see also : newDefaultInstance

final var nsFactory = DocumentBuilderFactory.newDefaultNSInstance();
System.out.println(nsFactory.isNamespaceAware()); // true

final var factory = DocumentBuilderFactory.newDefaultInstance();
System.out.println(factory.isNamespaceAware()); // false

abstract DocumentBuilder newDocumentBuilder ()

Creates a new instance of a DocumentBuilder using the currently configured parameters.

Please see newInstance().
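
The phrase "currently configured parameters" can be illustrated with a small sketch. With the JDK's built-in implementation, a builder appears to keep the settings that were in effect when it was created, so changing the factory afterwards only affects builders created later. The class name BuilderSnapshotExample and the helper namespaceAwareness are made up for illustration, and other implementations might behave differently.

```java
import javax.xml.parsers.DocumentBuilderFactory;

class BuilderSnapshotExample {

    // Hypothetical helper: returns the namespace awareness of a builder
    // created before and a builder created after the factory was changed.
    static boolean[] namespaceAwareness() throws Exception {
        final var factory = DocumentBuilderFactory.newDefaultInstance();

        final var before = factory.newDocumentBuilder();
        factory.setNamespaceAware(true);
        final var after = factory.newDocumentBuilder();

        return new boolean[] {before.isNamespaceAware(), after.isNamespaceAware()};
    }

    public static void main(String[] args) throws Exception {
        final var result = namespaceAwareness();
        System.out.println(result[0]); // false
        System.out.println(result[1]); // true
    }
}
```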

static DocumentBuilderFactory newInstance ()

Obtains a new instance of a DocumentBuilderFactory.

final var xml = """
        <root>
            <child-a>AAA</child-a>
            <child-b>BBB</child-b>
        </root>
        """;

final var factory = DocumentBuilderFactory.newInstance();
final var builder = factory.newDocumentBuilder();

final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

final var childA = document.getElementsByTagName("child-a").item(0);
System.out.println(childA); // [child-a: null]
System.out.println(childA.getTextContent()); // AAA

final var childB = document.getElementsByTagName("child-b").item(0);
System.out.println(childB); // [child-b: null]
System.out.println(childB.getTextContent()); // BBB

static DocumentBuilderFactory newInstance (String factoryClassName, ClassLoader classLoader)

Obtain a new instance of a DocumentBuilderFactory from class name.

This method is probably intended for third-party libraries.
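
For reference only, here is a minimal sketch of how the call looks. To avoid hard-coding an implementation class name, it reuses the class name of the system-default factory; a real use would pass the class name of a third-party factory instead. The class name NewInstanceByNameExample and the helper createByName are made up for illustration. Passing null as the class loader means the current thread's context class loader is used.

```java
import javax.xml.parsers.DocumentBuilderFactory;

class NewInstanceByNameExample {

    // Hypothetical helper: creates a factory from an explicit class name.
    static DocumentBuilderFactory createByName() {
        // Stand-in for a third-party class name: the system-default implementation.
        final var className =
                DocumentBuilderFactory.newDefaultInstance().getClass().getName();

        // null -> the current thread's context class loader is used.
        return DocumentBuilderFactory.newInstance(className, null);
    }

    public static void main(String[] args) throws Exception {
        final var factory = createByName();
        final var defaultName =
                DocumentBuilderFactory.newDefaultInstance().getClass().getName();

        System.out.println(factory.getClass().getName().equals(defaultName)); // true
        System.out.println(factory.newDocumentBuilder() != null); // true
    }
}
```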

static DocumentBuilderFactory newNSInstance ()

Creates a new NamespaceAware instance of a DocumentBuilderFactory.

Please see also: newInstance()

final var nsFactory = DocumentBuilderFactory.newNSInstance();
System.out.println(nsFactory.isNamespaceAware()); // true

final var factory = DocumentBuilderFactory.newInstance();
System.out.println(factory.isNamespaceAware()); // false

static DocumentBuilderFactory newNSInstance (String factoryClassName, ClassLoader classLoader)

Creates a new NamespaceAware instance of a DocumentBuilderFactory from the class name.

This method is probably intended for third-party libraries, so the code example is omitted.

abstract void setAttribute (String name, Object value)

Allows the user to set specific attributes on the underlying implementation.

Please see getAttribute(String name).

void setCoalescing (boolean coalescing)

Specifies that the parser produced by this code will convert CDATA nodes to Text nodes and append them to the adjacent (if any) text node.

Please see isCoalescing().

void setExpandEntityReferences (boolean expandEntityRef)

Specifies that the parser produced by this code will expand entity reference nodes.

Please see isExpandEntityReferences().

abstract void setFeature (String name, boolean value)

Set a feature for this DocumentBuilderFactory and DocumentBuilders created by this factory.

Please see getFeature(String name).
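
One more point that may be worth a sketch: when the factory cannot support a feature, setFeature throws a ParserConfigurationException. The feature URI below is deliberately made up and is assumed not to be recognized by any parser; the class name UnsupportedFeatureExample and the helper isSupported are also just for illustration. Note that the helper really enables the feature when it is supported.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

class UnsupportedFeatureExample {

    // Hypothetical feature name that no parser is expected to recognize.
    static final String UNKNOWN_FEATURE =
            "http://example.com/features/not-a-real-feature";

    // Hypothetical helper: tries to enable a feature and reports whether
    // the factory accepted it.
    static boolean isSupported(DocumentBuilderFactory factory, String feature) {
        try {
            factory.setFeature(feature, true);
            return true;
        } catch (ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        final var factory = DocumentBuilderFactory.newDefaultInstance();

        System.out.println(isSupported(factory, XMLConstants.FEATURE_SECURE_PROCESSING)); // true
        System.out.println(isSupported(factory, UNKNOWN_FEATURE)); // false
    }
}
```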

void setIgnoringComments (boolean ignoreComments)

Specifies that the parser produced by this code will ignore comments.

Please see isIgnoringComments().

void setIgnoringElementContentWhitespace (boolean whitespace)

Specifies that the parsers created by this factory must eliminate whitespace in element content (sometimes known loosely as 'ignorable whitespace') when parsing XML documents (see XML Rec 2.10).

Please see isIgnoringElementContentWhitespace().

void setNamespaceAware (boolean awareness)

Specifies that the parser produced by this code will provide support for XML namespaces.

Please see isNamespaceAware().

void setSchema (Schema schema)

Set the Schema to be used by parsers created from this factory.

Please see getSchema().

void setValidating (boolean validating)

Specifies that the parser produced by this code will validate documents as they are parsed.

Please see isValidating().

void setXIncludeAware (boolean state)

Set state of XInclude processing.

Please see isXIncludeAware().

