Java : DocumentBuilderFactory (XML) Examples

Examples for DocumentBuilderFactory (Java SE 22 & JDK 22).
Code examples are provided for most DocumentBuilderFactory methods.

Summary

Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents.

final var xml = """
        <root>
            <child-a>AAA</child-a>
            <child-b>BBB</child-b>
        </root>
        """;

final var factory = DocumentBuilderFactory.newInstance();
final var builder = factory.newDocumentBuilder();

final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

final var childA = document.getElementsByTagName("child-a").item(0);
System.out.println(childA); // [child-a: null]
System.out.println(childA.getTextContent()); // AAA

final var childB = document.getElementsByTagName("child-b").item(0);
System.out.println(childB); // [child-b: null]
System.out.println(childB.getTextContent()); // BBB

Please see also the link below.

XML processing can expose applications to certain vulnerabilities. Among the most prominent and well-known attacks are the XML External Entity (XXE) injection attack and the exponential entity expansion attack, also known as the XML bomb or billion laughs attack.
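
These attacks are usually mitigated by restricting what the parser is allowed to load or expand. The following is a minimal sketch rather than a definitive hardening recipe. The class name HardenedFactoryExample and the helpers newHardenedFactory and isRejected are hypothetical, and the disallow-doctype-decl feature URI is assumed to be supported by the parser bundled with the JDK (other implementations may reject it).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

final class HardenedFactoryExample {

    // Assumption: feature URI understood by the JDK's built-in parser.
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    // Builds a factory that refuses DTDs and external resources.
    static DocumentBuilderFactory newHardenedFactory() {
        try {
            final var factory = DocumentBuilderFactory.newDefaultInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature(DISALLOW_DOCTYPE, true);
            factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
            factory.setXIncludeAware(false);
            factory.setExpandEntityReferences(false);
            return factory;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the hardened parser refuses the given document.
    static boolean isRejected(String xml) {
        try {
            final var builder = newHardenedFactory().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected("<root>ok</root>")); // false
        System.out.println(isRejected(
                "<!DOCTYPE root [<!ENTITY a \"b\">]><root>&a;</root>")); // true
    }
}
```

Rejecting every DOCTYPE is the strictest choice. If an application genuinely needs DTDs, the ACCESS_EXTERNAL_DTD restriction and FEATURE_SECURE_PROCESSING shown in the method examples below may be enough on their own.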


Constructors

DocumentBuilderFactory ()

Protected constructor to prevent instantiation.

This constructor is protected. Subclassing this class is rare, so the code example is omitted.

Methods

abstract Object getAttribute (String name)

Allows the user to retrieve specific attributes on the underlying implementation.

final var dtdFile = Path.of("R:", "java-work", "sample.dtd");
System.out.println(dtdFile); // R:\java-work\sample.dtd

Files.writeString(dtdFile, """
        <!ENTITY aaa "bbb">
        """);

final var xml = """
        <!DOCTYPE root SYSTEM "file:///R:/java-work/sample.dtd">
        <root>&aaa;</root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "all");

    final var ret = factory.getAttribute(XMLConstants.ACCESS_EXTERNAL_DTD);
    System.out.println(ret); // "all"

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]
    System.out.println(root.getTextContent()); // bbb
}

{
    factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");

    final var ret = factory.getAttribute(XMLConstants.ACCESS_EXTERNAL_DTD);
    System.out.println(ret); // ""

    final var builder = factory.newDocumentBuilder();

    try {
        final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));
    } catch (SAXException e) {
        System.out.println(e);
    }

    // Result
    // ↓
    //org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 57; External DTD:
    // Failed to read external DTD 'sample.dtd', because 'file' access is not allowed due to
    // restriction set by the accessExternalDTD property.
}

abstract boolean getFeature (String name)

Gets the state of the named feature.

// An example of the exponential entity expansion attack.
final var xml = """
        <!DOCTYPE root[
            <!ENTITY x100 "X">
            <!ENTITY x99 "&x100;&x100;">
            <!ENTITY x98 "&x99;&x99;">
            ...
            (omitted)
            ...
            <!ENTITY x3 "&x4;&x4;">
            <!ENTITY x2 "&x3;&x3;">
            <!ENTITY x1 "&x2;&x2;">
        ]>
        <root>&x1;</root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.getFeature(XMLConstants.FEATURE_SECURE_PROCESSING);
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();

    try {
        final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));
    } catch (SAXException e) {
        System.out.println(e);
    }

    // Result
    // ↓
    //org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; JAXP00010001:
    // The parser has encountered more than "64000" entity expansions in this document;
    // this is the limit imposed by the JDK.
}

factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);

{
    final var ret = factory.getFeature(XMLConstants.FEATURE_SECURE_PROCESSING);
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();

    // Warning! The entities expand exponentially, so parsing this document takes a very long time.
    //final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));
}

Schema getSchema ()

Gets the Schema object specified through the setSchema(Schema schema) method.

final var xsd = """
        <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
            <xsd:element name="root" type="xsd:string"/>
        </xsd:schema>
        """;

final var schemaFactory = SchemaFactory.newDefaultInstance();
final var schema = schemaFactory.newSchema(
        new StreamSource(new ByteArrayInputStream(xsd.getBytes())));

final var factory = DocumentBuilderFactory.newInstance();

System.out.println(factory.getSchema()); // null

factory.setSchema(schema);
System.out.println(factory.getSchema().equals(schema)); // true

final var errorHandler = new DefaultHandler() {
    @Override
    public void error(SAXParseException e) {
        System.out.println("-- ErrorHandler error --");
        System.out.println(e);
    }
};

{
    final var xml = """
            <root>abcd</root>
            """;

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]
    System.out.println(root.getTextContent()); // abcd
}

{
    final var xml = """
            <root><child>abcd</child></root>
            """;

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    // Result
    // ↓
    //-- ErrorHandler error --
    //org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 33; cvc-type.3.1.2:
    // Element 'root' is a simple type, so it must have no element information item [children].
}

boolean isCoalescing ()

Indicates whether the factory is configured to produce parsers that convert CDATA nodes to Text nodes and append them to the adjacent (if any) Text node.

final var xml = """
        <root>aaa<![CDATA[<&>]]></root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isCoalescing();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa]
    //[#cdata-section: <&>]
}

factory.setCoalescing(true);

{
    final var ret = factory.isCoalescing();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa<&>]
}

boolean isExpandEntityReferences ()

Indicates whether the factory is configured to produce parsers that expand entity reference nodes.

final var xml = """
        <!DOCTYPE root [
            <!ENTITY aaa "bbb">
        ]>
        <root>&aaa;</root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isExpandEntityReferences();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var child = root.getFirstChild();
    System.out.println(child); // [#text: bbb]
}

factory.setExpandEntityReferences(false);

{
    final var ret = factory.isExpandEntityReferences();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    if (root.getFirstChild() instanceof EntityReference entityReference) {
        System.out.println(entityReference); // [aaa: null]
    }
}

boolean isIgnoringComments ()

Indicates whether the factory is configured to produce parsers that ignore comments.

final var xml = """
        <root>aaa<!--bbb--></root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isIgnoringComments();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa]
    //[#comment: bbb]
}

factory.setIgnoringComments(true);

{
    final var ret = factory.isIgnoringComments();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa]
}

boolean isIgnoringElementContentWhitespace ()

Indicates whether the factory is configured to produce parsers that ignore ignorable whitespace in element content.

final var xml = """
        <!DOCTYPE root [
            <!ELEMENT child-a (dummy?)>
            <!ELEMENT child-b (#PCDATA)>
        ]>
        <root>
            <child-a> </child-a>
            <child-b> </child-b>
        </root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isIgnoringElementContentWhitespace();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var childA = document.getElementsByTagName("child-a").item(0);
    System.out.println(childA); // [child-a: null]

    final var childB = document.getElementsByTagName("child-b").item(0);
    System.out.println(childB); // [child-b: null]

    System.out.println(childA.getFirstChild()); // [#text:  ]
    System.out.println(childB.getFirstChild()); // [#text:  ]
}

factory.setIgnoringElementContentWhitespace(true);

{
    final var ret = factory.isIgnoringElementContentWhitespace();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var childA = document.getElementsByTagName("child-a").item(0);
    System.out.println(childA); // [child-a: null]

    final var childB = document.getElementsByTagName("child-b").item(0);
    System.out.println(childB); // [child-b: null]

    System.out.println(childA.getFirstChild()); // null
    System.out.println(childB.getFirstChild()); // [#text:  ]
}

boolean isNamespaceAware ()

Indicates whether the factory is configured to produce parsers that are namespace aware.

final var xml = """
        <ns:root xmlns:ns="sample">
            <ns:child/>
        </ns:root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isNamespaceAware();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var child = document.getElementsByTagNameNS("sample", "child").item(0);
    System.out.println(child); // null
}

factory.setNamespaceAware(true);

{
    final var ret = factory.isNamespaceAware();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var child = document.getElementsByTagNameNS("sample", "child").item(0);
    System.out.println(child); // [ns:child: null]
}

boolean isValidating ()

Indicates whether the factory is configured to produce parsers that validate the XML content during parsing.

// The XML document intentionally does not match the DTD.
final var xml = """
        <!DOCTYPE root [
            <!ELEMENT root (child-a)>
        ]>
        <root><child-z/></root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

final var errorHandler = new DefaultHandler() {
    @Override
    public void error(SAXParseException e) {
        System.out.println("-- ErrorHandler error --");
        System.out.println(e);
    }
};

{
    final var ret = factory.isValidating();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var childZ = document.getElementsByTagName("child-z").item(0);
    System.out.println(childZ); // [child-z: null]
}

factory.setValidating(true);

{
    final var ret = factory.isValidating();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    // Result
    // ↓
    //-- ErrorHandler error --
    //org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 17;
    // Element type "child-z" must be declared.
    //-- ErrorHandler error --
    //org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 24;
    // The content of element type "root" must match "(child-a)".
}

boolean isXIncludeAware ()

Gets the state of XInclude processing.

final var sampleFile = Path.of("R:", "java-work", "sample.xml");
System.out.println(sampleFile); // R:\java-work\sample.xml

Files.writeString(sampleFile, """
        <child>abcd</child>
        """);

final var xml = """
        <root xmlns:xi="http://www.w3.org/2001/XInclude">
            <xi:include href="file:///R:/java-work/sample.xml" parse="xml" />
        </root>
        """;

final var factory = DocumentBuilderFactory.newNSInstance();

{
    final var ret = factory.isXIncludeAware();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var nodes = document.getElementsByTagName("child");
    System.out.println(nodes.getLength()); // 0
}

factory.setXIncludeAware(true);

{
    final var ret = factory.isXIncludeAware();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var child = document.getElementsByTagName("child").item(0);
    System.out.println(child); // [child: null]
    System.out.println(child.getTextContent()); // abcd
}

static DocumentBuilderFactory newDefaultInstance ()

Creates a new instance of the DocumentBuilderFactory built-in system-default implementation.

final var xml = """
        <root>
            <child-a>AAA</child-a>
            <child-b>BBB</child-b>
        </root>
        """;

final var factory = DocumentBuilderFactory.newDefaultInstance();
final var builder = factory.newDocumentBuilder();

final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

final var childA = document.getElementsByTagName("child-a").item(0);
System.out.println(childA); // [child-a: null]
System.out.println(childA.getTextContent()); // AAA

final var childB = document.getElementsByTagName("child-b").item(0);
System.out.println(childB); // [child-b: null]
System.out.println(childB.getTextContent()); // BBB

static DocumentBuilderFactory newDefaultNSInstance ()

Creates a new NamespaceAware instance of the DocumentBuilderFactory built-in system-default implementation.

final var nsFactory = DocumentBuilderFactory.newDefaultNSInstance();
System.out.println(nsFactory.isNamespaceAware()); // true

final var factory = DocumentBuilderFactory.newDefaultInstance();
System.out.println(factory.isNamespaceAware()); // false

abstract DocumentBuilder newDocumentBuilder ()

Creates a new instance of a DocumentBuilder using the currently configured parameters.

final var xml = """
        <root>
            <child-a>AAA</child-a>
            <child-b>BBB</child-b>
        </root>
        """;

final var factory = DocumentBuilderFactory.newInstance();
final var builder = factory.newDocumentBuilder();

final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

final var childA = document.getElementsByTagName("child-a").item(0);
System.out.println(childA); // [child-a: null]
System.out.println(childA.getTextContent()); // AAA

final var childB = document.getElementsByTagName("child-b").item(0);
System.out.println(childB); // [child-b: null]
System.out.println(childB.getTextContent()); // BBB

static DocumentBuilderFactory newInstance ()

Obtains a new instance of a DocumentBuilderFactory.

final var xml = """
        <root>
            <child-a>AAA</child-a>
            <child-b>BBB</child-b>
        </root>
        """;

final var factory = DocumentBuilderFactory.newInstance();
final var builder = factory.newDocumentBuilder();

final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

final var childA = document.getElementsByTagName("child-a").item(0);
System.out.println(childA); // [child-a: null]
System.out.println(childA.getTextContent()); // AAA

final var childB = document.getElementsByTagName("child-b").item(0);
System.out.println(childB); // [child-b: null]
System.out.println(childB.getTextContent()); // BBB

static DocumentBuilderFactory newInstance (String factoryClassName, ClassLoader classLoader)

Obtains a new instance of a DocumentBuilderFactory from a class name.

This method is probably intended for use with third-party parser implementations.
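
For reference, here is a minimal sketch of how the call might look. It assumes an OpenJDK-based runtime, where the bundled implementation class is com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl. A third-party library would supply its own class name instead. The class NewInstanceByNameExample and its helper methods are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

final class NewInstanceByNameExample {

    // Assumption: implementation class bundled with OpenJDK-based runtimes.
    static final String JDK_FACTORY =
            "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    // Loads the factory by name and reports which class was instantiated.
    static String loadedFactoryClassName() {
        // A null class loader means the current thread's context class loader is used.
        final var factory = DocumentBuilderFactory.newInstance(JDK_FACTORY, null);
        return factory.getClass().getName();
    }

    // Parses the document with the named factory and returns the root element's text.
    static String parseRootText(String xml) {
        try {
            final var factory = DocumentBuilderFactory.newInstance(JDK_FACTORY, null);
            final var builder = factory.newDocumentBuilder();
            final var document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return document.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadedFactoryClassName());
        System.out.println(parseRootText("<root>abcd</root>")); // abcd
    }
}
```

If the named class cannot be loaded or instantiated, newInstance throws a FactoryConfigurationError, so callers that accept a class name from configuration may want to catch it.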

static DocumentBuilderFactory newNSInstance ()

Creates a new NamespaceAware instance of a DocumentBuilderFactory.

final var nsFactory = DocumentBuilderFactory.newNSInstance();
System.out.println(nsFactory.isNamespaceAware()); // true

final var factory = DocumentBuilderFactory.newInstance();
System.out.println(factory.isNamespaceAware()); // false

static DocumentBuilderFactory newNSInstance (String factoryClassName, ClassLoader classLoader)

Creates a new NamespaceAware instance of a DocumentBuilderFactory from the class name.

This method is probably intended for use with third-party parser implementations.
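
As a rough illustration, the sketch below loads a factory by name and confirms that namespace awareness is already enabled. It assumes an OpenJDK-based runtime and uses the bundled implementation class name as a stand-in for a third-party one. NewNSInstanceByNameExample and its helper methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

final class NewNSInstanceByNameExample {

    // Assumption: implementation class bundled with OpenJDK-based runtimes.
    static final String JDK_FACTORY =
            "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    // The returned factory should be namespace aware without calling setNamespaceAware.
    static boolean isNamespaceAware() {
        return DocumentBuilderFactory.newNSInstance(JDK_FACTORY, null).isNamespaceAware();
    }

    // Returns "<namespace URI> <local name>" of the root element.
    static String rootNamespaceAndLocalName(String xml) {
        try {
            final var factory = DocumentBuilderFactory.newNSInstance(JDK_FACTORY, null);
            final var builder = factory.newDocumentBuilder();
            final var document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            final var root = document.getDocumentElement();
            return root.getNamespaceURI() + " " + root.getLocalName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isNamespaceAware()); // true
        System.out.println(rootNamespaceAndLocalName(
                "<ns:root xmlns:ns=\"sample\"/>")); // sample root
    }
}
```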

abstract void setAttribute (String name, Object value)

Allows the user to set specific attributes on the underlying implementation.

final var dtdFile = Path.of("R:", "java-work", "sample.dtd");
System.out.println(dtdFile); // R:\java-work\sample.dtd

Files.writeString(dtdFile, """
        <!ENTITY aaa "bbb">
        """);

final var xml = """
        <!DOCTYPE root SYSTEM "file:///R:/java-work/sample.dtd">
        <root>&aaa;</root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "all");

    final var ret = factory.getAttribute(XMLConstants.ACCESS_EXTERNAL_DTD);
    System.out.println(ret); // "all"

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]
    System.out.println(root.getTextContent()); // bbb
}

{
    factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");

    final var ret = factory.getAttribute(XMLConstants.ACCESS_EXTERNAL_DTD);
    System.out.println(ret); // ""

    final var builder = factory.newDocumentBuilder();

    try {
        final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));
    } catch (SAXException e) {
        System.out.println(e);
    }

    // Result
    // ↓
    //org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 57; External DTD:
    // Failed to read external DTD 'sample.dtd', because 'file' access is not allowed due to
    // restriction set by the accessExternalDTD property.
}

void setCoalescing (boolean coalescing)

Specifies that the parser produced by this code will convert CDATA nodes to Text nodes and append them to the adjacent (if any) Text node.

final var xml = """
        <root>aaa<![CDATA[<&>]]></root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isCoalescing();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa]
    //[#cdata-section: <&>]
}

factory.setCoalescing(true);

{
    final var ret = factory.isCoalescing();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa<&>]
}

void setExpandEntityReferences (boolean expandEntityRef)

Specifies that the parser produced by this code will expand entity reference nodes.

final var xml = """
        <!DOCTYPE root [
            <!ENTITY aaa "bbb">
        ]>
        <root>&aaa;</root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isExpandEntityReferences();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var child = root.getFirstChild();
    System.out.println(child); // [#text: bbb]
}

factory.setExpandEntityReferences(false);

{
    final var ret = factory.isExpandEntityReferences();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    if (root.getFirstChild() instanceof EntityReference entityReference) {
        System.out.println(entityReference); // [aaa: null]
    }
}

abstract void setFeature (String name, boolean value)

Sets a feature for this DocumentBuilderFactory and the DocumentBuilders created by this factory.

// An example of the exponential entity expansion attack.
final var xml = """
        <!DOCTYPE root[
            <!ENTITY x100 "X">
            <!ENTITY x99 "&x100;&x100;">
            <!ENTITY x98 "&x99;&x99;">
            ...
            (omitted)
            ...
            <!ENTITY x3 "&x4;&x4;">
            <!ENTITY x2 "&x3;&x3;">
            <!ENTITY x1 "&x2;&x2;">
        ]>
        <root>&x1;</root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.getFeature(XMLConstants.FEATURE_SECURE_PROCESSING);
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();

    try {
        final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));
    } catch (SAXException e) {
        System.out.println(e);
    }

    // Result
    // ↓
    //org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; JAXP00010001:
    // The parser has encountered more than "64000" entity expansions in this document;
    // this is the limit imposed by the JDK.
}

factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);

{
    final var ret = factory.getFeature(XMLConstants.FEATURE_SECURE_PROCESSING);
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();

    // Warning! The entities expand exponentially, so parsing this document takes a very long time.
    //final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));
}

void setIgnoringComments (boolean ignoreComments)

Specifies that the parser produced by this code will ignore comments.

final var xml = """
        <root>aaa<!--bbb--></root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isIgnoringComments();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa]
    //[#comment: bbb]
}

factory.setIgnoringComments(true);

{
    final var ret = factory.isIgnoringComments();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]

    final var nodes = root.getChildNodes();
    System.out.println("-- nodes --");
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i));
    }

    // Result
    // ↓
    //-- nodes --
    //[#text: aaa]
}

void setIgnoringElementContentWhitespace (boolean whitespace)

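Specifies that the parsers created by this factory must eliminate whitespace in element content (sometimes known loosely as "ignorable whitespace") when parsing XML documents (see XML Rec 2.10).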

final var xml = """
        <!DOCTYPE root [
            <!ELEMENT child-a (dummy?)>
            <!ELEMENT child-b (#PCDATA)>
        ]>
        <root>
            <child-a> </child-a>
            <child-b> </child-b>
        </root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isIgnoringElementContentWhitespace();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var childA = document.getElementsByTagName("child-a").item(0);
    System.out.println(childA); // [child-a: null]

    final var childB = document.getElementsByTagName("child-b").item(0);
    System.out.println(childB); // [child-b: null]

    System.out.println(childA.getFirstChild()); // [#text:  ]
    System.out.println(childB.getFirstChild()); // [#text:  ]
}

factory.setIgnoringElementContentWhitespace(true);

{
    final var ret = factory.isIgnoringElementContentWhitespace();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var childA = document.getElementsByTagName("child-a").item(0);
    System.out.println(childA); // [child-a: null]

    final var childB = document.getElementsByTagName("child-b").item(0);
    System.out.println(childB); // [child-b: null]

    System.out.println(childA.getFirstChild()); // null
    System.out.println(childB.getFirstChild()); // [#text:  ]
}

void setNamespaceAware (boolean awareness)

Specifies that the parser produced by this code will provide support for XML namespaces.

final var xml = """
        <ns:root xmlns:ns="sample">
            <ns:child/>
        </ns:root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

{
    final var ret = factory.isNamespaceAware();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var child = document.getElementsByTagNameNS("sample", "child").item(0);
    System.out.println(child); // null
}

factory.setNamespaceAware(true);

{
    final var ret = factory.isNamespaceAware();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var child = document.getElementsByTagNameNS("sample", "child").item(0);
    System.out.println(child); // [ns:child: null]
}

void setSchema (Schema schema)

Sets the Schema to be used by parsers created from this factory.

final var xsd = """
        <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
            <xsd:element name="root" type="xsd:string"/>
        </xsd:schema>
        """;

final var schemaFactory = SchemaFactory.newDefaultInstance();
final var schema = schemaFactory.newSchema(
        new StreamSource(new ByteArrayInputStream(xsd.getBytes())));

final var factory = DocumentBuilderFactory.newInstance();

System.out.println(factory.getSchema()); // null

factory.setSchema(schema);
System.out.println(factory.getSchema().equals(schema)); // true

final var errorHandler = new DefaultHandler() {
    @Override
    public void error(SAXParseException e) {
        System.out.println("-- ErrorHandler error --");
        System.out.println(e);
    }
};

{
    final var xml = """
            <root>abcd</root>
            """;

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var root = document.getDocumentElement();
    System.out.println(root); // [root: null]
    System.out.println(root.getTextContent()); // abcd
}

{
    final var xml = """
            <root><child>abcd</child></root>
            """;

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    // Result
    // ↓
    //-- ErrorHandler error --
    //org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 33; cvc-type.3.1.2:
    // Element 'root' is a simple type, so it must have no element information item [children].
}

void setValidating (boolean validating)

Specifies that the parser produced by this code will validate documents as they are parsed.

// The XML document intentionally does not match the DTD.
final var xml = """
        <!DOCTYPE root [
            <!ELEMENT root (child-a)>
        ]>
        <root><child-z/></root>
        """;

final var factory = DocumentBuilderFactory.newInstance();

final var errorHandler = new DefaultHandler() {
    @Override
    public void error(SAXParseException e) {
        System.out.println("-- ErrorHandler error --");
        System.out.println(e);
    }
};

{
    final var ret = factory.isValidating();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var childZ = document.getElementsByTagName("child-z").item(0);
    System.out.println(childZ); // [child-z: null]
}

factory.setValidating(true);

{
    final var ret = factory.isValidating();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    builder.setErrorHandler(errorHandler);

    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    // Result
    // ↓
    //-- ErrorHandler error --
    //org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 17;
    // Element type "child-z" must be declared.
    //-- ErrorHandler error --
    //org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 24;
    // The content of element type "root" must match "(child-a)".
}

void setXIncludeAware (boolean state)

Sets the state of XInclude processing.

final var sampleFile = Path.of("R:", "java-work", "sample.xml");
System.out.println(sampleFile); // R:\java-work\sample.xml

Files.writeString(sampleFile, """
        <child>abcd</child>
        """);

final var xml = """
        <root xmlns:xi="http://www.w3.org/2001/XInclude">
            <xi:include href="file:///R:/java-work/sample.xml" parse="xml" />
        </root>
        """;

final var factory = DocumentBuilderFactory.newNSInstance();

{
    final var ret = factory.isXIncludeAware();
    System.out.println(ret); // false

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var nodes = document.getElementsByTagName("child");
    System.out.println(nodes.getLength()); // 0
}

factory.setXIncludeAware(true);

{
    final var ret = factory.isXIncludeAware();
    System.out.println(ret); // true

    final var builder = factory.newDocumentBuilder();
    final var document = builder.parse(new ByteArrayInputStream(xml.getBytes()));

    final var child = document.getElementsByTagName("child").item(0);
    System.out.println(child); // [child: null]
    System.out.println(child.getTextContent()); // abcd
}
