All Products
Search
Document Center

PolarDB:XML type

Last Updated:Mar 28, 2024

This topic describes the definition and syntax of the XML type.

The XML type can be used to store XML data. The advantage of the XML type over storing XML data in a text field is that the XML type checks whether the input values are well-formed and provides support functions that are used to perform type-safe operations on the input values. To use the XML type, make sure that the configure --with-libxml option is configured during the installation of PolarDB for PostgreSQL (Compatible with Oracle).

The XML type can store well-formed documents that are defined based on the XML standard, and content fragments that are defined by reference to the more permissive document node of the XQuery and XPath data model. This means that content fragments can have more than one top-level element or character node. The expression xmlvalue`` IS DOCUMENT can be used to evaluate whether a specific XML value is a full document or only a content fragment.

Generate an XML value

To generate a value of the XML type from character data, you can use the XMLPARSE function in the following syntax:

    XMLPARSE ( { DOCUMENT | CONTENT } value)

Example:

    XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
    XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')

This is the only way to convert character strings into XML values based on the SQL standard. However, PolarDB for PostgreSQL (Compatible with Oracle) provides the following specific syntax:

    xml '<foo>bar</foo>'
    '<foo>bar</foo>'::xml

The preceding syntax can also be used to convert character strings into XML values.

The XML type does not verify an input value against a document type declaration (DTD), even if the input value specifies a DTD. No built-in support is available for verification based on other XML schema languages, such as XML Schema.

To generate character strings from XML values, you can use the XMLSERIALIZE function in the following syntax:

    XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )

type can be character, character varying, or text, or an alias of character, character varying, or text. Based on the SQL standard, this is the only way to convert XML values into character strings. However, PolarDB for PostgreSQL (Compatible with Oracle) also allows you to simply cast the values.

If a character string is cast to or from an XML value without using the XMLPARSE or XMLSERIALIZE function, the XML option session configuration parameter specifies whether to use DOCUMENT or CONTENT. You can run the following standard command to configure the parameter:

    SET XML OPTION { DOCUMENT | CONTENT };

Alternatively, you can run the following PolarDB for PostgreSQL (Compatible with Oracle)-specific command to configure this parameter:

    SET xmloption TO { DOCUMENT | CONTENT };

The default value is CONTENT, which indicates that all forms of XML data are allowed.

Encode XML data

You must carefully process multiple character encodings on the client, server, or in the XML data that passes through them. If the text mode is used to pass queries to the server and query results to the client, which is the normal mode, PolarDB for PostgreSQL (Compatible with Oracle) converts all character data that is passed between the client and server to the character encoding of the destination. This includes string representations of XML values. The preceding section provides an example. In most cases, this means that encoding declarations that are contained in XML data may become invalid because the character data is converted to other encodings when the character data is passed between the client and server, and the embedded encoding declaration is not changed. To process this behavior, the system ignores the encoding declarations that are contained in character strings presented for input to the XML type, and assumes that content is in the current server encoding. Next, for correct processing, character strings of XML data must be sent from the client in the current client encoding. The client converts documents to the current client encoding before the client sends the documents to the server, or appropriately adjusts the client encoding. Upon output, XML values do not have an encoding declaration, and the client assumes that all data is in the current client encoding.

If the binary mode is used to pass query parameters to the server and query results back to the client, no encoding conversion is performed. In this case, an encoding declaration in the XML data is observed. If the encoding declaration is missing, the data is assumed to be in UTF-8. This is required by the XML standard. Note that PolarDB for PostgreSQL (Compatible with Oracle) does not support UTF-16. Upon output, data has an encoding declaration that specifies the client encoding unless the client encoding is UTF-8. If the client encoding is UTF-8, the encoding declaration is omitted.

It is less error-prone and more efficient to process XML data with PolarDB for PostgreSQL (Compatible with Oracle) if the XML data encoding, client encoding, and server encoding are the same. Because XML data is internally processed in UTF-8, computations are the most efficient if the server encoding is also UTF-8.

Important

Some XML-related functions may not be applicable to non-ASCII data if the server encoding is not UTF-8. Especially, this is known to be an issue for xmltable() and xpath().

Access XML values

The XML type does not provide comparison operators. This is because no well-defined and general comparison algorithm is available for XML data. As a result, you cannot retrieve rows by comparing an XML column against a search value. Therefore, in most cases, XML values are accompanied by a separate key field such as an ID field. Alternatively, you can compare XML values by converting the XML values into character strings first. However, character string comparison is not useful for the XML comparison method.

No comparison operators are available for the XML type. Therefore, you cannot directly create an index on an XML column. If you want to query XML data in a quick manner, you can use one of the following possible solutions: Cast the expression to a character string and index the character string, or index an XPath expression. Of course, the actual query must be adjusted to use the indexed expression.

The text search feature of PolarDB for PostgreSQL (Compatible with Oracle) can also be used to accelerate full-text searches of XML data. However, the necessary preprocessing support is unavailable in the PolarDB for PostgreSQL (Compatible with Oracle) distribution.