AnalyticDB for PostgreSQL provides multiple methods to import data. For more information, see Introduction to data migration and synchronization solutions.

Special characters can cause import errors when you import data in parallel by using OSS or when you import data by using the \COPY command. This topic describes how to pre-process special characters in data files to be imported to avoid errors caused by these special characters.

Method to pre-process special characters when you import data in parallel by using OSS

When you import data in parallel by using OSS, each row is processed as a tuple. You must specify a delimiter in each row to separate data in each column. This section describes how to specify delimiters when you create an OSS external table, the constraints of delimiters, and how to pre-process different types of special characters within columns when you import data in parallel by using OSS.

Delimiters

In the syntax to create an OSS external table, you can specify DELIMITER after the FORMAT clause, as shown in the following example:

FORMAT 'TEXT' (DELIMITER ',')
  • For FORMAT 'TEXT', DELIMITER is set to '\t' by default.
  • For FORMAT 'CSV', DELIMITER is set to ',' by default.

When you create an OSS external table, you can also define a custom delimiter that meets the following constraints:

  • A delimiter must be a single ASCII character. It cannot be a string of two or more ASCII characters.
  • '\n' and '\r' are not supported.
  • Escape characters other than '\n' and '\r' are supported but must be preceded with "E" or "e" when they are used.
  • The escape character '\t' is supported if it is not preceded by "E".
  • For TEXT files, you can set DELIMITER to OFF and create single-column external tables.

Data in your OSS files must be separated by specified delimiters to ensure that the data can be read.

Special characters in data

The following section describes scenarios where special characters exist in data to be imported and corresponding pre-processing methods.

  • A column contains the same character as a delimiter.

    • You must precede each delimiter with an escape character in TEXT files. You can execute the following statement to specify the escape character. The default escape character is a backslash (\).

      FORMAT 'TEXT' (ESCAPE '\' )
    • You must precede each delimiter with a double quotation mark (") in CSV files.
  • A column contains Chinese characters. OSS external tables support Chinese characters. However, to ensure that data is correctly displayed, you must set the encoding format to UTF-8 when you create an OSS external table.

    ENCODING 'UTF8'
  • A column contains null data. You can define a matching character for null, so that the specified character is identified as null during data import. For CSV files, the default value is a null value with no quotation marks. For TEXT files, the default value is \N. The following statement maps space to null. If the column is a space, then the value for the column is null in the data imported from the OSS file.

    FORMAT 'text' (null ' ' )
  • A column contains escape characters. You must precede escape characters with the "ESCAPE" word. You can specify the escape character when you create an external table. The default escape character is a double quotation mark (") in CSV files and a backslash (\) in TEXT files.

    • You can set ESCAPE to a single character. You can use the following statement to set ESCAPE to a backslash (\):

      FORMAT 'csv' (ESCAPE '\' )
    • You can also set ESCAPE to OFF to prevent automatic escape for all characters.
  • A column contains single quotation marks or double quotation marks.

    • You must precede a single quotation mark or double quotation mark with the "ESCAPE" word in TEXT files. The default escape character is a backslash (\).
    • You must precede a single quotation mark or double quotation mark with the "ESCAPE" word in CSV files. The default escape character is a double quote ("). You must also enclose the whole column in beginning and ending double quotation marks.

Method to pre-process special characters when you import data by using the \COPY command

When you use the \COPY command to import data, you can specify delimiters and pre-process special characters in the same way as when you import data in parallel by using OSS. However, the \COPY command and the CREATE EXTERNAL TABLE statement are used in slightly different ways. For more information about how to use the \COPY command, see Use the \COPY command.