When you initialize an ApsaraDB RDS for PostgreSQL instance, you can configure the collation of each database based on your business requirements. The collation includes the string sort order, character classification method, numeric value format, date and time format, and currency format. In addition, you may also need to configure the LC_COLLATE and LC_CTYPE environment variables.

LC_COLLATEString sort order
LC_CTYPECharacter classification
LC_MESSAGESMessage language
LC_MONETARYCurrency format
LC_NUMERICNumeric value format
LC_TIMEDate and time format

You can configure these environment variables to specify a collation that meets your business requirements in a locale.

Supported character sets

For more information, see Character Set Support.

NameDescriptionLanguageServerBytes/CharAliases
BIG5Big FiveTraditional ChineseNo1-2WIN950, Windows950
EUC_CNExtended UNIX Code-CNSimplified ChineseYes1-3-
EUC_JPExtended UNIX Code-JPJapaneseYes1-3-
EUC_JIS_2004Extended UNIX Code-JP, JIS X 0213JapaneseYes1-3-
EUC_KRExtended UNIX Code-KRKoreanYes1-3-
EUC_TWExtended UNIX Code-TWTraditional Chinese, TaiwaneseYes1-3-
GB18030National StandardChineseNo1-4-
GBKExtended National StandardSimplified ChineseNo1-2WIN936, Windows936
ISO_8859_5ISO 8859-5, ECMA 113Latin/CyrillicYes1-
ISO_8859_6ISO 8859-6, ECMA 114Latin/ArabicYes1-
ISO_8859_7ISO 8859-7, ECMA 118Latin/GreekYes1-
ISO_8859_8ISO 8859-8, ECMA 121Latin/HebrewYes1-
JOHABJOHABKorean (Hangul)No1-3-
KOI8RKOI8-RCyrillic (Russian)Yes1KOI8
KOI8UKOI8-UCyrillic (Ukrainian)Yes1-
LATIN1ISO 8859-1, ECMA 94Western EuropeanYes1ISO88591
LATIN2ISO 8859-2, ECMA 94Central EuropeanYes1ISO88592
LATIN3ISO 8859-3, ECMA 94South EuropeanYes1ISO88593
LATIN4ISO 8859-4, ECMA 94North EuropeanYes1ISO88594
LATIN5ISO 8859-9, ECMA 128TurkishYes1ISO88599
LATIN6ISO 8859-10, ECMA 144NordicYes1ISO885910
LATIN7ISO 8859-13BalticYes1ISO885913
LATIN8ISO 8859-14CelticYes1ISO885914
LATIN9ISO 8859-15LATIN1 with Euro and accentsYes1ISO885915
LATIN10ISO 8859-16, ASRO SR 14111RomanianYes1ISO885916
MULE_INTERNALMule internal codeMultilingual EmacsYes1-4-
SJISShift JISJapaneseNo1-2Mskanji, ShiftJIS, WIN932, Windows932
SHIFT_JIS_2004Shift JIS, JIS X 0213JapaneseNo1-2-
SQL_ASCIIunspecified (see text)anyYes1-
UHCUnified Hangul CodeKoreanNo1-2WIN949, Windows949
UTF8Unicode, 8-bitallYes1-4Unicode
WIN866Windows CP866CyrillicYes1ALT
WIN874Windows CP874ThaiYes1-
WIN1250Windows CP1250Central EuropeanYes1-
WIN1251Windows CP1251CyrillicYes1WIN
WIN1252Windows CP1252Western EuropeanYes1-
WIN1253Windows CP1253GreekYes1-
WIN1254Windows CP1254TurkishYes1-
WIN1255Windows CP1255HebrewYes1-
WIN1256Windows CP1256ArabicYes1-
WIN1257Windows CP1257BalticYes1-
WIN1258Windows CP1258VietnameseYes1ABC, TCVN, TCVN5712, VSCII

LC_COLLATE and LC_CTYPE settings supported by a character set

You can execute the following SQL statement to query the LC_COLLATE and LC_CTYPE settings that are supported by a character set from the pg_collation system table:

test=> select pg_encoding_to_char(collencoding) as encoding,collname,collcollate,collctype from pg_collation ;

If the encoding field of a collation is empty, the collation supports all character sets.

  encoding  |       collname        |      collcollate      |       collctype
------------+-----------------------+-----------------------+-----------------------
            | default               |                       |
            | C                     | C                     | C
            | POSIX                 | POSIX                 | POSIX
 UTF8       | aa_DJ                 | aa_DJ.utf8            | aa_DJ.utf8
 LATIN1     | aa_DJ                 | aa_DJ                 | aa_DJ
 LATIN1     | aa_DJ.iso88591        | aa_DJ.iso88591        | aa_DJ.iso88591
 UTF8       | aa_DJ.utf8            | aa_DJ.utf8            | aa_DJ.utf8
 UTF8       | aa_ER                 | aa_ER                 | aa_ER
 UTF8       | aa_ER.utf8            | aa_ER.utf8            | aa_ER.utf8
.......
 EUC_CN     | zh_CN                 | zh_CN                 | zh_CN
 UTF8       | zh_CN                 | zh_CN.utf8            | zh_CN.utf8
 EUC_CN     | zh_CN.gb2312          | zh_CN.gb2312          | zh_CN.gb2312
 UTF8       | zh_CN.utf8            | zh_CN.utf8            | zh_CN.utf8
 UTF8       | zh_HK                 | zh_HK.utf8            | zh_HK.utf8
 UTF8       | zh_HK.utf8            | zh_HK.utf8            | zh_HK.utf8
 EUC_CN     | zh_SG                 | zh_SG                 | zh_SG
 UTF8       | zh_SG                 | zh_SG.utf8            | zh_SG.utf8
 EUC_CN     | zh_SG.gb2312          | zh_SG.gb2312          | zh_SG.gb2312
 UTF8       | zh_SG.utf8            | zh_SG.utf8            | zh_SG.utf8
 EUC_TW     | zh_TW                 | zh_TW.euctw           | zh_TW.euctw
 UTF8       | zh_TW                 | zh_TW.utf8            | zh_TW.utf8
 EUC_TW     | zh_TW.euctw           | zh_TW.euctw           | zh_TW.euctw
 UTF8       | zh_TW.utf8            | zh_TW.utf8            | zh_TW.utf8
 UTF8       | zu_ZA                 | zu_ZA.utf8            | zu_ZA.utf8
 LATIN1     | zu_ZA                 | zu_ZA                 | zu_ZA
 LATIN1     | zu_ZA.iso88591        | zu_ZA.iso88591        | zu_ZA.iso88591
 UTF8       | zu_ZA.utf8            | zu_ZA.utf8            | zu_ZA.utf8
(869 rows)

Configure the collation of a database in a locale

  • Configure the fields of a database in a locale

    Prerequisites

    Familiarize yourself with the collations that are supported by the character set of the database. Then, execute the following SQL statement to query the encoding format of the database:

    postgres=# select datname,pg_encoding_to_char(encoding) as encoding from pg_database;

    Information similar to the following output is returned:

          datname       | encoding
    --------------------+-----------
     template1          | UTF8
     template0          | UTF8
     db                 | SQL_ASCII
     db1                | EUC_CN
     contrib_regression | UTF8
     test01             | UTF8
     test02             | UTF8
     postgres           | UTF8
    (8 rows)

    Procedure

    1. When you create a table, execute the following SQL statement to specify a collation that is supported by the character set of the database:
      CREATE TABLE test1 (
       a text COLLATE "de_DE",
       b text COLLATE "es_ES",
       ...
      );
    2. Execute the following SQL statement to modify the collation of a column:
      Note When you modify the collation of a column in a table, the table is rewritten. Proceed with caution if the table is large.
      alter table a alter c1 type text COLLATE "zh_CN";
  • Configure locale settings.
    • Execute the following SQL statement to change the sort order that is specified by the ORDER BY clause:
      test=# select * from a order by c1 collate "C";  

      Information similar to the following output is returned:

           c1
        --------
         Tom
         Alice
        (2 rows)
        test=# select * from a order by c1 collate "zh_CN";
           c1
        --------
         Alice
         Tom
        (2 rows)
    • Execute the following SQL statement to change the result that is returned from an operator:

      Example 1:

      select * from a where c1 > 'Tom' collate "C";  

      Information similar to the following output is returned:

           c1
        --------
         Alice
        (1 row)

      Example 2:

      select * from a where c1 > 'Tom' collate "en_US";

      Information similar to the following output is returned:

         c1
        ----
        (0 rows)
  • Sort data by using locale indexes.

    You can sort data by using an index only when the collation specified in the ORDER BY clause is the same as the collation of the index. Execute the following SQL statement:

    create index idxa on a(c1 collate "zh_CN");  
    explain select * from a order by c1 collate "zh_CN";                      

    Information similar to the following output is returned:

                                   QUERY PLAN
    ------------------------------------------------------------------------
     Index Only Scan using idxa on a  (cost=0.15..31.55 rows=1360 width=64)
    (1 row)

Configure a rule to sort results in alphabetical order

You can use one of the following four methods to configure a rule that is used to sort results in alphabetical order:

  • Use the SQL statements that are supported in your locale. This method does not require you to modify the original data. Execute the following SQL statement:
    select * from a order by c1 collate "zh_CN";  

    Information similar to the following output is returned:

         c1
      --------
       Alice
       Tom
      (2 rows)
  • Use the fields that are supported in your locale. If the database contains data, this method requires you to modify the original data. Execute the following SQL statement:
    alter table a alter c1 type text COLLATE "zh_CN";
  • Use the indexes and SQL statements that are supported in your locale. This method does not require you to modify the original data. Execute the following SQL statements:
    create index idxa on a(c1 collate "zh_CN");  
    explain select * from a order by c1 collate "zh_CN";  

    Information similar to the following output is returned:

                                     QUERY PLAN
      ------------------------------------------------------------------------
       Index Only Scan using idxa on a  (cost=0.15..31.55 rows=1360 width=64)
      (1 row)
  • Set the collation of the database to en_US. By default, the data in this database is sorted in alphabetical order based on the specified collation. Execute the following SQL statement:
    create database test03 encoding 'UTF8' lc_collate 'zh_CN.utf8' lc_ctype 'zh_CN.utf8'  template template0;
    \c test03
    select * from (values ('Alice'),('Tom')) as a(c1) order by c1 ; 

    Information similar to the following output is returned:

         c1
      --------
       Alice
       Tom
      (2 rows)
    Note A Chinese character may have more than one pronunciation. For example, the Chongqing city in China may be encoded as the Zhongqing city. Proceed with caution if you want to configure a collation that is used to sort Chinese characters based on pronunciations.

Configure a rule to sort results in alphabetical order by using Greenplum

Greenplum does not allow you to specify collations for individual columns. Therefore, the sorting of results in alphabetical order is different in Greenplum.

You can use Greenplum to convert results among character sets. Then, you can sort the results in binary order. This allows you to obtain similar results that resemble results in alphabetical order. The following code snippet provides an example:

select * from (values ('Alice'), ('Tom')) t(id) order by byteain(textout(convert(id,'UTF8','EUC_CN')));

Information similar to the following output is returned:

   id
--------
 Alice
 Tom
(2 rows)

References

PostgreSQL 15 Documentation - Chapter 23. Localization