Configure the collation of a database - ApsaraDB RDS - Alibaba Cloud Documentation Center

When you initialize an ApsaraDB RDS for PostgreSQL instance, you can configure the collation of each database based on your business requirements. The collation includes the string sort order, character classification method, numeric value format, date and time format, and currency format. In addition, you may also need to configure the LC_COLLATE and LC_CTYPE environment variables.

LC_COLLATE	String sort order
LC_CTYPE	Character classification
LC_MESSAGES	Message language
LC_MONETARY	Currency format
LC_NUMERIC	Numeric value format
LC_TIME	Date and time format

You can configure these environment variables to specify a collation that meets your business requirements in a locale. This topic provides an example on how to configure the collation of a database in a locale and how to sort the output by a specific order.

Supported character sets

For more information, see Character Set Support.

LC_COLLATE and LC_CTYPE settings supported by a character set

You can execute the following SQL statement to query the LC_COLLATE and LC_CTYPE settings that are supported by a character set from the pg_collation system table:

select pg_encoding_to_char(collencoding) as encoding,collname,collcollate,collctype from pg_collation ;

If the encoding field of a collation is empty, the collation supports all character sets.

  encoding  |       collname        |      collcollate      |       collctype
------------+-----------------------+-----------------------+-----------------------
            | default               |                       |
            | C                     | C                     | C
            | POSIX                 | POSIX                 | POSIX
 UTF8       | aa_DJ                 | aa_DJ.utf8            | aa_DJ.utf8
 LATIN1     | aa_DJ                 | aa_DJ                 | aa_DJ
 LATIN1     | aa_DJ.iso88591        | aa_DJ.iso88591        | aa_DJ.iso88591
 UTF8       | aa_DJ.utf8            | aa_DJ.utf8            | aa_DJ.utf8
 UTF8       | aa_ER                 | aa_ER                 | aa_ER
 UTF8       | aa_ER.utf8            | aa_ER.utf8            | aa_ER.utf8
.......
 EUC_CN     | zh_CN                 | zh_CN                 | zh_CN
 UTF8       | zh_CN                 | zh_CN.utf8            | zh_CN.utf8
 EUC_CN     | zh_CN.gb2312          | zh_CN.gb2312          | zh_CN.gb2312
 UTF8       | zh_CN.utf8            | zh_CN.utf8            | zh_CN.utf8
 UTF8       | zh_HK                 | zh_HK.utf8            | zh_HK.utf8
 UTF8       | zh_HK.utf8            | zh_HK.utf8            | zh_HK.utf8
 EUC_CN     | zh_SG                 | zh_SG                 | zh_SG
 UTF8       | zh_SG                 | zh_SG.utf8            | zh_SG.utf8
 EUC_CN     | zh_SG.gb2312          | zh_SG.gb2312          | zh_SG.gb2312
 UTF8       | zh_SG.utf8            | zh_SG.utf8            | zh_SG.utf8
 EUC_TW     | zh_TW                 | zh_TW.euctw           | zh_TW.euctw
 UTF8       | zh_TW                 | zh_TW.utf8            | zh_TW.utf8
 EUC_TW     | zh_TW.euctw           | zh_TW.euctw           | zh_TW.euctw
 UTF8       | zh_TW.utf8            | zh_TW.utf8            | zh_TW.utf8
 UTF8       | zu_ZA                 | zu_ZA.utf8            | zu_ZA.utf8
 LATIN1     | zu_ZA                 | zu_ZA                 | zu_ZA
 LATIN1     | zu_ZA.iso88591        | zu_ZA.iso88591        | zu_ZA.iso88591
 UTF8       | zu_ZA.utf8            | zu_ZA.utf8            | zu_ZA.utf8
(869 rows)

Configure the collation of a database in a locale

Configure the fields of a database in a locale
Prerequisites
Familiarize yourself with the collations that are supported by the character set of the database. Then, execute the following SQL statement to query the encoding format of the database:
```
select datname,pg_encoding_to_char(encoding) as encoding from pg_database;
```
Sample output
```
      datname       | encoding
--------------------+-----------
 template1          | UTF8
 template0          | UTF8
 db                 | SQL_ASCII
 db1                | EUC_CN
 contrib_regression | UTF8
 test01             | UTF8
 test02             | UTF8
 postgres           | UTF8
(8 rows)
```
Procedure
1. When you create a table, execute the following SQL statement to specify a collation that is supported by the character set of the database:
```
CREATE TABLE test1 (
 a text COLLATE "de_DE",
 b text COLLATE "es_ES"
);
```
2. Execute the following SQL statement to modify the collation of a column:
  Note
  - When you modify the collation of a column in a table, the table is rewritten. Proceed with caution if the table is large.
  - In the following example, test1 indicates the name of the table, and a indicates the name of the column for which you want to modify the collation settings.
```
alter table test1 alter a type text COLLATE "en_US";
```
Configure locale settings.
- Change the sort order that is specified by the ORDER BY clause. Sample statement:
```
select * from <table_name> order by <column_name> collate "C";  
```
- Change the result that is returned from an operator. Example
  Sample statement
```
select * from <table_name> where <column_name> > 'Tom' collate "C";  
```

Sort data by using locale indexes.

You can sort data by using an index only when the collation specified in the ORDER BY clause is the same as the collation of the index. Sample statement:

create index idxa on <table_name>(<column_name> collate "en_US");  
explain select * from <table_name> order by <column_name> collate "en_US";

Sample output:

                               QUERY PLAN
------------------------------------------------------------------------
 Index Only Scan using idxa on <table_name> (cost=0.15..31.55 rows=1360 width=64)
(1 row)

Configure a rule to sort results in alphabetical order

You can use one of the following four methods to configure a rule that is used to sort results in alphabetical order:

Use the SQL statements that are supported in your locale. This method does not require you to modify the original data. Sample statement:
```
select * from <table_name> order by <column_name> collate "en_US";  
```
Use the fields that are supported in your locale. If the database contains data, this method requires you to modify the original data. Sample statement:
```
alter table <table_name> alter <column_name> type text COLLATE "en_US";
```

Use the indexes and SQL statements that are supported in your locale. This method does not require you to modify the original data. Sample statement:

create index idxa on <table_name>(<column_name> collate "en_US");  
explain select * from <table_name> order by <column_name> collate "en_US";

Sample output:

                                 QUERY PLAN
  ------------------------------------------------------------------------
   Index Only Scan using idxa on <table_name>  (cost=0.15..31.55 rows=1360 width=64)
  (1 row)

Set the collation of the database to en_US. By default, the data in this database is sorted in alphabetical order based on the specified collation. Sample statement:
```
create database test03 encoding 'UTF8' lc_collate 'en_US.utf8' lc_ctype 'en_US.utf8'  template template0;
\c test03
select * from (values ('Alice'),('Tom')) as a(c1) order by c1 ; 
```
Sample output:
```
     c1
  --------
   Alice
   Tom
  (2 rows)
```
Note
A Chinese character may have more than one pronunciation. For example, the Chongqing city in China may be encoded as the Zhongqing city. Proceed with caution if you want to configure a collation that is used to sort Chinese characters based on pronunciations.

Configure a rule to sort results in alphabetical order by using Greenplum

Greenplum does not allow you to specify collations for individual columns. Therefore, the sorting of results in alphabetical order is different in Greenplum.

You can use Greenplum to convert results among character sets. Then, you can sort the results in binary order. This allows you to obtain similar results that resemble results in alphabetical order. Sample statement:

select * from (values ('Alice'), ('Tom')) t(id) order by byteain(textout(convert(id,'UTF8','EUC_CN')));

Sample output:

   id
--------
 Alice
 Tom
(2 rows)

References

PostgreSQL 15 Documentation - Chapter 23. Localization