Different languages order text differently: German sorts accented letters alongside their unaccented counterparts, while Swedish places them at the end of the alphabet. Configuring the right collation ensures that queries and ORDER BY results reflect the language expectations of your application.
On ApsaraDB RDS for PostgreSQL, you can set collations at three levels:
Database level — set when you create the database; applies to all columns by default
Column level — override the database default for individual columns
Query level — override the column collation for a single query or comparison
Locale variables
PostgreSQL uses POSIX locale categories to control locale-sensitive behavior. The following categories are most relevant to text handling:
| Variable | Controls |
|---|---|
LC_COLLATE | String sort order |
LC_CTYPE | Character classification (what counts as a letter, and what is its uppercase equivalent) |
LC_MESSAGES | Message language |
LC_MONETARY | Currency format |
LC_NUMERIC | Numeric value format |
LC_TIME | Date and time format |
Supported character sets
See Character Set Support in the PostgreSQL documentation.
Query supported collations
Run the following query to see which collations are available for a given character set. The results come from the pg_collation system table.
select pg_encoding_to_char(collencoding) as encoding,collname,collcollate,collctype from pg_collation ;If the encoding column is empty, the collation supports all character sets.
Sample output (869 rows total):
encoding | collname | collcollate | collctype
------------+-----------------------+-----------------------+-----------------------
| default | |
| C | C | C
| POSIX | POSIX | POSIX
UTF8 | aa_DJ | aa_DJ.utf8 | aa_DJ.utf8
LATIN1 | aa_DJ | aa_DJ | aa_DJ
LATIN1 | aa_DJ.iso88591 | aa_DJ.iso88591 | aa_DJ.iso88591
UTF8 | aa_DJ.utf8 | aa_DJ.utf8 | aa_DJ.utf8
UTF8 | aa_ER | aa_ER | aa_ER
UTF8 | aa_ER.utf8 | aa_ER.utf8 | aa_ER.utf8
.......
EUC_CN | zh_CN | zh_CN | zh_CN
UTF8 | zh_CN | zh_CN.utf8 | zh_CN.utf8
EUC_CN | zh_CN.gb2312 | zh_CN.gb2312 | zh_CN.gb2312
UTF8 | zh_CN.utf8 | zh_CN.utf8 | zh_CN.utf8
UTF8 | zh_HK | zh_HK.utf8 | zh_HK.utf8
UTF8 | zh_HK.utf8 | zh_HK.utf8 | zh_HK.utf8
EUC_CN | zh_SG | zh_SG | zh_SG
UTF8 | zh_SG | zh_SG.utf8 | zh_SG.utf8
EUC_CN | zh_SG.gb2312 | zh_SG.gb2312 | zh_SG.gb2312
UTF8 | zh_SG.utf8 | zh_SG.utf8 | zh_SG.utf8
EUC_TW | zh_TW | zh_TW.euctw | zh_TW.euctw
UTF8 | zh_TW | zh_TW.utf8 | zh_TW.utf8
EUC_TW | zh_TW.euctw | zh_TW.euctw | zh_TW.euctw
UTF8 | zh_TW.utf8 | zh_TW.utf8 | zh_TW.utf8
UTF8 | zu_ZA | zu_ZA.utf8 | zu_ZA.utf8
LATIN1 | zu_ZA | zu_ZA | zu_ZA
LATIN1 | zu_ZA.iso88591 | zu_ZA.iso88591 | zu_ZA.iso88591
UTF8 | zu_ZA.utf8 | zu_ZA.utf8 | zu_ZA.utf8
(869 rows)Set collations at the column level
Prerequisites
Before you begin, ensure that you have:
An ApsaraDB RDS for PostgreSQL instance
The encoding of the target database (run the query below to check)
Query the encoding of your databases:
select datname,pg_encoding_to_char(encoding) as encoding from pg_database;Sample output:
datname | encoding
--------------------+-----------
template1 | UTF8
template0 | UTF8
db | SQL_ASCII
db1 | EUC_CN
contrib_regression | UTF8
test01 | UTF8
test02 | UTF8
postgres | UTF8
(8 rows)Identify the encoding of your database, then use pg_collation to confirm which collations that encoding supports.
Assign collations when creating a table
Specify a collation per column in CREATE TABLE:
CREATE TABLE test1 (
a text COLLATE "de_DE",
b text COLLATE "es_ES"
);Change the collation of an existing column
Use ALTER TABLE to update a column's collation:
alter table test1 alter a type text COLLATE "en_US";When you modify the collation of a column in a table, the table is rewritten. Proceed with caution if the table is large.
Override collation at query time
Override the collation for a single ORDER BY clause without changing table or column definitions:
select * from <table_name> order by <column_name> collate "C";Override collation in a comparison operator:
select * from <table_name> where <column_name> > 'Tom' collate "C";Use a collation index for better performance
An index can accelerate collation-based sorts, but only when the collation in the ORDER BY clause matches the collation of the index. Create the index with a specific collation:
create index idxa on <table_name>(<column_name> collate "en_US");
explain select * from <table_name> order by <column_name> collate "en_US";Sample output:
QUERY PLAN
------------------------------------------------------------------------
Index Only Scan using idxa on <table_name> (cost=0.15..31.55 rows=1360 width=64)
(1 row)Sort results in alphabetical order
Four methods are available. Choose based on whether you can modify the data or the schema.
| Method | Modifies data? | Modifies schema? | Best for |
|---|---|---|---|
Per-query COLLATE in ORDER BY | No | No | One-off queries |
Column-level collation (ALTER TABLE) | Yes (table rewrite) | Yes | Permanent, column-scoped sorting |
Collation index + ORDER BY | No | Yes (index only) | High-frequency sorted queries |
| Database-level collation at creation | No | Yes (database-wide) | Uniform sorting across all columns |
Method 1: Per-query collation (no schema or data changes)
select * from <table_name> order by <column_name> collate "en_US";Method 2: Column-level collation (permanent, requires table rewrite)
alter table <table_name> alter <column_name> type text COLLATE "en_US";Method 3: Collation index (no data changes, index only)
create index idxa on <table_name>(<column_name> collate "en_US");
explain select * from <table_name> order by <column_name> collate "en_US";Sample output:
QUERY PLAN
------------------------------------------------------------------------
Index Only Scan using idxa on <table_name> (cost=0.15..31.55 rows=1360 width=64)
(1 row)Method 4: Database-level collation at creation (applies to all columns)
Set lc_collate and lc_ctype when creating the database. All text data in the database is then sorted by that collation by default.
create database test03 encoding 'UTF8' lc_collate 'en_US.utf8' lc_ctype 'en_US.utf8' template template0;
\c test03
select * from (values ('Alice'),('Tom')) as a(c1) order by c1 ;Sample output:
c1
--------
Alice
Tom
(2 rows)Chinese characters can have multiple pronunciations. For example, the city name Chongqing may be sorted under "Z" (Zhongqing) in some pronunciation-based collations. Exercise caution when using pronunciation-based collations for Chinese text.
Sort alphabetically in Greenplum
Greenplum does not support per-column collation settings. As a workaround, convert strings to a target encoding and sort in binary order to approximate alphabetical results:
select * from (values ('Alice'), ('Tom')) t(id) order by byteain(textout(convert(id,'UTF8','EUC_CN')));Sample output:
id
--------
Alice
Tom
(2 rows)