This topic describes the key points that you must consider before you migrate an Oracle database to a PolarDB cluster. Carefully read this topic before the migration to ensure a smooth migration process.
Sequence value synchronization
Question
Are sequence values in the source Oracle database automatically synchronized to the destination PolarDB cluster during the migration? How do I synchronize the sequence values from the Oracle database to the PolarDB cluster?
Analysis
PolarDB does not automatically synchronize sequence values from a source database. PolarDB fully supports the logical replication feature defined by the PostgreSQL protocol. However, sequence values are not included in the logical replication scope of the PostgreSQL protocol.
During the initial schema migration, a one-time snapshot of the sequence values (Q1 for example) at the time of the migration is synchronized to the PolarDB cluster. Data continues to be added or updated in the Oracle database after the initial migration. The sequence values in the Oracle database may increase to Q2, while the sequence values in the PolarDB cluster remain at Q1. However, the table data in the destination PolarDB cluster is synchronized with the source Oracle database in real time.
If you then insert a new record into the PolarDB cluster, the cluster generates a sequence value starting from Q1. Because the values between Q1 and Q2 have already been used in the Oracle database and the corresponding rows have been synchronized to the PolarDB cluster, a duplicate key error occurs.
Solutions
Solution 1: Use Data Transmission Service (DTS) to synchronize sequences
On the Increment Write page of the DTS instance, click Synchronize Sequence. This way, DTS generates a sequence synchronization statement to synchronize sequence values. For more information, see Update the values of sequences in the destination database.
Solution 2: Manually synchronize sequence values
For information about how to manually synchronize sequence values from an Oracle database to a PolarDB cluster, see Migrate data from a self-managed Oracle database to a PolarDB for PostgreSQL (Compatible with Oracle) cluster.
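As a rough sketch, manual synchronization means reading the current value of each sequence in the Oracle database and then advancing the corresponding sequence in the PolarDB cluster past that value. The sequence name seq_order_id and the value 10000 below are placeholders for illustration only.

```sql
-- In the Oracle database: query the current high-water mark of the sequence.
-- LAST_NUMBER includes cached values, so it is a safe upper bound.
SELECT last_number FROM user_sequences WHERE sequence_name = 'SEQ_ORDER_ID';

-- In the PolarDB cluster: advance the sequence past the value returned above
-- (for example, 10000) so that new inserts do not reuse consumed values.
SELECT setval('seq_order_id', 10000);
```

Repeat this for every sequence that the application uses to generate key values.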
Encoding settings
Question
The Oracle database may use GBK or UTF-8 encoding. Which encoding do I use for the PolarDB cluster?
Solution
An Oracle database may use the GBK or UTF-8 character set. To migrate an Oracle database to a PolarDB cluster, we recommend that you specify the UTF-8 character set for the PolarDB cluster. The UTF-8 character set has better compatibility and provides the following advantages:
Avoids the risk of encoding conflicts: We recommend that you do not use GBK as the character set of the PolarDB cluster. In GBK, the second byte of a multi-byte character can fall within the ASCII range, which may result in encoding errors during data migration.
Prevents garbled code insertion: Oracle allows the insertion of invalid GBK encoding, which may result in garbled text when the data is read back. PolarDB does not support the insertion of invalid GBK encoding. If you try to insert invalid GBK encoding into PolarDB, the data may be rejected.
Therefore, we recommend that you use UTF-8 encoding to ensure a smooth and error-free migration from the Oracle database to the PolarDB cluster.
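If you create databases in the PolarDB cluster yourself, you can specify the character set explicitly. The following statement is a minimal sketch; the database name mydb is a placeholder.

```sql
-- Create a database that uses the UTF-8 character set.
CREATE DATABASE mydb WITH ENCODING 'UTF8' TEMPLATE template0;

-- Verify the encoding of the new database.
SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = 'mydb';
```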
Case sensitivity of table names
Question
By default, table names and column names are in uppercase in an Oracle database and in lowercase in a PolarDB cluster. How do I address the differences in case sensitivity when I migrate data from an Oracle database to a PolarDB cluster?
Solution
By default, Oracle converts all unquoted names, such as schema names, table names, and column names, to uppercase. By default, PolarDB for PostgreSQL (Compatible with Oracle) converts all unquoted names to lowercase. When you migrate data from an Oracle database to a PolarDB for PostgreSQL (Compatible with Oracle) cluster, DTS uses one of the following solutions based on how the names are quoted in the source database:
Convert unquoted names from uppercase to lowercase
Applicable scenario: Schema names, table names, and column names in the Oracle database are not quoted.
Solution: DTS automatically converts all unquoted names to lowercase during the migration to the PolarDB for PostgreSQL (Compatible with Oracle) cluster. After the migration, you can continue to use the names without double quotation marks. No changes or special configurations are required.
Keep the original case of quoted names
Applicable scenario: Table and column names use mixed case and are enclosed in double quotation marks in the Oracle database.
Solution: DTS keeps the original case of quoted names during the migration. Make sure that you do not configure object name mappings that change the case. After the migration to the PolarDB for PostgreSQL (Compatible with Oracle) cluster, you can continue to use the names with double quotation marks. No changes or special configurations are required.
Allow case-insensitive queries
By default, you can query a table whose name is in uppercase in a PolarDB cluster by using the table name in uppercase or lowercase.
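The rules above can be illustrated on the PolarDB side. The table names below are placeholders used only for this sketch.

```sql
-- Unquoted names are folded, so both queries resolve to the same table.
CREATE TABLE employees (id INT);
SELECT * FROM EMPLOYEES;   -- resolves to employees
SELECT * FROM employees;   -- resolves to employees

-- Quoted names keep their original case and must be quoted in every reference.
CREATE TABLE "Employees" (id INT);
SELECT * FROM "Employees"; -- matches only the quoted, mixed-case table
```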
CHAR and VARCHAR data length semantics
Question
Does VARCHAR(20) specify the same data length in Oracle and PolarDB?
Analysis
The VARCHAR(20) data type is implemented differently in Oracle and PolarDB.
In an Oracle database, the CHAR and VARCHAR data types specify lengths in bytes.
In a PolarDB for PostgreSQL (Compatible with Oracle) cluster, the CHAR and VARCHAR data types specify lengths in characters by default.
The PolarDB for PostgreSQL (Compatible with Oracle) cluster allows you to use the polar_default_char_length_semantics parameter to change the default length semantics of the CHAR and VARCHAR data types. By default, the parameter is set to OFF and the lengths of CHAR and VARCHAR data types are measured in characters. If you set the parameter to ON, the lengths of CHAR and VARCHAR data types are measured in bytes.
Note: You can configure the polar_default_char_length_semantics parameter in the PolarDB console. For more information, see Configure cluster parameters.
Solution
When the polar_default_char_length_semantics parameter is set to ON:
Scenario: In the Oracle database, a CHAR(10) column uses the GBK character set. In the PolarDB for PostgreSQL (Compatible with Oracle) cluster, the corresponding CHAR(10) column uses the UTF-8 character set.
Issue: During the forward synchronization from the Oracle database to the PolarDB for PostgreSQL (Compatible with Oracle) cluster, the string "测试测试测" is written to the CHAR(10) column in the Oracle database. In GBK, each of the five characters occupies 2 bytes, so the string occupies 2 × 5 = 10 bytes, which is exactly the maximum length of the column. In UTF-8, each of the characters occupies 3 bytes, so the same string occupies 3 × 5 = 15 bytes in the PolarDB for PostgreSQL (Compatible with Oracle) cluster, which exceeds the maximum length of the column. An error is reported.
Solutions
Solution 1: Set the polar_default_char_length_semantics parameter to OFF to ensure that no related errors occur during the migration.
Solution 2: Change the length of the column in the PolarDB for PostgreSQL (Compatible with Oracle) cluster to a larger size.
When the polar_default_char_length_semantics parameter is set to OFF:
Scenario: In the Oracle database, a CHAR(10) column uses the GBK character set. In the PolarDB for PostgreSQL (Compatible with Oracle) cluster, the corresponding CHAR(10) column uses the UTF-8 character set.
Issue: During the reverse synchronization from the PolarDB for PostgreSQL (Compatible with Oracle) cluster to the Oracle database, the string "测试" is written to the CHAR(10) column in the PolarDB for PostgreSQL (Compatible with Oracle) cluster. Because CHAR values are padded with spaces, the stored value occupies 2 + 8 = 10 characters: two Chinese characters and eight trailing spaces. When the value is converted to bytes, it occupies 2 × 3 + 8 = 14 bytes, which exceeds the length of the CHAR(10) column in the Oracle database. A data truncation error is reported.
Solutions
Solution 1: Change the length of the column in the Oracle database to a larger size.
Solution 2: Configure an extract, transform, load (ETL) task in DTS to truncate the excess characters. For more information, see Configure ETL in a data migration or synchronization task.
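If byte-versus-character length errors occur, one workaround in line with the column-widening solution above is the following sketch. The table and column names are placeholders, and the SHOW statement assumes the parameter is exposed as a regular configuration parameter.

```sql
-- Check the current length semantics (assumption: readable via SHOW).
SHOW polar_default_char_length_semantics;

-- Widen the destination column so that values measured in bytes still fit.
ALTER TABLE t_demo ALTER COLUMN c_name TYPE CHAR(30);
```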
Table without a primary key or unique key
Question
What do I do if a table without a primary key or unique key exists in the Oracle database? Is data validation affected?
Solution
If a table does not have a primary key or unique key, data consistency cannot be guaranteed during the migration process, which may result in duplicate entries in the destination cluster. You can use one of the following solutions based on your business requirements:
If data consistency is not important to your business, or you plan to perform deduplication after the migration, you can proceed with the migration process.
If data consistency is important to your business, you can use the ROWID column in the Oracle database as the hidden primary key (non-empty unique key) of the destination cluster. Log on to the DTS console and set the Add a hidden primary key for a table without a primary key/a table without a unique key parameter to Yes. For more information, see Migrate data from a self-managed Oracle database to a PolarDB for PostgreSQL (Compatible with Oracle) cluster. After the migration task is complete, click the corresponding button to delete the hidden primary key. Then, you can perform data validation. Take note that tables without a primary key do not support data validation.
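If you choose to deduplicate after the migration instead, one common PostgreSQL-style approach uses the system column ctid to keep a single physical copy of each duplicated row. The table and column names below are placeholders.

```sql
-- Keep one row per logical key (order_no) and delete the other physical copies.
DELETE FROM t_orders a
USING t_orders b
WHERE a.ctid > b.ctid
  AND a.order_no = b.order_no;
```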
\0 characters handling
Question
How do I handle hidden \0 characters in the Oracle database?
Solution
PolarDB for PostgreSQL (Compatible with Oracle) does not support \0 characters. DTS removes \0 characters during the migration process.
If a common field contains \0 characters, the removal of the characters may result in data discrepancies before and after the migration.
If a primary key field contains \0 characters, the removal of the characters may lead to data conflicts. For example, assume that you have two records whose primary key values are dts\0 and dts. After the migration, both key values become dts, which results in a data conflict.
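To avoid such discrepancies, you can locate and clean \0 characters in the source Oracle database before the migration. In Oracle, CHR(0) produces the \0 character; the table and column names below are placeholders.

```sql
-- Find rows whose column contains \0 characters.
SELECT id FROM t_demo WHERE INSTR(c_name, CHR(0)) > 0;

-- Remove the \0 characters in place.
UPDATE t_demo SET c_name = REPLACE(c_name, CHR(0), '');
```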
Handling of empty strings
Question
In an Oracle database, an empty string ('') is treated as a NULL value. Does PolarDB support this behavior?
Solution
According to the SQL standard, an empty string is distinct from a NULL value. However, Oracle treats empty strings as NULL values. PolarDB supports this behavior of Oracle by using the polar_empty_string_is_null_enable parameter. By default, the parameter is set to ON and empty strings are treated as NULL values in a PolarDB cluster. If you no longer need the feature, you can disable it in the PolarDB console. For more information, see Configure cluster parameters.
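You can verify this behavior with a quick test. The table t_demo is a placeholder, and the sketch assumes the parameter keeps its default value of ON.

```sql
CREATE TABLE t_demo (c VARCHAR2(10));
INSERT INTO t_demo VALUES ('');

-- With polar_empty_string_is_null_enable set to ON, the inserted empty string
-- is treated as NULL, so this predicate is expected to match the row.
SELECT COUNT(*) FROM t_demo WHERE c IS NULL;
```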
Time zone settings
Question
How do I set the time zone for a PolarDB cluster? What are the impacts of the time zone setting?
Solution
By default, the time zone of a PolarDB cluster is UTC (Coordinated Universal Time). All timestamps stored in or retrieved from the cluster are based on UTC.
For example, you query the current date and time of the cluster at 18:13:34 on December 3, 2024 (UTC+8).
SELECT * FROM now();

The following result is returned:
now
--------------------------------
2024-12-03 10:13:34.018557 +00
(1 row)

If you want to use the UTC+8 time in a PolarDB cluster, set the timezone parameter to PRC in the PolarDB console. For more information, see Configure cluster parameters. Query the current time again.
SELECT * FROM now();

The following result is returned:
now
--------------------------------
2024-12-03 18:14:34.841027 +08
(1 row)

INTEGER data type precision
Question
In an Oracle database, the INTEGER type is equivalent to NUMBER(38), which can store integers up to 38 digits. In a PolarDB for PostgreSQL (Compatible with Oracle) cluster, the INTEGER type is a 32-bit integer. How do I address the difference in INTEGER data type precision between the source Oracle database and the destination PolarDB cluster?
Solution
To prevent integer out of range errors in the PolarDB cluster, replace CAST(val AS INTEGER) in queries with CAST(val AS NUMBER(38)).
In the following example, the number 99999999999 exceeds the 32-bit integer limit of the PolarDB for PostgreSQL (Compatible with Oracle) cluster. This results in an out of integer error on the PolarDB for PostgreSQL (Compatible with Oracle) cluster. In contrast, the Oracle database can correctly handle the number.
SELECT CAST(99999999999 AS INTEGER) FROM dual;

Sample result for the Oracle database:

CAST(99999999999ASINTEGER)
_____________________________
               99999999999

Sample result for the PolarDB for PostgreSQL (Compatible with Oracle) cluster:
ERROR: integer out of range
Change the query in the PolarDB for PostgreSQL (Compatible with Oracle) cluster as follows.

SELECT CAST(99999999999 AS NUMBER(38)) FROM dual;

Sample result for the PolarDB for PostgreSQL (Compatible with Oracle) cluster:
numeric
-------------
99999999999
(1 row)