You can synchronize DataHub data to Hologres for scenarios that require real-time data analytics, real-time monitoring, real-time reporting, and complex data structure processing. By combining the capabilities of DataHub and Hologres, you can improve the efficiency and accuracy of data processing and analytics. This topic describes how to create a DataHub data sync task and provides answers to frequently asked questions (FAQ).
Background information
DataHub provides a data sink and source feature, also known as the data synchronization feature. This feature supports real-time or near-real-time synchronization of data from a topic to other Alibaba Cloud products. This enables data flow between Alibaba Cloud products. For more information about the DataHub data synchronization feature, see Overview.
The following table describes the data type mapping between DataHub and Hologres.
|
DataHub |
Hologres |
|
Project |
Database |
|
Topic |
Table |
Synchronization scenarios and policies
Synchronization scenarios
|
Synchronization scenario |
Description |
|
Row-by-row insertion |
Inserts data from the source database into the destination database row by row. This scenario is suitable for a full synchronization of DataHub data to Hologres. |
|
Replay |
Analyzes and re-executes binary logging (binlog) records of database operations. This applies the source database changes to the destination database to ensure data consistency. This scenario is suitable for data links where DTS synchronizes data to DataHub, and then DataHub synchronizes the data to Hologres. In this case, DataHub acts as the binlog. Note
When DTS synchronizes data to DataHub, it adds extra columns to record data operation information. The naming of these extra columns differs between old and new versions. For more information, see Modify the rules for extra columns in data synchronization. |
Synchronization policies
|
Synchronization policy |
Description |
|
Overwrite (replace) |
If a primary key conflict occurs during a data write, the new data overwrites the old data. This ensures that the data in the destination database is consistent with the source data. |
|
Ignore (ignore) |
If a primary key conflict occurs during a data write, the new data is ignored and the existing data is not updated. This prevents the import and overwriting of duplicate data and maintains data integrity in the destination database. |
Precautions
-
Only data of the TUPLE type in DataHub can be synchronized to Hologres.
-
Before you write data to a partitioned table, you must create a partition child table in Hologres. For more information, see CREATE PARTITION TABLE.
-
Each sync task uses a number of connections equal to the number of shards in the DataHub topic.
Preparations
-
Activate DataHub and prepare your DataHub data. For more information, see Quick Start (Synchronization Example).
-
Purchase a Hologres instance and create a table. This topic uses a table named
lineitemas an example. For more information, see Purchase a Hologres instance and Connect to Hologres using HoloWeb to create a table.The following table describes the data type mapping between DataHub and Hologres.
DataHub
Hologres
TINYINT
SMALLINT
SMALLINT
SMALLINT
INTEGER
INTEGER
BIGINT
BIGINT
FLOAT
REAL
DOUBLE
DOUBLE PRECISION
DECIMAL
DECIMAL
STRING
TEXT
BOOLEAN
BOOLEAN
TIMESTAMP
TIMESTAMPTZ
The following sample statement shows how to create the table.
BEGIN; CREATE TABLE lineitem ( L_ORDERKEY BIGINT NOT NULL, L_PARTKEY BIGINT NOT NULL, L_SUPPKEY BIGINT NOT NULL, L_LINENUMBER BIGINT NOT NULL, L_QUANTITY DECIMAL(20,10), L_EXTENDEDPRICE DECIMAL(20,10), L_DISCOUNT DECIMAL(20,10), L_TAX DECIMAL(20,10), L_RETURNFLAG TEXT, L_LINESTATUS TEXT, L_SHIPDATE TIMESTAMPTZ, L_COMMITDATE TIMESTAMPTZ, L_RECEIPTDATE TIMESTAMPTZ, L_SHIPINSTRUCT TEXT, L_SHIPMODE TEXT, L_COMMENT TEXT ); caLL set_table_property('lineitem', 'orientation', 'column'); COMMIT;
Create a sync task
-
Log on to the DataHub console. Click the topic that you created to open its details page.
-
In the upper-right corner of the topic details page, click + Sync.
-
Click Hologres and on the New Connector page, configure the parameters.
Parameter
Description
Instance
The ID of the Hologres instance. Go to the Hologres console to obtain the Instance ID.
Database
The name of the Hologres database that will receive the data from DataHub.
Table
The name of the Hologres table that will receive the data from DataHub, such as
lineitem.Primary key conflict policy
The policy for handling data updates when a primary key conflict occurs. Valid values:
-
replace: If a primary key conflict occurs during a data write, the new data overwrites the old data. This is the default value.
-
ignore: If a primary key conflict occurs during a data write, the new data is ignored and the existing data is not updated.
For more information about data synchronization policies, see Synchronization policies.
Synchronization scenario
The scenario for data synchronization. Valid values:
-
default: Row-by-row insertion. This is the default value.
-
dts: Select this value if you use DTS to synchronize data to DataHub and the new rule for extra columns is enabled.
-
dts_old: Select this value if you use DTS to synchronize data to DataHub and the new rule for extra columns is not enabled.
For more information about data synchronization scenarios, see Synchronization scenarios.
Import fields
The fields to import into Hologres. You can select some or all fields to import as needed.
Authentication mode
The default value is AccessKey.
AccessKey ID
The AccessKey ID used to access the Hologres instance. Click AccessKey Management to obtain your AccessKey ID.
AccessKey Secret
The AccessKey secret used to access the Hologres instance. Click AccessKey Management to obtain your AccessKey secret.
Timestamp Unit
The unit of the synchronization time. Valid values:
-
MICROSECOND: Microsecond. This is the default value.
-
MILLISECOND: Millisecond.
-
SECOND: Second.
-
-
Click Create to synchronize data from DataHub to Hologres.
After creating a connector, you can view the real-time data synchronization status in the Sync Tasks section on the topic details page.
-
Query data in Hologres.
You can connect to a Hologres instance using a developer tool to query the synchronized data in real time. For more information about how to connect to Hologres, see Connect to Hologres. Run the following sample query statement.
SELECT COUNT(*) FROM lineitem;
FAQ
This topic describes common errors that may occur when you use Hologres and provides troubleshooting information to help you resolve them.
-
Problem 1
-
Error message
ErrorMessage: Import field not found in dest schema. -
Cause
-
The Hologres table does not contain the import fields specified in the sync task.
-
The synchronization scenario for the task is set to default, but the import fields include extra columns that are generated when DTS synchronizes data to DataHub.
-
-
Solution
-
Recreate the Hologres table and add the missing import fields. Alternatively, modify the import fields in the sync task to remove the fields that do not exist in the Hologres table.
-
Recreate the sync task and set the synchronization scenario to dts or dts_old.
-
-
-
Problem 2
-
Error message
ErrorMessage: Column type not match with Holo column. -
Cause
The data types of the fields in the DataHub topic do not match the data types of the fields in the Hologres table.
-
Solution
Recreate the Hologres table and set the correct data types for the fields based on the data type mapping.
-
-
Problem 3
-
Error message
ErrorMessage: Not import column xxx not allow null and no default value. -
Cause
Some fields in the Hologres table are not included in the import fields of the sync task. However, these fields have the `NOT NULL` attribute and do not have a default value.
-
Solution
Recreate the Hologres table. For fields that are not included in the import fields of the sync task, either remove the `NOT NULL` attribute or set a default value.
-