All Products
Search
Document Center

Hologres:Synchronize data from DataHub to Hologres in real time

Last Updated:Feb 20, 2024

You can synchronize data from DataHub to Hologres in scenarios that require real-time data analytics, real-time monitoring, real-time reporting, or complex data type processing. The combined capabilities of DataHub and Hologres help improve the efficiency and accuracy of data processing and analytics. This topic describes how to create a DataConnector in DataHub and provides solutions to common errors.

Background information

DataHub supports data synchronization as a data source or a data sink. You can synchronize data from a DataHub topic to other Alibaba Cloud services in real time or quasi real time. This facilitates data flows among Alibaba Cloud services. For more information about the data synchronization feature of DataHub, see Overview.

The following table describes the mappings between concepts in DataHub and Hologres.

DataHub

Hologres

Project

Database

Topic

Table

Synchronization modes and policies

Synchronization modes

Mode

Description

One-by-one insert

Inserts DataHub data into Hologres one by one. This mode is applicable if you want to synchronize all data in DataHub to Hologres.

Playback

Performs change operations that are performed on the source DataHub project on the destination Hologres database to ensure data consistency. You can obtain the data change operations by analyzing binary logs. This mode is applicable if you want to synchronize data from a source to DataHub by using Data Transmission Service (DTS) and then synchronize the data to Hologres. In this case, DataHub is equivalent to binary logs.

Note

When DTS is used to synchronize data to DataHub, additional columns are added to the synchronized data columns to describe the data operations. The names of additional columns may conflict with the names of existing columns in the destination DataHub topic. For more information about how to prevent the conflicts, see Modify the naming rules for additional columns.

Synchronization policies

Policy

Description

Replace

If a primary key conflict occurs when you write data, the original data is overwritten by the new data. This ensures data consistency between the source and destination databases.

Ignore

If a primary key conflict occurs when you write data, new data is ignored. The data is not updated. This prevents repeated data from being written and ensures data integrity of the destination database.

Usage notes

  • Only data in topics of the TUPLE type in DataHub can be synchronized to Hologres.

  • Before you write data to a partitioned table in the Hologres instance, you must create a child table. For more information, see CREATE PARTITION TABLE.

  • Each DataConnector occupies connections. The number of connections occupied by each DataConnector is the number of shards in the DataHub topic.

Preparations

  1. Activate DataHub and prepare a data source in DataHub. For more information, see Get started with DataHub.

  2. Purchase a Hologres instance and create a table. In this example, the table lineitem is used. For more information, see Purchase a Hologres instance and Connect to HoloWeb.

    The following table describes the mappings between data types in DataHub and Hologres.

    DataHub

    Hologres

    TINYINT

    SMALLINT

    SMALLINT

    SMALLINT

    INTEGER

    INTEGER

    BIGINT

    BIGINT

    FLOAT

    REAL

    DOUBLE

    DOUBLE PRECISION

    DECIMAL

    DECIMAL

    STRING

    TEXT

    BOOLEAN

    BOOLEAN

    TIMESTAMP

    TIMESTAMPTZ

    For example, you can execute the following statements to create a table:

    BEGIN;
    CREATE TABLE lineitem ( 
    L_ORDERKEY BIGINT NOT NULL,
    L_PARTKEY BIGINT NOT NULL,
    L_SUPPKEY BIGINT NOT NULL,
    L_LINENUMBER BIGINT NOT NULL,
    L_QUANTITY DECIMAL(20,10),
    L_EXTENDEDPRICE DECIMAL(20,10),
    L_DISCOUNT DECIMAL(20,10),
    L_TAX DECIMAL(20,10),
    L_RETURNFLAG TEXT,
    L_LINESTATUS TEXT,
    L_SHIPDATE TIMESTAMPTZ,
    L_COMMITDATE TIMESTAMPTZ,
    L_RECEIPTDATE TIMESTAMPTZ,
    L_SHIPINSTRUCT TEXT,
    L_SHIPMODE TEXT,
    L_COMMENT TEXT
    );
    caLL set_table_property('lineitem', 'orientation', 'column');
    COMMIT;

Create a DataConnector

  1. Log on to the DataHub console. Click the created topic to go to the topic details page.

  2. In the upper-right corner, click + Connector.

  3. In the panel that appears, click Hologres. In the Create Connector panel, configure the parameters.

    Parameter

    Description

    Instance

    The ID of the Hologres instance. You can log on to the Hologres console and copy the ID of the required Hologres instance.

    Database

    The name of the Hologres database to which you want to synchronize DataHub data.

    Table

    The name of the Hologres table to which you want to synchronize DataHub data. In this example, the table lineitem is used.

    Primary Key Conflict Policy

    The data update policy if a primary key conflict occurs. Valid values:

    • replace: If a primary key conflict occurs when you write data, the original data is overwritten by the new data. This is the default value.

    • ignore: If a primary key conflict occurs when you write data, new data is ignored. The data is not updated, and original data is still used.

    For more information about data synchronization policies, see Synchronization policies in this topic.

    Synchronization Scenarios

    The mode that is used for data synchronization. Valid values:

    • default: Data is inserted one by one. This is the default value.

    • dts: Use this mode when you use DTS to synchronize data to DataHub and enable new naming rules for additional columns.

    • dts_old: Use this mode when you use DTS to synchronize data to DataHub and do not enable new naming rules for additional columns.

    For more information about data synchronization modes, see Synchronization modes in this topic.

    Import Fields

    The fields to be synchronized to Hologres. You can synchronize all or part of the fields of the DataHub topic based on your business requirements.

    Authentication Mode

    The mode in which access to the Hologres instance is authenticated. Default value: AccessKey.

    AccessKey ID

    The AccessKey ID of the Alibaba Cloud account used to access the Hologres instance. You can obtain the AccessKey ID from the Security Management page.

    AccessKey Secret

    The AccessKey secret of the Alibaba Cloud account used to access the Hologres instance. You can obtain the AccessKey secret on the Security Management page.

    Timestamp Unit

    The timestamp unit used by the DataConnector for data synchronization. Valid values:

    • MICROSECOND: This is the default value.

    • MILLISECOND.

    • SECOND.

  4. Click Create to synchronize data from DataHub to Hologres.

    After you create the DataConnector, you can view the status of the DataConnector on the Synchronization Task tab of the topic details page.

  5. Query the synchronized data in Hologres.

    Connect the Hologres instance to a development tool and use the tool to check whether the data is synchronized to the Hologres instance in real time. For more information about development tools, see Overview. For example, you can execute the following statement to query the synchronized data:

    SELECT COUNT(*) FROM lineitem;

Common errors and troubleshooting

This section describes common errors that are reported when you use Hologres and provides the solutions to these errors.

  • Error 1

    • Error message

      ErrorMessage: Import field not found in dest schema.
    • Cause

      • The destination Hologres table does not contain the fields that are specified as import fields in the DataConnector.

      • The Synchronization Scenarios parameter is set to default, but the additional columns that are generated when data is synchronized to DataHub by using DTS are configured as import fields in the DataConnector.

    • Solution

      • Create a Hologres table and add the missing fields. You can also modify the Import Fields parameter of the DataConnector to remove the additional fields.

      • Create a DataConnector and set the Synchronization Scenarios parameter to dts or dts_old.

  • Error 2

    • Error message

      ErrorMessage: Column type not match with Holo column.
    • Cause

      The data types of fields in the DataHub topic do not match the data types of fields in the Hologres table.

    • Solution

      Create a Hologres table and specify valid data types for the table based on the data type mappings.

  • Error 3

    • Error message

      ErrorMessage: Not import column xxx not allow null and no default value.
    • Cause

      The destination Hologres table contains some fields that are not specified as import fields in the DataConnector. The fields are configured with the not null property, and no default values are configured for the fields.

    • Solution

      Create a Hologres table. For fields that are not specified as import fields in the DataConnector, configure default values for the fields or do not configure the not null property.