This topic describes how to import data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for ClickHouse cluster.

Procedure

  1. Configure a whitelist for the ApsaraDB RDS for MySQL instance.
    Add the CIDR block of the VPC where the ApsaraDB for ClickHouse cluster is located to the whitelist of the ApsaraDB RDS for MySQL instance.
    Note This method is not applicable if the ApsaraDB for ClickHouse cluster is located in a different VPC from the ApsaraDB RDS for MySQL instance. For more information, see Use DataWorks to synchronize data.
    1. Obtain the VPC ID of the ApsaraDB for ClickHouse cluster in the ApsaraDB for ClickHouse console.
      14600601
    2. Obtain the CIDR block of the VPC in the VPC console.
      14600602
    3. Add the CIDR block of the VPC to the whitelist of the ApsaraDB RDS for MySQL instance.
  2. Connect to the ApsaraDB for ClickHouse cluster. For more information, see Connect to a cluster.
  3. Create an ApsaraDB for ClickHouse table based on the structure of the ApsaraDB RDS for MySQL table.
    The following table describes data type mapping between ApsaraDB RDS for MySQL and ApsaraDB for ClickHouse.
    MySQL ClickHouse
    UNSIGNED TINYINT UInt8
    TINYINT Int8
    UNSIGNED SMALLINT UInt16
    SMALLINT Int16
    UNSIGNED INT, UNSIGNED MEDIUMINT UInt32
    INT, MEDIUMINT Int32
    UNSIGNED BIGINT UInt64
    BIGINT Int64
    FLOAT Float32
    DOUBLE Float64
    DATE Date
    DATETIME, TIMESTAMP DateTime
    BINARY FixedString
    Note The data types of ApsaraDB RDS for MySQL not listed in the table are converted to the STRING type of ApsaraDB for ClickHouse. If NOT NULL is not specified for a column in the SQL statement to create the ApsaraDB RDS for MySQL table, the value of the column can be NULL. You must specify Nullable for the corresponding column in the SQL statement to create the ApsaraDB for ClickHouse table.
    If the following SQL statement is used to create an ApsaraDB RDS for MySQL table:
    CREATE TABLE testdb.mysql_test_table (
      id int NOT NULL,
      quarter tinyint unsigned DEFAULT NULL,
      month tinyint DEFAULT NULL,
      day_of_month smallint unsigned DEFAULT NULL,
      day_of_week smallint DEFAULT NULL,
      airline_id int DEFAULT NULL,
      carrier float DEFAULT NULL,
      origin double DEFAULT NULL,
      unique_carrier varchar(80) NOT NULL,
      flight_date date NOT NULL,
      tail_date datetime DEFAULT NULL,
      origin_airport_time timestamp,
      comment varchar(100)
    ) ENGINE=InnoDB
    The following SQL statement can be used to create an ApsaraDB for ClickHouse table:
    -- Create a local table.
    create table default.clickhouse_test_table ON CLUSTER default (
      id Int32,
      quarter Nullable(UInt32),
      month Nullable(Int8),
      day_of_month Nullable(UInt16),
      day_of_week Nullable(Int16),
      airline_id Nullable(Int32),
      carrier Nullable(Float32),
      origin Nullable(Float64),
      unique_carrier String,
      flight_date Date,
      tail_date Nullable(Datetime),
      origin_airport_time Nullable(Datetime),
      comment Nullable(String)
    ) ENGINE = ReplicatedMergeTree(
        '/clickhouse/tables/{database}/{table}/{shard}',
        '{replica}',
        flight_date,
        (id, unique_carrier),
        8192);
    
    -- Create a distributed table.
    CREATE TABLE clickhouse_test_table_distributed ON CLUSTER default
     AS clickhouse_test_table
    ENGINE = Distributed(default, default, clickhouse_test_table, rand());
  4. Import data.
    insert into <table_name> select * 
    from mysql('<host>:<port>', '<db_name>','<table_name>', '<username>', '<password>')

    The following SQL statement can be used to import the data between tables created in the preceding examples:

    insert into clickhouse_test_table_distributed select * from mysql('<host>:<port>', 'test_db', 'mysql_test_table', '<username>', '<password>');
  5. Query data.
    select count(*) from <table_name>

FAQ

What do I do if the ApsaraDB RDS for MySQL instance cannot be connected and errors such as timeout are reported?
Possabile causes for common failed connections to the ApsaraDB RDS for MySQL instance:
  • The public endpoint of the ApsaraDB RDS for MySQL instance is used. The ApsaraDB for ClickHouse cluster cannot access the public endpoint of the ApsaraDB RDS for MySQL instance.
  • The provided endpoint of the ApsaraDB RDS for MySQL instance is not in the same VPC as the ApsaraDB for ClickHouse cluster. An ApsaraDB RDS for MySQL instance in another VPC cannot be connected.
  • The whitelist of the ApsaraDB RDS for MySQL instance is not configured. View the VPC ID of the ApsaraDB for ClickHouse cluster in the ApsaraDB for ClickHouse console and . Add the CIDR block to the whitelist of the ApsaraDB RDS for MySQL instance. You can also obtain the accurate IP addresses of ClickHouse servers by querying the system.clusters table of ApsaraDB for ClickHouse, and then add the IP addresses to the whitelist of the ApsaraDB RDS for MySQL instance.
  • If you are using a MySQL cluster built on an ECS instance, check whether the corresponding ECS security group allows the packets from the ApsaraDB for ClickHouse cluster.