ApsaraDB for SelectDB: Migrate data from a self-managed Doris database to ApsaraDB for SelectDB

Last Updated: Aug 13, 2025

You can migrate offline data from a self-managed Apache Doris database to an ApsaraDB for SelectDB instance by using Object Storage Service (OSS) or a catalog. This topic describes both migration methods and provides guidelines for choosing between them based on your business scenario.

Method selection

Select a data migration method based on your business scenario. The following table describes the available methods and their use cases.

| Method | Scenario | Reason | Operation |
| --- | --- | --- | --- |
| Export and import through OSS | Your data is stored on the Alibaba Cloud platform, and the data volume is large. | Avoids data transfer fees when the Doris database and the SelectDB instance are in the same region, because the migration uses internal network traffic. Supports large-volume data migration. | Use an OSS bucket to migrate data |
| Catalog | Your data is stored on the Alibaba Cloud platform (for example, in an Alibaba Cloud EMR cluster), and the data volume is small. | Avoids data transfer fees when the Doris database and the SelectDB instance are in the same region, because the migration uses internal network traffic. Does not require external components. | Use a catalog to migrate data |

The following sections describe how to use an OSS bucket or a catalog to migrate offline data from a Doris database to SelectDB.

Use an OSS bucket to migrate data

Prerequisites

  • OSS requirements

  • Cluster requirements: A database account is created. For more information, see Account management.

Preparations

Create a bucket in the OSS console. For more information, see Create buckets in the console.

Important

Make sure that the OSS bucket and your ApsaraDB for SelectDB instance are in the same region.

In this topic, the bucket is named `test-selectdb`.

Step 1: Export source data

  1. Log on to the source cluster.

    For more information about how to log on to a self-managed Doris database, see Connecting by MySQL Protocol - Apache Doris.

  2. Optional: Prepare sample data.

    If you already have data to migrate, you can skip this step.

    1. Create a database.

      Run the following statement to create a database.

      CREATE DATABASE source_db;
    2. Create a table.

      Run the following statement to create a table.

      CREATE TABLE IF NOT EXISTS source_tb (
          `c1` int(11) NULL,
          `c2` string NULL,
          `c3` bigint NULL
      )
      DISTRIBUTED BY HASH(c1) BUCKETS 20
      PROPERTIES("replication_num" = "1");
    3. Insert sample data.

      INSERT INTO source_tb VALUES
          (1, 'doris', 18),
          (2, 'nereids', 20),
          (3, 'pipeline', 99999),
          (4, 'Apache', 122123455),
          (5, null, null);
    4. Verify the data.

      Run the following statement to verify the sample data.

      SELECT * FROM `source_tb` LIMIT 10;

      The following result is returned.

      +--------------+--------------+--------------+
      | c1           | c2           | c3           |
      +--------------+--------------+--------------+
      | 1            | doris        | 18           |
      | 3            | pipeline     | 99999        |
      | 5            |              |              |
      | 2            | nereids      | 20           |
      | 4            | Apache       | 122123455    |
      +--------------+--------------+--------------+
  3. Back up the `CREATE TABLE` statement for the table that contains the data to be migrated.

    You can run the SHOW CREATE TABLE statement to view and back up the `CREATE TABLE` statement for the source table. The following example shows how to run the statement.

    SHOW CREATE TABLE source_tb;

    The following result is returned.

    +-----------+--------------------------------------------------------+
    | Table     | Create Table                                           |
    +-----------+--------------------------------------------------------+
    | source_tb | CREATE TABLE `source_tb` (
                  `c1` int NULL,
                  `c2` text NULL,
                  `c3` bigint NULL
                ) ENGINE=OLAP
                DUPLICATE KEY(`c1`)
                DISTRIBUTED BY HASH(`c1`) BUCKETS 20
                PROPERTIES (
                "file_cache_ttl_seconds" = "0",
                "is_being_synced" = "false",
                "storage_medium" = "hdd",
                "storage_format" = "V2",
                "inverted_index_storage_format" = "V2",
                "light_schema_change" = "true",
                "disable_auto_compaction" = "false",
                "enable_single_replica_compaction" = "false",
                "group_commit_interval_ms" = "10000",
                "group_commit_data_bytes" = "134217728"
                ); |
    +-----------+--------------------------------------------------------+
    Rows returned: [1], Time elapsed: [22ms]
  4. Export the data to OSS.

    Run the `EXPORT` command to export the data to OSS.

    The following code shows the syntax.

    EXPORT TABLE table_name
    [PARTITION (p1[,p2])]
    [WHERE]
    TO export_path
    [opt_properties]
    WITH S3
    [s3_properties];

    The following table describes the parameters.

    | Parameter | Required | Description |
    | --- | --- | --- |
    | table_name | Yes | The name of the table to export. |
    | partition | No | The partitions of the table to export. |
    | where | No | A filter condition. Only rows that meet the condition are exported. |
    | export_path | Yes | The destination path of the exported files. The value can be a directory or a directory plus a file prefix, such as s3://path/to/my_file_. |
    | opt_properties | No | The properties that specify export parameters. For more information, see the opt_properties syntax and parameters below. |
    | s3_properties | Yes | The properties that configure S3 protocol attributes. For more information, see the s3_properties syntax and parameters below. |

    The following code shows the syntax of `opt_properties`.

    [PROPERTIES ("key"="value", ...)]

    The following table describes the parameters.

    | Parameter | Required | Description |
    | --- | --- | --- |
    | label | No | The label of the export job. If you do not specify this parameter, the system generates a random label. |
    | column_separator | No | The column delimiter for the exported data. Default value: \t. Multi-byte delimiters are supported. Valid only for CSV file formats. |
    | line_delimiter | No | The row delimiter for the exported data. Default value: \n. Multi-byte delimiters are supported. Valid only for CSV file formats. |
    | columns | No | The columns of the table to export. |
    | format | No | The file format of the export job. Valid values: parquet, orc, csv, csv_with_names, and csv_with_names_and_types. Default value: csv. |
    | max_file_size | No | The maximum size of a single exported file. If the result exceeds this value, it is split into multiple files. Valid values: 5 MB to 2 GB. Default value: 1 GB. If the format parameter is set to orc, the actual size of each split file is ceil(max_file_size/64) × 64 MB. |
    | parallelism | No | The concurrency of the export job. Default value: 1. The export job starts as many threads as the parallelism value, and each thread executes a SELECT INTO OUTFILE statement. If the value exceeds the number of tablets in the table, the system reduces it to the number of tablets, so that each SELECT INTO OUTFILE statement handles one tablet. |
    | delete_existing_files | No | Specifies whether to delete all files in the destination path before the export. Default value: false (do not delete). If set to true, all files and directories in the directory specified by export_path are deleted first, and the data is then exported to that directory. For example, if "file_path" = "/user/tmp", all files and directories under "/user/" are deleted; if "file_path" = "/user/tmp/", all files and directories under "/user/tmp/" are deleted. Warning: Setting delete_existing_files = true is a risky operation; use it only in test environments. To use this parameter, submit a ticket to Alibaba Cloud support so that the SelectDB technical team can add enable_delete_existing_files = true to the fe.conf file and restart the FE; the parameter takes effect only after the restart. |
    | with_bom | No | Default value: false. If set to true, the exported file is encoded in UTF-8 with a byte order mark (BOM). Valid only for CSV file formats. |
    | timeout | No | The timeout period of the export job, in seconds. Default value: 7200 (2 hours). |

    The following code shows the syntax of `s3_properties`.

    ("key"="value"[,...])

    The following table describes the parameters.

    | Parameter | Required | Description |
    | --- | --- | --- |
    | s3.endpoint | Yes | The endpoint of the destination that uses the S3 protocol. For the Alibaba Cloud OSS migration in this topic, this is the endpoint used to access OSS data. For more information about how to obtain the endpoint, see OSS regions and endpoints. Important: Make sure that the OSS bucket and your ApsaraDB for SelectDB instance are in the same region. |
    | s3.access_key | Yes | The identity key of the destination that uses the S3 protocol. For the Alibaba Cloud OSS migration in this topic, this is the AccessKey ID of the RAM user used to access OSS. |
    | s3.secret_key | Yes | The authentication string of the destination that uses the S3 protocol. For the Alibaba Cloud OSS migration in this topic, this is the AccessKey secret of the RAM user used to access OSS. |
    | s3.region | Yes | The region of the destination that uses the S3 protocol. For the Alibaba Cloud OSS migration in this topic, this is the region of the OSS bucket. For more information about how to obtain the region, see OSS regions and endpoints. |
    | use_path_style | No | Default value: false. The S3 SDK uses the virtual-hosted style by default, but some object storage systems do not support virtual-hosted style access; set use_path_style to true to force path-style access. Note: Three URI schemas are supported: http://, https://, and s3://. With http:// or https://, the system uses the use_path_style parameter to decide whether to use the path style; with s3://, the system always uses the virtual-hosted style. |
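
    For example, if an S3-compatible system supports only path-style access, you can combine an http:// destination path with use_path_style. The following sketch uses a hypothetical endpoint, bucket, and region; OSS itself works with the virtual-hosted s3:// style shown in the next example.

    EXPORT TABLE source_tb
    TO "http://s3.example.internal/my-bucket/test/"  -- Path style: the bucket name is part of the path.
    WITH S3 (
        "s3.endpoint" = "http://s3.example.internal",  -- Hypothetical path-style-only endpoint.
        "s3.region" = "us-east-1",  -- Hypothetical region.
        "s3.access_key" = "yourAccessKeyID",
        "s3.secret_key" = "yourAccessKeySecret",
        "use_path_style" = "true"  -- Force path-style access.
    );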

    The following code provides an example.

    EXPORT TABLE source_tb  -- Export data from the source_tb table.
    TO "s3://test-selectdb/test/"  -- Export to the test folder in the OSS bucket named test-selectdb.
    PROPERTIES (
        "label" = "test_export",  -- The job label.
        "format" = "orc",  -- Store the exported files in the ORC format.
        "max_file_size" = "2048MB",  -- The file split size.
        "parallelism" = "10"  -- Export with a concurrency of 10.
    ) WITH S3 (
        "s3.endpoint" = "oss-cn-hangzhou-internal.aliyuncs.com",  -- The endpoint for accessing the OSS bucket.
        "s3.region" = "oss-cn-hangzhou",  -- The region of the OSS bucket.
        "s3.access_key" = "LTAI****************",  -- The AccessKey ID of the RAM user for accessing OSS.
        "s3.secret_key" = "yourAccessKeySecret"  -- The AccessKey secret of the RAM user for accessing OSS.
    );
  5. Verify that the data is exported.

    You can use one of the following two methods to check the data export status.

    • Check the status by running a command.

      In a client that is connected to Doris, run the following statement to check the export status. To check a specific job, you can also filter the output, as shown in the example after this list.

      SHOW EXPORT;

      • Export successful: If the `State` in the result is `FINISHED`, the data export is successful.

      • Export failed: If the `State` in the result is `CANCELLED`, the data export failed. You can check the `ErrorMsg` field to troubleshoot the failure.

    • Check the status in the OSS console.

      Log on to the OSS console and check whether the file is generated in the specified export path.
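
    In Apache Doris, the `SHOW EXPORT` statement also accepts a `WHERE` clause, which is convenient when multiple jobs exist. For example, the following statement checks the job with the `test_export` label from the earlier example.

      SHOW EXPORT WHERE LABEL = "test_export";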

Step 2: Import data to the destination table

  1. Log on to the destination cluster. For more information, see Connect to an ApsaraDB for SelectDB instance using DMS.

  2. Create the destination table.

    1. Optional: Create a database and switch to it.

      If a destination database already exists, you can skip this step.

      Run the following statements to create a database and switch to it.

      CREATE DATABASE aim_db;
      USE aim_db;
    2. Create a table.

      Run the backed-up `CREATE TABLE` statement from the source table.

      The following code shows the `CREATE TABLE` statement used in this topic.

      CREATE TABLE IF NOT EXISTS aim_tb (
          `c1` int(11) NULL,
          `c2` string NULL,
          `c3` bigint NULL
      ) ENGINE=OLAP
      DUPLICATE KEY(`c1`)
      DISTRIBUTED BY HASH(c1) BUCKETS 20
      PROPERTIES("replication_num" = "1");
  3. Import the data.

    Use `S3 load` to import data from OSS to SelectDB. For more information about the S3 load syntax and additional examples, see OSS Load.

    The following code provides an example.

     LOAD LABEL label_1  -- The label, a unique identifier for the import job.
     (
         DATA INFILE("s3://test-selectdb/test/59ab2e9dc4ec4c04-9e50e45a6fda2c8e_0.orc")  -- The path of the data file exported to OSS in the previous step.
         INTO TABLE aim_tb  -- The name of the destination table.
         FORMAT AS ORC  -- The format of the imported file, which matches the format of the exported data.
     )
     WITH S3
     (
         "AWS_PROVIDER" = "OSS",
         "AWS_REGION" = "oss-cn-hangzhou",  -- The region of the OSS bucket.
         "AWS_ENDPOINT" = "oss-cn-hangzhou-internal.aliyuncs.com",  -- The endpoint for accessing the OSS bucket.
         "AWS_ACCESS_KEY" = "LTAI****************",  -- The AccessKey ID of the RAM user for accessing OSS.
         "AWS_SECRET_KEY" = "yourAccessKeySecret"  -- The AccessKey secret of the RAM user for accessing OSS.
     )
     PROPERTIES
     (
         "timeout" = "3600"  -- The import timeout period, in seconds.
     );
  4. Verify the import.

    You can use one of the following two methods to check the import result.

    • Check the status by running a command.

      In a client that is connected to SelectDB, run the following statement to check the import status.

      SHOW LOAD;

      • Import successful: If the `State` in the result is `FINISHED`, the data import is successful.

      • Import failed: If the `State` in the result is `CANCELLED`, the data import failed. You can check the `ErrorMsg` field to troubleshoot the failure.

    • Check the data in the destination table. The following code provides an example statement.

      SELECT * FROM `aim_tb` LIMIT 10;

      The following result is returned.

      +--------------+--------------+--------------+
      | c1           | c2           | c3           |
      +--------------+--------------+--------------+
      | 1            | doris        | 18           |
      | 3            | pipeline     | 99999        |
      | 5            |              |              |
      | 2            | nereids      | 20           |
      | 4            | Apache       | 122123455    |
      +--------------+--------------+--------------+

      The data is consistent with the source data from Step 1: Export source data. The import is successful.
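
      For larger tables, comparing row counts is a quicker consistency check than inspecting rows. For example, using the sample tables from this topic:

      -- On the source Doris cluster:
      SELECT COUNT(*) FROM source_db.source_tb;

      -- On the destination SelectDB instance:
      SELECT COUNT(*) FROM aim_db.aim_tb;

      -- Both statements should return the same count (5 rows for the sample data).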

Use a catalog to migrate data

Prerequisites

  • Ensure that the Doris instance and the SelectDB instance can communicate with each other.

  • The version of the SelectDB instance cannot be earlier than the version of the Doris instance.

    Note

    SelectDB is a cloud-native real-time data warehouse built on Doris. For more information about the version relationship between them, see Kernel release notes.

  • Familiarize yourself with catalogs and their basic operations. For more information, see Data lakehouse.

Sample environment

This topic provides an example of how to migrate data from the `doris_t` table in the `doris_db` database of a Doris instance to the `test_doris2SelectDB` table in the `test_db` database of a SelectDB instance. When you perform the migration, modify the parameters based on your requirements. The following list describes the sample environment:

  • Destination database: `test_db`

  • Destination table: `test_doris2SelectDB`

  • Source database: `doris_db`

  • Source data table: `doris_t`

Source data preparation example

Log on to your source Doris database and run the following commands.

  1. Create a database.

    CREATE DATABASE doris_db;
  2. Create a table.

    CREATE TABLE doris_t
    (
        id int,
        name string,
        age int
    )
    DISTRIBUTED BY HASH(id) BUCKETS 4
    PROPERTIES("replication_num" = "1");
  3. Insert data.

    INSERT INTO doris_t VALUES
    (1, 'Alice', 25),
    (2, 'Bob', 30),
    (3, 'Charlie', 35),
    (4, 'David', 40),
    (5, 'Eve', 45);

Procedure

  1. Connect to the SelectDB instance. For more information, see Connect to an ApsaraDB for SelectDB instance using a MySQL client.

  2. Create a Doris JDBC catalog.

    CREATE CATALOG doris_catalog PROPERTIES (
        "type" = "jdbc",
        "user" = "root",
        "password" = "123456",
        "jdbc_url" = "jdbc:mysql://127.0.0.1:9030/doris_db",
        "driver_url" = "mysql-connector-java-8.0.25.jar",
        "driver_class" = "com.mysql.cj.jdbc.Driver"
    );

    The following table describes the parameters.

    | Parameter | Required | Default value | Description |
    | --- | --- | --- | --- |
    | user | Yes | None | The account of the Doris database. |
    | password | Yes | None | The password of the Doris database account. |
    | jdbc_url | Yes | None | The JDBC connection string, which must contain the connection address of the Doris database. Format: jdbc:mysql://<host>:<port>/<database>, where host is the IP address of the Doris database, port is its port number, and database is the name of the database to access. Example: jdbc:mysql://127.0.0.1:9030/doris_db. |
    | driver_url | Yes | None | The name of the JDBC driver JAR package. We recommend that you use mysql-connector-java-8.0.25.jar. If you want to use another JAR package, submit a ticket for consultation. |
    | driver_class | Yes | None | The class name of the JDBC driver. We recommend that you set it to com.mysql.cj.jdbc.Driver. |
    | lower_case_table_names (renamed to lower_case_meta_names in version 4.0) | No | "false" | Specifies whether to synchronize the database and table names of the external JDBC data source in lowercase. true: maintains a mapping from lowercase names to the actual names in the remote system, which lets you query databases and tables whose names are not all-lowercase; database, table, and column names are all converted to lowercase. false: databases and tables whose names are not all-lowercase cannot be queried. Important: For SelectDB 3.0, if the lower_case_table_names parameter of the FE is set to 1 or 2, the catalog parameter must be set to true; if the FE parameter is set to 0, the catalog parameter can be true or false. For SelectDB 4.0, if the FE parameter is 0 or 2, database, table, and column names are not converted; if the FE parameter is 1, table names are converted to lowercase, but database and column names are not. |
    | only_specified_database | No | "false" | Specifies whether to synchronize only the specified database. true: synchronizes only the database specified in the JDBC URL. false: synchronizes all databases of the data source. |
    | include_database_list | No | "" | When only_specified_database = true, specifies the databases to synchronize, separated by commas (,). Database names are case-sensitive. |
    | exclude_database_list | No | "" | When only_specified_database = true, specifies the databases not to synchronize, separated by commas (,). Database names are case-sensitive. |
    | meta_names_mapping | No | "" | If the external data source contains objects whose names differ only in case, such as DORIS and doris, queries against the catalog report an error due to ambiguity. Configure this parameter to resolve the conflict. For more information, see Case sensitivity settings. Important: This parameter applies only to SelectDB 4.0. |
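
    For example, the following statement creates a catalog (with the hypothetical name doris_catalog_filtered) that synchronizes only the doris_db database. This is a sketch based on the optional parameters above; adjust the connection details to your environment.

    CREATE CATALOG doris_catalog_filtered PROPERTIES (
        "type" = "jdbc",
        "user" = "root",
        "password" = "123456",
        "jdbc_url" = "jdbc:mysql://127.0.0.1:9030/doris_db",
        "driver_url" = "mysql-connector-java-8.0.25.jar",
        "driver_class" = "com.mysql.cj.jdbc.Driver",
        "only_specified_database" = "true",  -- Synchronize only the specified databases.
        "include_database_list" = "doris_db"  -- The allowlist of databases; names are case-sensitive.
    );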

  3. View the catalog.

    SHOW CATALOGS; -- Check whether the CATALOG was created successfully.

    The following result is returned.

    +--------------+---------------+----------+-----------+-------------------------+---------------------+------------------------+
    | CatalogId    | CatalogName   | Type     | IsCurrent | CreateTime              | LastUpdateTime      | Comment                |
    +--------------+---------------+----------+-----------+-------------------------+---------------------+------------------------+
    | 436009309195 | doris_catalog | jdbc     |           | 2024-08-06 17:09:08.058 | 2024-07-19 18:04:37 |                        |
    |            0 | internal      | internal | yes       | UNRECORDED              | NULL                | Doris internal catalog |
    +--------------+---------------+----------+-----------+-------------------------+---------------------+------------------------+
  4. Optional: Switch to the `doris_catalog` external catalog.

    You can view and access data in the `doris_catalog` external catalog in the same way as you access data in an internal catalog.

    Note

    Currently, ApsaraDB for SelectDB supports only read operations on data in external catalogs.

    SWITCH doris_catalog;
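
    After you switch, you can browse and query the external catalog with the usual statements. The following is a brief illustration based on the sample environment in this topic; access through the catalog is read-only.

    SHOW DATABASES;  -- Lists the databases in the source Doris instance, such as doris_db.
    USE doris_db;
    SHOW TABLES;  -- Lists the tables, such as doris_t.
    SELECT * FROM doris_t LIMIT 10;  -- Queries the source table through the catalog.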
  5. Optional: Switch to the internal catalog.

    If you did not perform Step 4, you can skip this step.

    SWITCH internal;
  6. Optional: Create a database.

    If the destination database already exists, you can skip this step.

    CREATE DATABASE test_db;
  7. Switch to the destination database.

    USE test_db;
  8. Create a table.

    If the destination table already exists, check that its column data types match the column data types of the Doris source table. If it does not exist, make sure that the column data types match when you create the table.

    For more information about data type mapping, see Type mapping.

    CREATE TABLE test_doris2SelectDB
    (
        id int,
        name string,
        age int
    )
    DISTRIBUTED BY HASH(id) BUCKETS 4
    PROPERTIES("replication_num" = "1");
  9. Migrate the data.

    INSERT INTO test_doris2SelectDB SELECT * FROM doris_catalog.doris_db.doris_t;
  10. Check the data import status.

    SELECT * FROM test_doris2SelectDB;

Notes on incremental data migration

In production environments, Doris data consists of offline data and incremental data. Because data is typically migrated from Doris to SelectDB for data warehousing and query acceleration, you can use one of the following two methods for incremental data migration:

  • When you write data to the Doris instance, write a copy of the data to the SelectDB instance in parallel.

  • Use periodic jobs to read partitioned data from the Doris instance and write the data to the SelectDB instance.
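
For the second method, a periodic job can replay only new data through the catalog. The following sketch assumes a hypothetical `dt` DATE column in both the source and destination tables that records the day each row belongs to; the sample `doris_t` table in this topic does not have such a column.

-- Hypothetical periodic job: copy only yesterday's data through the catalog.
INSERT INTO test_doris2SelectDB
SELECT * FROM doris_catalog.doris_db.doris_t
WHERE dt = DATE_SUB(CURDATE(), INTERVAL 1 DAY);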