
ApsaraDB for SelectDB:Migrate data from a self-managed Doris database to ApsaraDB for SelectDB

Last Updated:Feb 28, 2026

This topic describes how to migrate offline data from a self-managed Apache Doris database to an ApsaraDB for SelectDB instance. Two methods are available: exporting and importing through Object Storage Service (OSS), and migrating through a JDBC catalog.

Method comparison

Both methods require the Doris database and the SelectDB instance to be in the same region to use internal network traffic and avoid data transfer fees.

| Criteria | Export and import through OSS | JDBC catalog |
| --- | --- | --- |
| Best for | Large data volumes | Small data volumes |
| Scenario | Data stored on the Alibaba Cloud platform | Data stored on the Alibaba Cloud platform, including Alibaba Cloud EMR clusters |
| External components | Requires an OSS bucket | None |
| Advantage | Supports large-volume data migration | Simple setup with no external dependencies |

Migrate data through OSS

This method exports data from the source Doris database to an OSS bucket, then imports the data from OSS into ApsaraDB for SelectDB. The process involves five steps:

  1. Activate OSS and create a bucket in the same region as the SelectDB instance.

  2. Back up the source table schema.

  3. Export source data to OSS.

  4. Create the destination table in SelectDB.

  5. Import data from OSS into SelectDB.

Prerequisites

Important

The OSS bucket and the ApsaraDB for SelectDB instance must be in the same region.

In the examples in this topic, the bucket is named test-selectdb.

Step 1: Export source data

Run on the self-managed Doris database.

  1. Log on to the source cluster. For more information about how to connect to a self-managed Doris database, see Connecting by MySQL Protocol - Apache Doris.

  2. (Optional) Prepare sample data. Skip this step if you already have data to migrate.

       CREATE DATABASE source_db;
       USE source_db;
       CREATE TABLE IF NOT EXISTS source_tb (
           `c1` int(11) NULL,
           `c2` string NULL,
           `c3` bigint NULL
       )
       DISTRIBUTED BY HASH(c1) BUCKETS 20
       PROPERTIES("replication_num" = "1");
       INSERT INTO source_tb VALUES
           (1, 'doris', 18),
           (2, 'nereids', 20),
           (3, 'pipelibe', 99999),
           (4, 'Apache', 122123455),
           (5, null, null);

     Verify the data:

       SELECT * FROM `source_tb` LIMIT 10;

     Expected result:

       +--------------+--------------+--------------+
       | c1           | c2           | c3           |
       +--------------+--------------+--------------+
       | 1            | doris        | 18           |
       | 3            | pipelibe     | 99999        |
       | 5            |              |              |
       | 2            | nereids      | 20           |
       | 4            | Apache       | 122123455    |
       +--------------+--------------+--------------+
  3. Back up the CREATE TABLE statement for the source table. Run the SHOW CREATE TABLE statement, then save the returned CREATE TABLE statement for use when you create the destination table. Example with the expected result:

       SHOW CREATE TABLE source_tb;
       +-----------------+----------------------------------------------------+
       | Table           | Create Table                                       |
       +-----------------+----------------------------------------------------+
       | source_tb       | CREATE TABLE `source_tb` (                         |
       |                 |   `c1` int NULL,                                   |
       |                 |   `c2` text NULL,                                  |
       |                 |   `c3` bigint NULL                                 |
       |                 | ) ENGINE=OLAP                                      |
       |                 | DUPLICATE KEY(`c1`)                                |
       |                 | DISTRIBUTED BY HASH(`c1`) BUCKETS 20               |
       |                 | PROPERTIES (                                       |
       |                 |   "file_cache_ttl_seconds" = "0",                  |
       |                 |   "is_being_synced" = "false",                     |
       |                 |   "storage_medium" = "hdd",                        |
       |                 |   "storage_format" = "V2",                         |
       |                 |   "inverted_index_storage_format" = "V2",          |
       |                 |   "light_schema_change" = "true",                  |
       |                 |   "disable_auto_compaction" = "false",             |
       |                 |   "enable_single_replica_compaction" = "false",    |
       |                 |   "group_commit_interval_ms" = "10000",            |
       |                 |   "group_commit_data_bytes" = "134217728"          |
       |                 | );                                                 |
       +-----------------+----------------------------------------------------+
  4. Export data to OSS. Run the EXPORT statement to export the table data to the OSS bucket.

     Syntax:

       EXPORT TABLE table_name
       [PARTITION (p1[,p2])]
       [WHERE]
       TO export_path
       [opt_properties]
       WITH S3
       [s3_properties];

     Main parameters:

     | Parameter | Required | Description |
     | --- | --- | --- |
     | table_name | Yes | The name of the table to export. |
     | partition | No | The partitions to export. |
     | where | No | Filter conditions for the data to export. |
     | export_path | Yes | The destination file path. Can be a directory or a directory with a file prefix. Example: s3://path/to/my_file_. |
     | opt_properties | No | Export parameters. Syntax: [PROPERTIES ("key"="value", ...)]. See the opt_properties table below. |
     | s3_properties | Yes | S3 protocol configuration. Syntax: ("key"="value"[,...]). See the s3_properties table below. |

     opt_properties parameters:

     | Parameter | Required | Default | Description |
     | --- | --- | --- | --- |
     | label | No | Randomly generated | The label of the export job. |
     | column_separator | No | \t | Column delimiter. Supports multi-byte delimiters. Valid only for the CSV format. |
     | line_delimiter | No | \n | Row delimiter. Supports multi-byte delimiters. Valid only for the CSV format. |
     | columns | No | All columns | The columns to export. |
     | format | No | csv | File format. Supported values: parquet, orc, csv, csv_with_names, csv_with_names_and_types. |
     | max_file_size | No | 1 GB | Maximum size of a single output file. Range: 5 MB to 2 GB. For the ORC format, the actual split size is ceil(max_file_size/64) × 64 MB. |
     | parallelism | No | 1 | Number of concurrent threads. Each thread runs a SELECT INTO OUTFILE statement. If the value exceeds the tablet count, the system caps it at the tablet count. |
     | delete_existing_files | No | false | Whether to delete all files in the destination path before export. See the warning below. |
     | with_bom | No | false | Whether to add a byte order mark (BOM) to the exported file. Applies to UTF-8 encoding. Valid only for CSV-related formats. |
     | timeout | No | 2 hours | Export timeout in seconds. |

     Warning

     delete_existing_files is a destructive operation. Use it only in test environments. To enable this parameter, submit a ticket to Alibaba Cloud support. The SelectDB technical team adds enable_delete_existing_files = true to fe.conf and restarts FE.

     s3_properties parameters:

     | Parameter | Required | Description |
     | --- | --- | --- |
     | s3.endpoint | Yes | The endpoint of the OSS bucket. The OSS bucket and the SelectDB instance must be in the same region. For more information, see OSS regions and endpoints. |
     | s3.access_key | Yes | The AccessKey ID of the RAM user for accessing OSS. |
     | s3.secret_key | Yes | The AccessKey secret of the RAM user for accessing OSS. |
     | s3.region | Yes | The region of the OSS bucket. For more information, see OSS regions and endpoints. |
     | use_path_style | No | Default: false. The S3 SDK uses the virtual-hosted style by default. Set this parameter to true to force path-style access. Three URI schemas are supported: http://, https://, and s3://. For http:// or https://, the system determines the access style based on use_path_style. For s3://, the system always uses the virtual-hosted style. |

     Example:

       EXPORT TABLE source_tb  -- Export data from the source_tb table.
       TO "s3://test-selectdb/test/"  -- Export data to the test folder in the OSS bucket named test-selectdb.
       PROPERTIES (
           "label" = "test_export",  -- The job label.
           "format" = "orc",  -- The file storage format is ORC.
           "max_file_size" = "2048MB",  -- The file split size.
           "parallelism" = "10"  -- The export concurrency is 10.
       ) WITH S3 (
           "s3.endpoint" = "oss-cn-hangzhou-internal.aliyuncs.com",  -- The endpoint for accessing the OSS bucket.
           "s3.region" = "oss-cn-hangzhou",  -- The region of the OSS bucket.
           "s3.access_key" = "LTAI****************",  -- The AccessKey ID of the RAM user for accessing OSS.
           "s3.secret_key" = "yourAccessKeySecret"  -- The AccessKey secret of the RAM user for accessing OSS.
       );
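     The partition and where parameters let you export only a subset of a table. The following sketch is illustrative only: the table name orders_tb, its partition names, and the filter column are assumptions, not objects used elsewhere in this topic; replace them with your own schema.

       EXPORT TABLE orders_tb
       PARTITION (p202401, p202402)  -- Export only these partitions (hypothetical names).
       WHERE order_amount > 100  -- Export only rows that match the filter (hypothetical column).
       TO "s3://test-selectdb/orders/"
       PROPERTIES (
           "format" = "orc",
           "parallelism" = "4"
       ) WITH S3 (
           "s3.endpoint" = "oss-cn-hangzhou-internal.aliyuncs.com",
           "s3.region" = "oss-cn-hangzhou",
           "s3.access_key" = "LTAI****************",
           "s3.secret_key" = "yourAccessKeySecret"
       );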
  5. Verify the export. Check the export status by using one of the following methods:

     • Run a command: In a client connected to the Doris database, run the following statement:

         SHOW EXPORT;

       • If the State column shows FINISHED, the export succeeded.

       • If the State column shows CANCELLED, the export failed. Check the ErrorMsg field to troubleshoot.

     • Check the OSS console: Log on to the OSS console and verify that files exist in the specified export path.

Step 2: Import data to the destination table

Run on ApsaraDB for SelectDB.

  1. Connect to the SelectDB instance. For more information, see Connect to an ApsaraDB for SelectDB instance using DMS.

  2. (Optional) Create a destination database. Skip this step if the destination database already exists.

       CREATE DATABASE aim_db;
  3. Create the destination table. Switch to the destination database, and then run the backed-up CREATE TABLE statement from the source table. The following example shows the statement used in this topic:

       USE aim_db;
       CREATE TABLE IF NOT EXISTS aim_tb (
           `c1` int(11) NULL,
           `c2` string NULL,
           `c3` bigint NULL
       ) ENGINE=OLAP
       DUPLICATE KEY(`c1`)
       DISTRIBUTED BY HASH(c1) BUCKETS 20
       PROPERTIES("replication_num" = "1");
  4. Import the data. Use S3 Load to import data from OSS into SelectDB. For more information about the S3 Load syntax and additional examples, see OSS Load.

       LOAD LABEL label_1  -- labelname: a random, unique identifier for the job
                       (
                           DATA INFILE("s3://test-selectdb/test/59ab2e9dc4ec4c04-9e50e45a6fda2c8e_0.orc")  -- The path of the exported data file in OSS from the previous step.
                           INTO TABLE aim_tb  -- The name of the table to import data into.
                           FORMAT AS ORC   -- The format of the imported file, which is the same as the format of the exported data.
                       )
                       WITH S3
                       (
                           "AWS_PROVIDER" = "OSS",
                           "AWS_REGION" = "oss-cn-hangzhou",  -- The region of the OSS bucket.
                           "AWS_ENDPOINT" = "oss-cn-hangzhou-internal.aliyuncs.com",  -- The URL for accessing the OSS bucket.
                           "AWS_ACCESS_KEY" = "LTAI****************",  -- The AccessKey ID of the RAM user for accessing OSS.
                           "AWS_SECRET_KEY"="yourAccessKeySecret"  -- The AccessKey secret of the RAM user for accessing OSS.
                       )
                       PROPERTIES
                       (
                           "timeout" = "3600"   -- The import timeout period.
                       );
  5. Verify the import. Check the import result by using one of the following methods:

    • Run a command: In a client connected to SelectDB, run the following statement:

        SHOW LOAD;

      If the State column shows FINISHED, the import succeeded.

    • Query the destination table: Compare the destination data against the source data.

        SELECT * FROM aim_tb LIMIT 10;

      Expected result:

        +--------------+--------------+--------------+
        | c1           | c2           | c3           |
        +--------------+--------------+--------------+
        | 1            | doris        | 18           |
        | 3            | pipelibe     | 99999        |
        | 5            |              |              |
        | 2            | nereids      | 20           |
        | 4            | Apache       | 122123455    |
        +--------------+--------------+--------------+

      The data matches the source data from Step 1. The import is successful.

Migrate data by using a catalog

This method uses a JDBC catalog to read data from the source Doris database and write it to ApsaraDB for SelectDB. The process involves four steps:

  1. Configure network connectivity between the Doris instance and the SelectDB instance.

  2. Create a JDBC catalog in SelectDB that points to the source Doris database.

  3. Create the destination table in SelectDB.

  4. Run an INSERT INTO ... SELECT statement to migrate the data.

Prerequisites

Note

SelectDB is a cloud-native real-time data warehouse built on Apache Doris. For more information about version mapping, see Kernel release notes.

  • Catalog knowledge: Familiarize yourself with catalogs and their basic operations. For more information, see Data lakehouse.

Sample environment

This topic uses the following environment. Replace the values with your own when performing the migration.

| Role | Database | Table |
| --- | --- | --- |
| Source (Doris) | doris_db | doris_t |
| Destination (SelectDB) | test_db | test_doris2SelectDB |

(Optional) Prepare source data

Run on the self-managed Doris database.

Log on to the source Doris database and run the following commands.

  1. Create a database.

       CREATE DATABASE doris_db;
  2. Switch to the database and create a table.

       USE doris_db;
       CREATE TABLE doris_t
       (
           id int,
           name string,
           age int
       )
       DISTRIBUTED BY HASH(id) BUCKETS 4
       PROPERTIES("replication_num" = "1");
  3. Insert data.

       INSERT INTO doris_t VALUES
       (1, 'Alice', 25),
       (2, 'Bob', 30),
       (3, 'Charlie', 35),
       (4, 'David', 40),
       (5, 'Eve', 45);

Procedure

Run on ApsaraDB for SelectDB.

  1. Connect to the SelectDB instance. For more information, see Connect to an ApsaraDB for SelectDB instance using a MySQL client.

  2. Create a Doris JDBC catalog.

    JDBC catalog parameters:

    | Parameter | Required | Default | Description |
    | --- | --- | --- | --- |
    | user | Yes | None | The account of the Doris database. |
    | password | Yes | None | The password of the Doris database account. |
    | jdbc_url | Yes | None | The JDBC connection string. Format: jdbc:mysql://<host>:<port>/<database>. host: the IP address of the Doris database. port: the port number. database: the name of the database. Example: jdbc:mysql://127.0.0.1:9030/doris_db. |
    | driver_url | Yes | None | The name of the JDBC driver JAR package. We recommend mysql-connector-java-8.0.25.jar. To use other JAR packages, submit a ticket for consultation. |
    | driver_class | Yes | None | The class name of the JDBC driver. We recommend com.mysql.cj.jdbc.Driver. |
    | lower_case_table_names | No | "false" | Whether to synchronize database and table names in lowercase. true: converts names to lowercase and maintains a mapping to the actual remote names. false: does not query databases and tables that are not in lowercase. For SelectDB 3.0: if the FE lower_case_table_names parameter is 1 or 2, set this catalog parameter to true; if the FE parameter is 0, either true or false is acceptable. For SelectDB 4.0: this parameter is renamed to lower_case_meta_names. If the FE parameter is 0 or 2, names are not converted. If the FE parameter is 1, only table names are converted to lowercase. |
    | only_specified_database | No | "false" | Whether to synchronize only the database specified in the JDBC URL. true: synchronizes only the specified database. false: synchronizes all databases. |
    | include_database_list | No | "" | When only_specified_database=true, specifies additional databases to synchronize, separated by commas. Case-sensitive. |
    | exclude_database_list | No | "" | When only_specified_database=true, specifies databases to exclude from synchronization, separated by commas. Case-sensitive. |
    | meta_names_mapping | No | "" | Resolves conflicts when external data source objects differ only in case (for example, DORIS and doris). For more information, see Case sensitivity settings. Applicable to SelectDB 4.0 only. |
       CREATE CATALOG doris_catalog PROPERTIES (
           "type" = "jdbc",
           "user" = "root",
           "password" = "123456",
           "jdbc_url" = "jdbc:mysql://127.0.0.1:9030/doris_db",
           "driver_url" = "mysql-connector-java-8.0.25.jar",
           "driver_class" = "com.mysql.cj.jdbc.Driver"
       );
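     To synchronize only the database named in the JDBC URL, set the only_specified_database parameter when you create the catalog. The following sketch is an assumption-based example: the catalog name doris_catalog_single is hypothetical, and the connection values mirror the example in this topic.

       CREATE CATALOG doris_catalog_single PROPERTIES (
           "type" = "jdbc",
           "user" = "root",
           "password" = "123456",
           "jdbc_url" = "jdbc:mysql://127.0.0.1:9030/doris_db",
           "driver_url" = "mysql-connector-java-8.0.25.jar",
           "driver_class" = "com.mysql.cj.jdbc.Driver",
           "only_specified_database" = "true"  -- Synchronize only doris_db from the JDBC URL.
       );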
  3. Verify the catalog. Expected result:

       SHOW CATALOGS; -- Check whether the catalog was created successfully.
       +--------------+---------------+----------+-----------+-------------------------+---------------------+------------------------+
       | CatalogId    | CatalogName   | Type     | IsCurrent | CreateTime              | LastUpdateTime      | Comment                |
       +--------------+---------------+----------+-----------+-------------------------+---------------------+------------------------+
       | 436009309195 | doris_catalog | jdbc     |           | 2024-08-06 17:09:08.058 | 2024-07-19 18:04:37 |                        |
       |            0 | internal      | internal | yes       | UNRECORDED              | NULL                | Doris internal catalog |
       +--------------+---------------+----------+-----------+-------------------------+---------------------+------------------------+
  4. (Optional) Switch to the doris_catalog external catalog. You can view and access data in the external catalog the same way as the internal catalog.

    Note

    ApsaraDB for SelectDB supports only read operations on data in external catalogs.

       SWITCH doris_catalog;
  5. (Optional) Switch back to the internal catalog. Skip this step if you did not perform step 4.

       SWITCH internal;
  6. (Optional) Create the destination database. Skip this step if the destination database already exists.

       CREATE DATABASE test_db;
  7. Switch to the destination database.

       USE test_db;
  8. Create the destination table. Make sure the column data types in the destination table match the source Doris table. For more information about data type mapping, see Type mapping.

       CREATE TABLE test_doris2SelectDB
       (
           id int,
           name string,
           age int
       )
       DISTRIBUTED BY HASH(id) BUCKETS 4
       PROPERTIES("replication_num" = "1");
  9. Migrate the data.

       INSERT INTO test_doris2SelectDB SELECT *  FROM doris_catalog.doris_db.doris_t;
  10. Verify the migration. Expected result:

        SELECT *  FROM test_doris2SelectDB;
        +------+---------+------+
        | id   | name    | age  |
        +------+---------+------+
        |    1 | Alice   |   25 |
        |    2 | Bob     |   30 |
        |    3 | Charlie |   35 |
        |    4 | David   |   40 |
        |    5 | Eve     |   45 |
        +------+---------+------+
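      Because the JDBC catalog can read the remote Doris table directly, you can also compare row counts from within SelectDB. The following sketch uses the names from this example and assumes that your kernel version supports scalar subqueries in the select list:

         SELECT
             (SELECT COUNT(*) FROM doris_catalog.doris_db.doris_t) AS source_rows,
             (SELECT COUNT(*) FROM internal.test_db.test_doris2SelectDB) AS destination_rows;

      If the two values match, all rows were migrated.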

Incremental data migration

In production environments, Doris data consists of offline data and incremental data. Because data is typically migrated from Doris to SelectDB for data warehousing and query acceleration, use one of the following methods for incremental data migration:

  • Dual write: When writing data to the Doris instance, write a copy of the data to the SelectDB instance in parallel.

  • Periodic sync: Use periodic jobs to read partitioned data from the Doris instance and write the data to the SelectDB instance.
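The periodic-sync method can be sketched as a scheduled INSERT INTO ... SELECT statement that reads one partition or date range at a time through the JDBC catalog. All object names and the dt column in the following sketch are assumptions for illustration; adapt them to your schema and run the statement from a scheduler at the desired interval.

  INSERT INTO test_db.target_table
  SELECT * FROM doris_catalog.source_db.source_table
  WHERE dt = '2024-08-01';  -- Replace with the partition or date range for the current sync cycle.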