This topic describes how to use a catalog to migrate offline data from StarRocks to ApsaraDB for SelectDB.
Prerequisites
Ensure that the StarRocks instance and the SelectDB instance can communicate with each other.
All nodes of the data source cluster and the SelectDB instance are in the same VPC. If they are not in the same VPC, you must first resolve the network connectivity issue. For more information, see How do I resolve network connectivity issues between an ApsaraDB for SelectDB instance and a data source?
Add the IP address of the StarRocks instance to the whitelist of the SelectDB instance. For more information, see Configure a whitelist.
If the StarRocks instance has a whitelist, add the IP address CIDR block of the SelectDB instance to the whitelist of the StarRocks instance.
To obtain the IP address of the SelectDB instance in the VPC to which the SelectDB instance belongs, you can perform the operations provided in How do I view the IP addresses in the VPC to which my ApsaraDB SelectDB instance belongs?
To obtain the public IP address of the SelectDB instance, you can run the ping command to access the public endpoint of the SelectDB instance and obtain the IP address of the instance.
You understand what a catalog is and its basic operations. For more information, see Data lakehouse.
Example environment
This example shows how to migrate data from the `SR_t` table in the `starRocks_db` StarRocks database to the `test_SR2SelectDB` table in the `test_db` SelectDB database. You can modify the parameters as needed. The sample environment is as follows:
Destination database: test_db
Destination table: test_SR2SelectDB
Source database: starRocks_db
Source table: SR_t
Example: Prepare the source data
Log on to your StarRocks data source and perform the following steps.
Create a database.
CREATE DATABASE starRocks_db;Create a table.
CREATE TABLE SR_t ( id int, name string, age int ) DISTRIBUTED BY HASH(id) BUCKETS 4 PROPERTIES("replication_num" = "1");Insert data.
INSERT INTO SR_t VALUES (1, 'Alice', 25), (2, 'Bob', 30), (3, 'Charlie', 35), (4, 'David', 40), (5, 'Eve', 45);
Procedure
Connect to the SelectDB instance. For more information, see Connect to an instance.
NoteThe
SWITCHinstruction does not work when you log on using DMS. You must connect using a MySQL client.You can create a StarRocks Java Database Connectivity (JDBC) catalog. For more information, see JDBC data sources.
CREATE CATALOG starrocks_catalog PROPERTIES ( "type"="jdbc", "user"="root", "password"="123456", "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo", "driver_url" = "mysql-connector-java-8.0.25.jar", "driver_class" = "com.mysql.cj.jdbc.Driver", "checksum" = "fdf55dcef04b09f2eaf42b75e61ccc9a" )Parameters
Parameter
Required
Default value
Description
user
Yes
None
The username for the StarRocks database.
password
Yes
None
The password for the StarRocks database.
jdbc_url
Yes
None
The JDBC connection string. It must contain the endpoint of the StarRocks database.
driver_url
Yes
None
The name of the JDBC driver JAR package.
NoteUse
mysql-connector-java-8.0.25.jar.To use a different JAR package, submit a ticket.
driver_class
Yes
None
The class name of the JDBC driver.
Set this parameter to
com.mysql.cj.jdbc.Driver.lower_case_table_names
(Renamed to `lower_case_meta_names` in version 4.0)
No
"false"
Specifies whether to synchronize the names of databases and tables from the external JDBC data source in lowercase.
true: You can query databases and tables with names that are not in lowercase by maintaining a mapping from lowercase names to the actual names in the remote system. In this case, the names of databases, tables, and columns are converted to lowercase.
false: You cannot query databases and tables with names that are not in lowercase.
ImportantFor SelectDB 3.0:
If the
lower_case_table_namesparameter of the frontend (FE) is set to1or2, you must set thelower_case_table_namesparameter of the Catalog totrue.If the
lower_case_table_namesparameter of the FE is set to0, the Catalog parameter can betrueorfalse.
For SelectDB 4.0:
If the
lower_case_table_namesparameter of the FE is0or2, the names of databases, tables, and columns are not converted.If the
lower_case_table_namesparameter of the FE is1, table names are converted to lowercase, but database and column names are not.
only_specified_database
No
"false"
Specifies whether to synchronize only the specified database.
true: Synchronizes only the database specified in the JDBC URL.
false: Synchronizes all databases in the JDBC URL.
include_database_list
No
""
When
only_specified_database=true, this parameter specifies multiple databases to synchronize. Separate database names with commas (,). Database names are case-sensitive.exclude_database_list
No
""
When
only_specified_database=true, this parameter specifies multiple databases to exclude from synchronization. Separate database names with commas (,). Database names are case-sensitive.meta_names_mapping
No
""
If an external data source has names that differ only in case, such as `DORIS` and `doris`, querying the Catalog may cause an error due to ambiguity. In this case, configure the
meta_names_mappingparameter to resolve the conflict.For more information, see Case sensitivity settings.
ImportantThis parameter applies only to SelectDB 4.0.
You can view the Catalogs.
SHOW CATALOGS; -- Check if the Catalog was created successfully.The result is as follows.
+--------------+-------------------+----------+-----------+-------------------------+---------------------+------------------------+ | CatalogId | CatalogName | Type | IsCurrent | CreateTime | LastUpdateTime | Comment | +--------------+-------------------+----------+-----------+-------------------------+---------------------+------------------------+ | 436009309195 | starrocks_catalog | jdbc | | 2024-08-06 17:09:08.058 | 2024-07-19 18:04:37 | | | 0 | internal | internal | yes | UNRECORDED | NULL | Doris internal catalog | +--------------+-------------------+----------+-----------+-------------------------+---------------------+------------------------+(Optional) Switch to the External Catalog folder SR_catalog.
You can view and access data in the `starrocks_catalog` external catalog in the same way as you access data in an internal catalog.
NoteCurrently, ApsaraDB for SelectDB supports only read operations on data in external catalogs.
SWITCH SR_catalog;(Optional) Switch to the `internal` internal catalog.
If you did not perform the previous step, skip this step.
SWITCH internal;(Optional) Create a database.
If you have already created the destination database, skip this step.
CREATE database test_db;Switch to the destination database.
USE test_db;Create a table.
If a destination table already exists, verify that the data types of the target columns correspond to the data types of the source columns in StarRocks.
If a destination table does not exist, ensure that the data types of the target columns correspond to the data types of the source columns in StarRocks when you create the table.
For more information about column mapping, see Type mapping.
CREATE TABLE test_SR2SelectDB ( id int, name string, age int ) DISTRIBUTED BY HASH(id) BUCKETS 4 PROPERTIES("replication_num" = "1");Migrate the data.
INSERT INTO test_SR2SelectDB SELECT * FROM doris_catalog.SR_db.SR_t;Verify the imported data.
SELECT * FROM test_SR2SelectDB;
Migrate incremental data
In a production environment, StarRocks data includes both offline data and incremental data. A common scenario for migrating data from StarRocks to SelectDB is to copy data to the data warehouse for query acceleration. For incremental data migration, you can use one of the following two methods:
When you generate data for SelectDB, you can concurrently write a copy to SelectDB.
Use a periodic job to read partitioned data from StarRocks and write the data to SelectDB.