
PolarDB:Optimize data import and export

Last Updated:Dec 08, 2023

Data is frequently imported to or exported from databases. This topic describes how to import data to and export data from PolarDB-X databases.

Test environment

The following table describes the test environment that is used to perform tests in this topic.

Specification          Requirement
PolarDB-X version      polarx-kernel_5.4.11-16282307_xcluster-20210805
Node specifications    16 CPU cores, 64 GB of memory
Number of nodes        4

Execute the following statement to create the test table:

CREATE TABLE `sbtest1` (
    `id` int(11) NOT NULL,
    `k` int(11) NOT NULL DEFAULT '0',
    `c` char(120) NOT NULL DEFAULT '',
    `pad` char(60) NOT NULL DEFAULT '',
    PRIMARY KEY (`id`),
    KEY `k_1` (`k`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4 dbpartition by hash(`id`);

Data import and export methods

The following methods are used to export data from PolarDB-X databases:

  • Run the mysql -e command to export data.

  • Use the mysqldump tool to export data.

  • Execute the SELECT INTO OUTFILE statement to export data. By default, this method is disabled.

  • Use Batch Tool to export data. Batch Tool is automatically enabled after you purchase a PolarDB-X instance.

The following methods are used to import data to PolarDB-X databases:

  • Execute the SOURCE statement to import data.

  • Run the mysql command to import data.

  • Use a program to import data.

  • Execute the LOAD DATA statement to import data.

  • Use Batch Tool to import data. Batch Tool is automatically enabled after you purchase a PolarDB-X instance.

Use MySQL native commands to export or import data

Run the mysql -e command to connect to a local or remote server and execute SQL statements, such as a SELECT query. By default, the values in the output are separated by tab characters. You can use string manipulation tools such as sed to change the separator to commas (,) and save the file in the .csv format. The following code provides an example:

mysql -h ip  -P port -u usr -pPassword db_name -N -e "SELECT id,k,c,pad FROM sbtest1;" >/home/data_1000w.txt
## The values in the raw data are separated with tab characters. For example, the returned result of this command is 188092293    27267211    59775766593-64673028018-...-09474402685    01705051424-...-54211554755.

mysql -h ip  -P port -u usr -pPassword db_name -N -e "SELECT id,k,c,pad FROM sbtest1;" | sed 's/\t/,/g' >/home/data_1000w.csv
## After you execute this command, the values in the output are separated with commas (,). For example, the returned result is 188092293,27267211,59775766593-64673028018-...-09474402685,01705051424-...-54211554755.
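Note that the sed approach simply replaces every tab with a comma. If a column value itself contains a comma or a quotation mark, the resulting file is not valid CSV. The following Python sketch performs the same conversion with proper CSV quoting; the file paths are the ones from the example above:

```python
import csv
import sys

def tsv_to_csv(src, dst):
    """Convert tab-separated mysql -e output into properly quoted CSV."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout)  # quotes fields that contain commas or quotes
        writer.writerows(reader)

if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python tsv_to_csv.py /home/data_1000w.txt /home/data_1000w.csv
    tsv_to_csv(sys.argv[1], sys.argv[2])
```

A field such as ab,cd is then written as "ab,cd", so downstream CSV parsers read the row correctly.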

To import data in the raw format, execute the LOAD DATA statement. For more information, see LOAD DATA. The following code provides an example:

LOAD DATA LOCAL INFILE '/home/data_1000w.txt' INTO TABLE sbtest1;
## LOCAL specifies that data is imported from a local file. The value of the local_infile parameter must be set to on.

To import a file in the .csv format, use a program. For more information, see Use a program to import data.
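One way to import a .csv file programmatically is to group rows into batches and send each batch as a single multi-row INSERT statement. The following Python sketch illustrates this approach for the sbtest1 table above. It is a minimal illustration, not the tool used in the tests below; the pymysql driver and the connection settings in the commented usage are assumptions:

```python
import csv

def chunk(rows, batch_size=1000):
    """Split rows into batches of at most batch_size rows."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

def import_csv(path, conn, table="sbtest1", batch_size=1000):
    """Load a CSV file into `table` by using multi-row INSERT statements."""
    with open(path, newline="") as f:
        rows = [(int(r[0]), int(r[1]), r[2], r[3]) for r in csv.reader(f)]
    sql = f"INSERT INTO `{table}` (id,k,c,pad) VALUES (%s,%s,%s,%s)"
    with conn.cursor() as cur:
        for batch in chunk(rows, batch_size):
            # pymysql's executemany rewrites this into one multi-value INSERT
            cur.executemany(sql, batch)
    conn.commit()

# Hypothetical usage -- the connection settings are placeholders and pymysql
# must be installed separately (pip install pymysql):
# import pymysql
# conn = pymysql.connect(host="ip", port=3306, user="usr",
#                        password="Password", database="db_name")
# import_csv("/home/data_1000w.csv", conn)
```

Batching is what makes this approach fast: sending one row per statement spends most of the time on network round trips.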

Use the mysqldump tool to export or import data

You can use the mysqldump tool to connect to a local or remote server. For more information, see Use mysqldump to migrate data.

  • Run the following command to export data by using the mysqldump tool:

    mysqldump -h ip  -P port -u usr -pPassword --default-character-set=utf8mb4 --net_buffer_length=10240 --no-tablespaces --no-create-db --no-create-info --skip-add-locks --skip-lock-tables --skip-tz-utc --set-charset  --hex-blob db_name [table_name] > /home/dump_1000w.sql

    The following list describes errors that can occur when you use the mysqldump tool to export data and how to resolve them. In most cases, these errors occur because the versions of the MySQL client and the MySQL server do not match.

    1. Error 1: mysqldump: Couldn't execute 'SHOW VARIABLES LIKE 'gtid\_mode''

      Solution: Add the "--set-gtid-purged=OFF" parameter setting to disable gtid_mode.

    2. Error 2: mysqldump: Couldn't execute 'SHOW VARIABLES LIKE 'ndbinfo\_version''

      Solution: Make sure that the version of mysqldump matches the version of the MySQL server.

    The exported file contains SQL statements, including batch INSERT statements such as INSERT INTO `sbtest1` VALUES (...),(...). The net_buffer_length parameter affects the batch size.

  • You can use one of the following methods to import the exported SQL statements:

    Method 1: Execute the SOURCE statement.
    source /home/dump_1000w.sql
    
    Method 2: Run the mysql command.
    mysql -h ip  -P port -u usr -pPassword --default-character-set=utf8mb4 db_name < /home/dump_1000w.sql

Use Batch Tool to export or import data

Batch Tool is a data import and export tool developed by Alibaba Cloud that supports multi-threaded operations.

  • Export data

    ## Export files. The number of exported files is equal to the number of shards.
    java -jar batch-tool.jar -h ip  -P port -u usr -pPassword -D db_name -o export -t sbtest1 -s ,
    
    ## Export multiple files and merge the files into one file.
    java -jar batch-tool.jar -h ip  -P port -u usr -pPassword -D db_name -o export -t sbtest1 -s , -F 1
  • Import data

    ## Import 32 files.
    java -jar batch-tool.jar -hpxc-spryb387va****.polarx.singapore.rds.aliyuncs.com  -P3306 -uroot -pPassword -D sysbench_db -o import -t sbtest1 -s , -f "sbtest1_0;sbtest1_1;sbtest1_2;sbtest1_3;sbtest1_4;sbtest1_5;sbtest1_6;sbtest1_7;sbtest1_8;sbtest1_9;sbtest1_10;sbtest1_11;sbtest1_12;sbtest1_13;sbtest1_14;sbtest1_15;sbtest1_16;sbtest1_17;sbtest1_18;sbtest1_19;sbtest1_20;sbtest1_21;sbtest1_22;sbtest1_23;sbtest1_24;sbtest1_25;sbtest1_26;sbtest1_27;sbtest1_28;sbtest1_29;sbtest1_30;sbtest1_31" -np -pro 64 -con 32
    
    ## Import one file.
    java -jar batch-tool.jar -h ip  -P port -u usr -p password -D db_name -o import -t sbtest1 -s , -f "sbtest1_0" -np

Compare the methods that are used to export data

The following data is based on the test of exporting 10,000,000 rows from PolarDB-X. Approximately 2 GB of data is exported.

Method                                  Data format   File size   Duration   Rows per second   MB per second
mysql -e command (raw format)           Raw data      1998 MB     33.417s    299,248           59.8
mysql -e command (.csv format)          CSV           1998 MB     34.126s    293,031           58.5
mysqldump (net-buffer-length=10KB)      SQL           2064 MB     30.223s    330,873           68.3
mysqldump (net-buffer-length=200KB)     SQL           2059 MB     32.783s    305,036           62.8
Batch Tool (files = shards = 32)        CSV           1998 MB     4.715s     2,120,890         423.7
Batch Tool (1 file)                     CSV           1998 MB     5.568s     1,795,977         358.8

Summary:

  1. The mysql -e command and the mysqldump tool are single-threaded and provide similar performance.

  2. Batch Tool supports multi-threaded export with a configurable number of concurrent threads, which significantly improves export performance.

Compare the methods that are used to import data

The following data is based on the test of importing 10,000,000 rows into PolarDB-X. Data of approximately 2 GB is imported.

Method                                       Data format   Duration   Rows per second   MB per second
SOURCE statement (net-buffer-length=10KB)    SQL           10m24s     16,025            3.2
SOURCE statement (net-buffer-length=200KB)   SQL           5m37s      29,673            5.9
mysql command (net-buffer-length=10KB)       SQL           10m27s     15,948            3.2
mysql command (net-buffer-length=200KB)      SQL           5m38s      29,585            5.9
LOAD DATA statement                          Raw data      4m0s       41,666            8.3
Program (batch size = 1000, 1 thread)        CSV           5m40s      29,411            5.9
Program (batch size = 1000, 32 threads)      CSV           19s        526,315           105.3
Batch Tool (files = shards = 32)             CSV           19.836s    504,133           100.8
Batch Tool (1 file)                          CSV           10.806s    925,411           185.1

Summary:

  1. The SOURCE statement and the mysql command are single-threaded and replay the batch INSERT statements in the dump file. The batch size affects import performance and is determined by the net-buffer-length parameter that you configure when you export data by using mysqldump. We recommend that you perform the following operations to optimize the import process:

    • Set the net-buffer-length parameter to a value that is less than or equal to 256 KB. This increases the batch size and improves import performance.

    • Use third-party tools, such as mydumper to back up data and myloader to import it, to run backup and import operations in multiple threads.

  2. The LOAD DATA statement is executed on a single thread and provides better performance than the mysql command and the SOURCE statement.

  3. If you use a program to import data, you can configure the batch size and the number of concurrent threads to improve the performance. We recommend that you set the batch size to 1000 and the concurrency to a value from 16 to 32.

  4. Batch Tool supports multi-threaded import operations. This tool is suitable for importing data from multiple shards of a distributed database and provides high performance.
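The recommendation in point 3 above (batch size of about 1000 and 16 to 32 concurrent threads) can be sketched as follows: split the rows into batches and hand the batches to a thread pool, where each worker runs its own connection. This is a minimal illustration, not the program used in the tests; pymysql and the connection settings in the commented usage are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(rows, batch_size=1000):
    """Split rows into batches of at most batch_size rows."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

def parallel_import(rows, connect, table="sbtest1",
                    batch_size=1000, threads=32):
    """Insert rows with `threads` workers and batch_size rows per INSERT."""
    sql = f"INSERT INTO `{table}` (id,k,c,pad) VALUES (%s,%s,%s,%s)"

    def worker(batch):
        conn = connect()  # one connection per batch keeps the sketch simple
        try:
            with conn.cursor() as cur:
                cur.executemany(sql, batch)
            conn.commit()
        finally:
            conn.close()

    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(worker, chunk(rows, batch_size)))

# Hypothetical usage -- connection settings are placeholders:
# import pymysql
# parallel_import(rows, lambda: pymysql.connect(
#     host="ip", port=3306, user="usr", password="Password",
#     database="db_name"))
```

A production tool would reuse a connection per worker instead of opening one per batch, but the structure of the concurrency is the same.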

Summary

  1. PolarDB-X is compatible with the common data import and export methods that are used for O&M operations in MySQL. However, most of these methods were designed for standalone MySQL databases and support only single-threaded operations, so they cannot take full advantage of the distributed architecture of PolarDB-X.

  2. PolarDB-X provides Batch Tool. This tool is suitable for importing and exporting data from distributed databases. Batch Tool supports multi-threaded operations that provide high-performance data export and import capabilities.