
AnalyticDB for MySQL:Import and export

Last Updated:Jun 08, 2023

This topic provides answers to some frequently asked questions about import and export in AnalyticDB for MySQL.

Note

If no edition is specified in a question, the question is applicable only to AnalyticDB for MySQL Data Warehouse Edition (V3.0) clusters.

FAQ overview

How do I stop an asynchronous import or export task?

Log on to the AnalyticDB for MySQL console and click Clusters in the left-side navigation pane. Find the cluster that you want to manage and click its ID. In the left-side navigation pane, click Diagnostics and Optimization. On the Import/Export Tasks tab, find the asynchronous task that you want to stop and obtain its name from the Asynchronous Task Name column. Then, execute the CANCEL JOB "${asynchronous task name}" statement to cancel the asynchronous task. For more information about asynchronous import and export tasks, see Asynchronously submit an import task.

How do I query the import tasks that are in the running state in the current database by using a statement?

Execute the following statement:

SELECT * FROM INFORMATION_SCHEMA.kepler_meta_async_jobs WHERE status = 'RUNNING';

What do I do if performance is not as expected when I use a program and JDBC to import data to AnalyticDB for MySQL?

  • Make sure that the data source can produce data at a high speed. If the data comes from other systems or files, check the client for output bottlenecks.

  • Make sure that data can be processed at a high speed. Check whether data is produced and consumed in sync, and make sure that a sufficient amount of data is waiting to be imported to AnalyticDB for MySQL.

  • On the host where the client runs, check the CPU utilization and disk I/O usage to determine whether system resources are sufficient, while keeping the workloads on the host at an appropriate level.
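If the client submits rows one at a time, network round trips dominate and throughput stays low regardless of server capacity. The following sketch shows how a client can group rows into multi-value INSERT statements before sending them (the table name, rows, and batch size are illustrative assumptions, not values from this topic):

```python
def build_batch_inserts(table, rows, batch_size=2048):
    """Group rows into multi-value INSERT statements to reduce round trips.

    `table` and `batch_size` are illustrative; tune batch_size to your workload.
    """
    statements = []
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        values = ",".join(
            "(" + ",".join(repr(v) for v in row) + ")" for row in batch
        )
        statements.append(f"INSERT INTO {table} VALUES {values}")
    return statements

# Example: 5 rows with a batch size of 2 yield 3 statements.
stmts = build_batch_inserts(
    "t", [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")], batch_size=2
)
```

In production, each worker thread would send its statements over its own connection; the point is that larger batches keep the server busy and the client CPU-bound rather than latency-bound.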

Am I charged for connections when I run APS tasks in AnalyticDB for MySQL Data Lakehouse Edition (V3.0)?

You are not charged for connections when you run APS tasks. However, APS tasks are run on resource groups and you are charged for the consumed resources.

Should I choose an internal endpoint or a public endpoint when I create an OSS external table to import or export data?

When you create an OSS external table to import or export data, you must choose an internal endpoint, because AnalyticDB for MySQL uses OSS external tables to access OSS over an internal network instead of the Internet.

What do I do if an "endpoint unreachable" error occurs when I create a MaxCompute external table?

Cause: The endpoint that is specified in the CREATE TABLE statement is unreachable from your AnalyticDB for MySQL cluster.

Solution: Enable Elastic Network Interface (ENI), replace the endpoint in the CREATE TABLE statement with the Virtual Private Cloud (VPC) endpoint of the region where your cluster resides, and then execute the CREATE TABLE statement again. For information about how to obtain the VPC endpoint, see the "Endpoints in different regions (VPC)" section of the Endpoints topic.

What do I do if an error message prompts me to roll back the write operation and submit one row of data at a time when I import data from MaxCompute?

Cause: This error may occur when you use DataX to import data due to connection layer limits of AnalyticDB for MySQL.

Solution: Add rewriteBatchedStatements=false to the JDBC string and import data again.
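A small helper can append the parameter to an existing JDBC URL while preserving any parameters that are already present. This is a sketch; the host, port, and database name are placeholders:

```python
def add_jdbc_param(url, key, value):
    """Append key=value to a JDBC URL, using ? or & as appropriate."""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}{key}={value}"

url = "jdbc:mysql://host:3306/testdb?characterEncoding=utf-8"
fixed = add_jdbc_param(url, "rewriteBatchedStatements", "false")
```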

What do I do if a Query Exceeded Maximum Time Limit error occurs when I import data from MaxCompute?

Cause: The amount of time it takes to import large amounts of MaxCompute table data exceeds the time limit of INSERT operations in AnalyticDB for MySQL.

Solution: Modify the value of the INSERT_SELECT_TIMEOUT parameter and import data again. For more information, see Common configuration parameters.

What do I do if a Query execution error occurs when I query MaxCompute data?

  • Cause 1: Permissions on MaxCompute are incorrectly configured. The provided AccessKey pair cannot be used to read data from MaxCompute tables.

Solution 1: Modify the read and write permissions on MaxCompute and query data again.

  • Cause 2: The schema and column names of the AnalyticDB for MySQL table are different from those of the MaxCompute table.

Solution 2: Create another AnalyticDB for MySQL external table that uses the same schema and column names as the MaxCompute table and query data again.

  • Cause 3: The specified MaxCompute partition does not exist.

Solution 3: Modify the MaxCompute partition that is specified in the query statement and query data again.

  • Cause 4: The MaxCompute table contains a large number of small files.

Solution 4: Merge the small files in MaxCompute and query data again. For more information, see Merge small files. Sample statement:

ALTER TABLE tablename [PARTITION] MERGE SMALLFILES;

How do I import data of the ARRAY<STRING> type from MaxCompute to AnalyticDB for MySQL?

Cause: MaxCompute external tables do not support nested data types, so you cannot directly import data of the ARRAY&lt;STRING&gt; type from MaxCompute to AnalyticDB for MySQL.

Solution: Import data from MaxCompute to OSS in the Parquet format. Then, import the Parquet-format data from OSS to AnalyticDB for MySQL.

How do I speed up data import from MaxCompute?

  • If storage nodes support only a small workload, modify the value of the SQL_OUTPUT_BATCH_SIZE parameter and import data again. Sample statement:

    set adb_config SQL_OUTPUT_BATCH_SIZE = 6000;
  • If a large number of partitions exist in MaxCompute, change the value of the ENABLE_ODPS_MULTI_PARTITION_PART_MATCH parameter to false and import data again. Sample statement:

    set adb_config ENABLE_ODPS_MULTI_PARTITION_PART_MATCH=false;

If this issue persists, contact technical support.

Why is data not overwritten when I execute the INSERT OVERWRITE statement to export data from AnalyticDB for MySQL to a MaxCompute external table?

This issue occurs because MaxCompute external tables do not support data overwriting.

The amount of data imported from MaxCompute is inconsistent with that queried in AnalyticDB for MySQL. Why?

Cause: AnalyticDB for MySQL removes the data entries that contain duplicate primary key values.

Solution: Check whether the MaxCompute data entries contain duplicate primary key values.

Will an error occur if a source database contains data types that are not supported by AnalyticDB for MySQL when I use Data Transmission Service (DTS) to synchronize data to AnalyticDB for MySQL?

If a source database contains data whose types are not supported by AnalyticDB for MySQL (such as geographic location data), AnalyticDB for MySQL discards the columns of such data types during schema initialization.

For more information about data types that are supported by AnalyticDB for MySQL, see Basic data types and Complex data types.

Can I change data types in source tables when I use DTS to synchronize data to AnalyticDB for MySQL?

You can change data types in source tables during data synchronization. However, data types can be changed only within integer types or within floating-point types, and only from a type with a narrow value range to one with a wider value range, or from a single-precision type to a double-precision type.

  • Integer data types include TINYINT, SMALLINT, INT, and BIGINT. You can change data types with a narrow value range to those with a wide value range. For example, you can change the data type of a column from TINYINT to BIGINT, but not the other way around.

  • Floating-point data types include FLOAT and DOUBLE. You can change the data type of a column from FLOAT to DOUBLE, but not the other way around.
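The widening rules above can be expressed as a simple check. This is a sketch whose rank ordering mirrors the rules in this section; it is not an official DTS or AnalyticDB for MySQL API:

```python
INT_RANK = {"TINYINT": 0, "SMALLINT": 1, "INT": 2, "BIGINT": 3}
FLOAT_RANK = {"FLOAT": 0, "DOUBLE": 1}

def is_allowed_change(old, new):
    """Return True if `old` -> `new` is a widening change within one type family."""
    for rank in (INT_RANK, FLOAT_RANK):
        if old in rank and new in rank:
            return rank[old] <= rank[new]
    return False  # cross-family changes (e.g. INT -> DOUBLE) are not allowed
```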

What do I do if an error occurs because the data type of a source table is changed when I use DTS to synchronize data to AnalyticDB for MySQL?

Error:

  • 'id' is LONG type, Can't change column type to DECIMAL
  • modify precision is not supported, col=id, type decimal, old=11, new=21

Cause: For information about the error cause, see the "Can I change data types in source tables when I use DTS to synchronize data to AnalyticDB for MySQL?" section of this topic.

Solution:

  • Non-full-database synchronization: We recommend that you resynchronize the source table. To do so, remove the source table from the synchronization objects, delete the destination table from the destination database, and then add the source table back to the synchronization objects. DTS performs full data synchronization again, including the schema. This way, the previously synchronized DDL statements are removed.

  • Full-database synchronization: In AnalyticDB for MySQL, create a table that uses the same schema as the source table and a name that is different from that of the error table. Then, execute the INSERT INTO SELECT statement to write the data of the source table to the new table, delete the error table, execute the RENAME statement to rename the new table to the name of the error table, and then restart the data synchronization task.

What do I do if an error occurs because invalid date values are written when I use DTS to synchronize data to AnalyticDB for MySQL?

Error:

Cannot parse "2013-05-00 00:00:00": Value 0 for dayOfMonth must be in the range [1,31]]

Cause: AnalyticDB for MySQL does not allow you to write invalid date values.

Solution:

  • If the task is in the full data initialization stage, change values in the source table to valid values. For example, change 2013-05-00 00:00:00 to 2013-05-01 00:00:00.

  • If the task is in the incremental data synchronization stage, remove the source table from the synchronization objects, change values in the source table to valid values, add the source table to the synchronization objects, and then restart the task.

  • If you perform incremental data synchronization during full database synchronization, contact technical support to enable invalid value writing. After invalid value writing is enabled, invalid values are written as null.
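Before fixing the source table, you can scan for invalid date values on the client side. A minimal sketch using Python's standard datetime module (the column format is an assumption):

```python
from datetime import datetime

def is_valid_datetime(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Return True if `value` parses as a real calendar date and time."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False
```

strptime rejects day 00, so the example value from the error message above fails the check.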

What do I do if an error occurs because tables without primary keys are synchronized when I use DTS to synchronize data to AnalyticDB for MySQL?

Error:

DTS-077004: Record Replicator error. cause by [[17003,2020051513063717201600100703453067067] table not exist => xxx_table] 

Cause: AnalyticDB for MySQL does not allow you to synchronize tables without primary keys.

Solution: This error occurs only during full database synchronization. Check whether the source database contains tables without primary keys. If it does, manually create the corresponding tables with primary keys in the destination database and restart the DTS task.

What do I do if an error occurs because the default field value is excessively long in CREATE TABLE when I use DTS to synchronize data to AnalyticDB for MySQL?

Error:

default value is too long

Solution: Contact technical support to upgrade your AnalyticDB for MySQL cluster to the latest version.

What do I do if an error occurs because more than 16 MB of data is written in a single record when I use DTS to synchronize data to AnalyticDB for MySQL?

Error:

com.mysql.jdbc.PacketTooBigException: Packet for query is too large (120468711 > 33554432). You can change this value on the server by setting the max_allowed_packet' variable.

Solution: Contact technical support to upgrade your AnalyticDB for MySQL cluster to the latest version.

What do I do if an error occurs because the disk space is exhausted when I use DTS to synchronize data to AnalyticDB for MySQL?

Error:

disk is over flow

Solution: Delete data to free up disk space, or contact technical support to scale out your AnalyticDB for MySQL cluster. After sufficient disk space is available, restart the DTS task.

What do I do if an error occurs because a table or field does not exist when I use DTS to synchronize data to AnalyticDB for MySQL?

Error:

table not exist => t1

Solution: Check whether all DDL statements, such as CREATE TABLE, were selected when you configured the DTS task. If they were not, select all DDL statements.

Will an error occur if the name of a database, table, or column in the source instance contains hyphens (-) when I use DTS to synchronize data to AnalyticDB for MySQL?

The names of databases, tables, and columns cannot contain hyphens (-) in AnalyticDB for MySQL. To ensure successful data synchronization, the system maps hyphens (-) to underscores (_).

If a data synchronization fails due to errors in database, table, or column names, such as a table name that contains spaces or Chinese characters, contact technical support.

Note

For more information about limits of AnalyticDB for MySQL, see Limits.
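The name mapping described above can be sketched as follows. This is a simple illustration of the hyphen-to-underscore rule, not the exact DTS implementation:

```python
def map_object_name(name):
    """Map a source object name to an AnalyticDB for MySQL-compatible name
    by replacing hyphens with underscores, as the synchronization does."""
    return name.replace("-", "_")
```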

What do I do if data latency occurs on the AnalyticDB for MySQL cluster when I use DTS to synchronize data to AnalyticDB for MySQL?

  • By default, a DTS synchronization task uses the medium instance class. If the source database is overloaded with write requests, you must upgrade instance specifications. For more information, see Upgrade a DTS instance.

  • For tables without primary keys, DTS automatically configures primary keys during synchronization. This slows down the update of hot data rows. You can submit a ticket to AnalyticDB for MySQL technical support.

  • If the write performance of an AnalyticDB for MySQL cluster reaches the upper limit, you must upgrade the specifications of the AnalyticDB for MySQL cluster.

Why does the write TPS fail to meet my expectations when I use DataWorks to import data to AnalyticDB for MySQL?

When the client imports only a small amount of data at a time, the database server can process the data in a timely manner, but the CPU and disk I/O resources remain underutilized. As a result, the write response time and write TPS may not meet your expectations. You can increase the values of Data Records Per Write and Expected Maximum Concurrency to increase the amount of data that is imported at a time. The data import performance increases linearly as the amount of imported data increases.

Why does data skew occur on the imported table when I use DataWorks to import data to AnalyticDB for MySQL?

If data skew occurs on the imported table, specific cluster nodes are overloaded while others remain idle, which reduces import performance. In this case, the overall CPU and disk I/O utilization is low, but the write response time is prolonged. On the Data Modeling Diagnostics tab of the Diagnostics and Optimization page, you can view the data skew diagnostic results of the imported table. Redesign the table schema and import the data again. For more information, see Schema design.

How do I check whether workload bottlenecks exist on the client or its server when I use adb-import.sh to import on-premises data?

If bottlenecks exist on the client, the workloads on the database during stress testing cannot be maximized. You can use one of the following methods to check whether workload bottlenecks exist on the client or its server:

  • Log on to the AnalyticDB for MySQL console. In the left-side navigation pane of a cluster, click Monitoring Information or Diagnostics and Optimization.

  • Run the commands that are displayed in the following table.

  • top: Displays the CPU utilization.

  • free: Displays the memory usage.

  • vmstat 1 1000: Displays the overall system workloads.

  • dstat -all --disk-util or iostat 1 1000: Displays the disk utilization or read bandwidth.

  • jstat -gc <pid> 1000: Displays details about the Java garbage collection (GC) process of the import tool. If GC occurs frequently, set the jvmopts parameter to a greater value such as -Xmx16G -Xms16G.

How do I import multiple tables by using a single script when I use adb-import.sh to import on-premises data?

If the row delimiter and column delimiter of the imported files are the same, you can modify the values of the tableName and dataPath parameters to import multiple tables by using a single script.

For example, the tableName and dataPath parameters can be set to the following values:

tableName=$1
dataPath=$2

Run the following commands to import the files:

# sh adb-import.sh table_name001 /path/table_001
# sh adb-import.sh table_name002 /path/table_002
# sh adb-import.sh table_name003 /path/table_003

How do I run the import program in the background when I use adb-import.sh to import on-premises data?

You can run the following command to run the import program in the background:

# nohup sh adb-import.sh &

After the import program is started, you can run the following command to check the logs. If error information is returned, troubleshoot based on the returned information.

# tail -f nohup.out

You can also run the following command to check whether the import process is being executed normally:

# ps -ef|grep import

How do I ignore the errors that are returned for the import program when I use adb-import.sh to import on-premises data?

Errors may occur due to the following reasons:

  • SQL statements fail to be executed.

    You can ignore the errors by setting the ignoreErrors parameter to true. The following information is included in the execution results: the files for which errors occurred, the start row number, and the failed statements. If the batchSize parameter is specified, the faulty row number is less than or equal to the start row number plus the batchSize value.

  • The number of columns in a file does not match the expected number.

    If the number of columns in a file does not match the expected number, the import tool immediately stops importing the current file and returns the error information. However, this error is not ignored because the error is caused by an invalid file, and you must manually check the validity of the file. Sample error information:

    [ERROR] 2021-03-22 00:46:40,444 [producer- /test2/data/lineitem.csv.split00.100-41] analyticdb.tool.ImportTool
    (ImportTool.java:591) -bad line found and stop import! 16, file = /test2/data/tpch100g/lineitem.csv.split00.100, rowCount = 7, current row = 3|123|179698|145|73200.15|0.06|0.00|R|F|1994-02-02|1994-01-04|1994-02-
    23|NONE|AIR|ongside of the furiously brave acco|
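When ignoreErrors is set to true and a batch fails, the faulty row lies between the reported start row and the start row plus the batchSize value. A sketch of computing the candidate range to re-examine (the numbers are illustrative):

```python
def faulty_row_range(start_row, batch_size):
    """Return the inclusive row range that may contain the faulty row
    when a batch starting at `start_row` fails to be imported."""
    return (start_row, start_row + batch_size)

# A batch starting at row 100 with batchSize=10 narrows the search to rows 100-110.
lo, hi = faulty_row_range(100, 10)
```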

How do I narrow down the cause of an import error when I use adb-import.sh to import on-premises data?

You can use the following methods to narrow down the cause of an import error:

  • If the import fails, the AnalyticDB for MySQL import tool returns the error log and the detailed error cause. A maximum of 1,000 characters of SQL statements can be returned. If you need to extend the limit, you can run the following commands and set the failureSqlPrintLengthLimit parameter to a greater value such as 1500:

    printErrorSql=true
    failureSqlPrintLengthLimit=1500;
  • Thousands of SQL statements may be batch executed based on the value of the batchSize parameter, which makes it difficult to identify the faulty rows. To facilitate troubleshooting, you can set the batchSize parameter to a smaller value such as 10. Sample command:

    batchSize=10;
  • If the file is split and the segment that contains the faulty rows is identified, you can specify the dataPath parameter to import the segment that contains the faulty rows. Sample command:

    dataPath=/u01/this/is/the/directory/where/product_info/stores/file007;

How do I run the import program in the Windows operating system when I use adb-import.sh to import on-premises data?

Windows does not provide a batch script for the import tool. You can run the following command to call the JAR file directly:

usage: java -jar adb-import-tool.jar [-a <arg>] [-b <arg>] [-B <arg>] [-c <arg>] 
      [-D <arg>] [-d <arg>] [-E <arg>] [-f <arg>] [-h <arg>] [-I <arg>] 
      [-k <arg>] [-l <arg>] [-m <arg>] [-n <arg>] [-N <arg>] [-O <arg>]
      [-o <arg>] [-p <arg>] [-P <arg>] [-Q <arg>] [-s <arg>]  [-S <arg>]
      [-t <arg>] [-T <arg>] [-u <arg>] [-w <arg>][-x <arg>] [-y <arg>] [-z <arg>]

Required parameters:

  • -h,--ip <arg>: The endpoint of the AnalyticDB for MySQL cluster.

  • -u,--username <arg>: The database account of the AnalyticDB for MySQL cluster.

  • -p,--password <arg>: The password of the database account.

  • -P,--port <arg>: The port number that is used to connect to the AnalyticDB for MySQL cluster.

  • -D,--databaseName <arg>: The name of the AnalyticDB for MySQL database.

  • -f,--dataFile <arg>: The absolute path of the file or folder that you want to import. You can import a single file or folder, or multiple files whose paths are separated by commas (,).

  • -t,--tableName <arg>: The name of the table to which data is imported.

Optional parameters:

  • -a,--createEmptyFinishFilePath <arg>: Specifies whether to generate a file that indicates that the import is complete. The default value is an empty string, which indicates that no such file is generated. To generate such a file, set this parameter to the file name. For example, if you set this parameter to file_a, a file named file_a is generated.

  • -b,--batchSize <arg>: The number of values that are batch written in the INSERT INTO tablename VALUES (..),(..) statement. Default value: 1. Note: To strike a balance between import speed and system performance, we recommend that you set this parameter to a value from 1024 to 4096.

  • -B,--encryptPassword <arg>: Specifies whether to encrypt the password of the database account. Default value: false.

  • -c,--printRowCount <arg>: Specifies whether to display the actual number of rows in the destination table each time a file is imported. Default value: false.

  • -d,--skipHeader <arg>: Specifies whether to skip the table header. Default value: false.

  • -E,--escapeSlashAndSingleQuote <arg>: Specifies whether to escape the backslashes (\) and single quotation marks (') in the columns. Default value: true. Note: Escaping may compromise client performance when strings are parsed. If you are sure that no backslashes (\) or single quotation marks (') exist in the table, set this parameter to false.

  • -I,--ignoreErrors <arg>: Specifies whether to ignore the batches of data that fail to be imported. Default value: false.

  • -k,--skipLineNum <arg>: The number of rows to skip, similar to the IGNORE number {LINES | ROWS} clause. The default value is 0, which specifies that no rows are skipped.

  • -l,--delimiter <arg>: The column delimiter. By default, \\| is used as the column delimiter in AnalyticDB for MySQL. You can also use non-printable characters as column delimiters. If you want to use non-printable characters, you must specify them in hexadecimal notation. For example, \x07\x07 must be specified as hex0707.

  • -m,--maxConcurrentNumOfFilesToImport <arg>: The number of files that are concurrently read when dataFile is set to a folder. The default value is Integer.MAX_VALUE, which specifies that all files are concurrently read.

  • -n,--nullAsQuotes <arg>: Specifies whether to write || as an empty string (''). The default value is false, which specifies that || is written as null instead of ''.

  • -N,--printErrorSql <arg>: Specifies whether to display the SQL statements that fail to be executed. Default value: true.

  • -O,--connectionPoolSize <arg>: The size of the connection pool of the AnalyticDB for MySQL database. Default value: 2.

  • -o,--encoding <arg>: The encoding standard of the imported file. Valid values: GBK and UTF-8. Default value: UTF-8.

  • -Q,--disableInsertOnlyPrintSql <arg>: Specifies whether to skip execution of the INSERT statement. If you set this parameter to true, the INSERT statement is only displayed and not executed. If you set this parameter to false, the INSERT statement is executed. Default value: false.

  • -s,--lineSeparator <arg>: The row delimiter. By default, \\n is used as the row delimiter in AnalyticDB for MySQL. You can also use non-printable characters as row delimiters. If you want to use non-printable characters, you must specify them in hexadecimal notation. For example, \x0d\x06\x08\x0a must be specified as hex0d06080a.

  • -S,--printErrorStackTrace <arg>: Specifies whether to display the stack trace when an error occurs and printErrorSql is set to true. Default value: false.

  • -w,--windowSize <arg>: The size of the buffer pool that is used when the INSERT statement is executed. The buffer separates I/O from computing when data is sent to AnalyticDB for MySQL, which improves client performance. Default value: 128.

  • -x,--insertWithColumnNames <arg>: Specifies whether to append column names to the INSERT INTO statement. For example, data can be imported by executing the INSERT INTO tb(column1, column2) statement. Default value: true.

  • -y,--failureSqlPrintLengthLimit <arg>: The maximum length of the SQL statement that is displayed when the INSERT statement fails to be executed. Default value: 1000.

  • -z,--connectionUrlParam <arg>: The connection parameters of the database. Default value: ?characterEncoding=utf-8. Example: ?characterEncoding=utf-8&autoReconnect=true.
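The hexadecimal notation that the -l and -s parameters expect for non-printable delimiters can be derived mechanically. A sketch, assuming the delimiter is given as a Python string of raw characters:

```python
def to_hex_delimiter(chars):
    """Encode a delimiter of non-printable characters in the import tool's
    hexadecimal notation, e.g. '\x07\x07' -> 'hex0707'."""
    return "hex" + "".join(f"{ord(c):02x}" for c in chars)
```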

Examples:

  • Example 1: Use the default parameter configurations to import a single file.

    java -Xmx8G -Xms8G -jar adb-import-tool.jar -hyourhost.ads.aliyuncs.com -uadbuser -ppassword -P3306 -Dtest --dataFile /data/lineitem.sample --tableName LINEITEM
  • Example 2: Modify parameter values to import all files of a folder with the maximum throughput.

    java -Xmx16G -Xms16G -jar adb-import-tool.jar -hyourhost.ads.aliyuncs.com -uadbuser -ppassword -P3306 -Dtest --dataFile /data/tpch100g --tableName LINEITEM --concurrency 64 --batchSize 2048