This topic describes how to migrate data from Amazon Redshift to MaxCompute over the Internet.
Prerequisites
- Create an Amazon Redshift cluster and prepare the data that you want to migrate.
  For more information about how to create an Amazon Redshift cluster, see the Amazon Redshift Cluster Management Guide.
  - Create an Amazon Redshift cluster. If you already have an Amazon Redshift cluster, skip this step.
  - Prepare the data that you want to migrate in the Amazon Redshift cluster.
    In this example, a TPC-H dataset is available in the public schema. The dataset uses the MaxCompute V2.0 data types and the Decimal 2.0 data type.
- Prepare a MaxCompute project.
  For more information, see Prepare.
  In this example, a MaxCompute project is created in the Singapore (Singapore) region as the migration destination. The project must support the MaxCompute V2.0 data types because the TPC-H dataset uses the MaxCompute V2.0 data types and the Decimal 2.0 data type. A sketch of the project-level flags that enable these data types is provided after this list.
- Activate Alibaba Cloud Object Storage Service (OSS).
  For more information, see Activate OSS.
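Because the dataset uses the MaxCompute V2.0 data types and the Decimal 2.0 data type, make sure that these data types are enabled in the destination project. The following commands are a minimal sketch that you can run on the MaxCompute client (odpscmd); they assume the standard project-level flags for the MaxCompute V2.0 data type edition:

  -- Enable the MaxCompute V2.0 data types for the project.
  setproject odps.sql.type.system.odps2=true;
  -- Enable the Decimal 2.0 data type for the project.
  setproject odps.sql.decimal.odps2=true;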
Background information
The following table describes the process of migrating data from Amazon Redshift to MaxCompute.
| No. | Description |
| --- | --- |
| ① | Unload the data from Amazon Redshift to a data lake on Amazon Simple Storage Service (Amazon S3). |
| ② | Migrate the data from Amazon S3 to an OSS bucket by using the Data Online Migration service of OSS. |
| ③ | Migrate the data from the OSS bucket to a MaxCompute project in the same region, and then verify the integrity and accuracy of the migrated data. |
Step 1: Unload data from Amazon Redshift to Amazon S3
Amazon Redshift supports authentication based on Identity and Access Management (IAM) roles and temporary security credentials (AccessKey pairs). You can use the UNLOAD command of Amazon Redshift to unload data to Amazon S3 based on these two authentication methods. For more information, see Unloading data.
- UNLOAD command based on an IAM role
  -- Run the UNLOAD command to unload data from the customer table to Amazon S3.
  UNLOAD ('SELECT * FROM customer')
  TO 's3://bucket_name/unload_from_redshift/customer/customer_'  -- The Amazon S3 bucket and object prefix.
  IAM_ROLE 'arn:aws:iam::****:role/MyRedshiftRole';              -- The Amazon Resource Name (ARN) of the IAM role.
- UNLOAD command based on an AccessKey pair
  -- Run the UNLOAD command to unload data from the customer table to Amazon S3.
  UNLOAD ('SELECT * FROM customer')
  TO 's3://bucket_name/unload_from_redshift/customer/customer_'  -- The Amazon S3 bucket and object prefix.
  ACCESS_KEY_ID '<access-key-id>'                                -- The AccessKey ID of the IAM user.
  SECRET_ACCESS_KEY '<secret-access-key>'                        -- The AccessKey secret of the IAM user.
  SESSION_TOKEN '<temporary-token>';                             -- The temporary access token of the IAM user.
- Default format
The following sample command shows how to unload data in the default format:
  UNLOAD ('SELECT * FROM customer')
  TO 's3://bucket_name/unload_from_redshift/customer/customer_'
  IAM_ROLE 'arn:aws:iam::****:role/redshift_s3_role';
After the command is run, data is unloaded to text files in which values are separated by vertical bars (|). You can log on to the Amazon S3 console and view the unloaded text files in the specified bucket.
The following figure shows the unloaded text files in the default format.
- Apache Parquet format
Data unloaded in the Apache Parquet format can be directly read by other engines. The following sample command shows how to unload data in the Apache Parquet format:
  UNLOAD ('SELECT * FROM customer')
  TO 's3://bucket_name/unload_from_redshift/customer_parquet/customer_'
  FORMAT AS PARQUET
  IAM_ROLE 'arn:aws:iam::xxxx:role/redshift_s3_role';
After the command is run, you can view the unloaded Parquet files in the specified bucket. Parquet files are smaller than text files and have a higher data compression ratio.
In this example, requests are authenticated based on an IAM role, and data is unloaded in the Apache Parquet format.
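The TPC-H dataset contains multiple tables, and each table must be unloaded to its own prefix. The following commands are a minimal sketch that repeats the Parquet unload for two more TPC-H tables (orders and lineitem); the bucket name, prefixes, and IAM role ARN are placeholders that follow the customer example above:

  -- Unload the orders table to its own prefix in the same bucket.
  UNLOAD ('SELECT * FROM orders')
  TO 's3://bucket_name/unload_from_redshift/orders_parquet/orders_'
  FORMAT AS PARQUET
  IAM_ROLE 'arn:aws:iam::xxxx:role/redshift_s3_role';

  -- Unload the lineitem table to its own prefix in the same bucket.
  UNLOAD ('SELECT * FROM lineitem')
  TO 's3://bucket_name/unload_from_redshift/lineitem_parquet/lineitem_'
  FORMAT AS PARQUET
  IAM_ROLE 'arn:aws:iam::xxxx:role/redshift_s3_role';

Keeping one prefix per table makes it easier to map each prefix to one destination table when you run the LOAD command in Step 3.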
Step 2: Migrate the unloaded data from Amazon S3 to OSS
You can use the Data Online Migration service of OSS to migrate the unloaded data from Amazon S3 to an OSS bucket. For more information, see Migrate data from Amazon Simple Storage Service (Amazon S3) to OSS. The Data Online Migration service is in public preview. Before you use this service, you must submit a ticket to contact Customer Service and activate the service.
Step 3: Migrate data from the OSS bucket to the MaxCompute project in the same region
You can run the LOAD command of MaxCompute to migrate data from an OSS bucket to a MaxCompute project in the same region.
The LOAD command supports Security Token Service (STS) authentication and AccessKey authentication. If you use AccessKey authentication, you must provide the AccessKey ID and AccessKey secret of your account in plaintext. STS authentication does not expose the AccessKey pair and is therefore more secure. In this section, STS authentication is used as an example to show how to migrate data.
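The following statements are a minimal sketch of this step for the customer table. They assume that the Parquet files were unloaded to the prefix used in Step 1 and migrated to an OSS bucket in the Singapore (Singapore) region, and that a RAM role (for example, aliyunodpsdefaultrole) has been granted access to the bucket; the endpoint, bucket name, and role ARN are placeholders that you must replace with your own values:

  -- Create the destination table in the MaxCompute project.
  -- The columns follow the TPC-H customer schema.
  CREATE TABLE IF NOT EXISTS customer (
    c_custkey    BIGINT,
    c_name       STRING,
    c_address    STRING,
    c_nationkey  BIGINT,
    c_phone      STRING,
    c_acctbal    DECIMAL(15, 2),
    c_mktsegment STRING,
    c_comment    STRING
  );

  -- Load the Parquet files from the OSS bucket into the destination table.
  -- STS authentication uses the RAM role that is specified in odps.properties.rolearn.
  LOAD OVERWRITE TABLE customer
  FROM LOCATION 'oss://oss-ap-southeast-1-internal.aliyuncs.com/bucket_name/unload_from_redshift/customer_parquet/'
  WITH SERDEPROPERTIES ('odps.properties.rolearn'='acs:ram::****:role/aliyunodpsdefaultrole')
  STORED AS PARQUET;

After the LOAD command succeeds, you can compare the row count of each destination table with the row count of the corresponding source table in Amazon Redshift to verify the integrity of the migrated data. For example:

  -- The result should match the result of SELECT COUNT(*) FROM customer; in Amazon Redshift.
  SELECT COUNT(*) FROM customer;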