This topic describes how to create an Object Storage Service (OSS) data warehouse (a schema) by using a wizard. The data warehouse is used to synchronize data from ApsaraDB RDS or a self-managed database hosted on an Elastic Compute Service (ECS) instance to OSS.
- Log on to the Data Lake Analytics console.
- In the upper-left corner of the page that appears, select the region where Data Lake Analytics (DLA) is deployed.
- In the left-side navigation pane, choose Data Lake Management > Data into the lake.
- On the Data into the lake page, click Go To the Wizard in the One-Click DW section.
- Complete authorization as prompted and click Next.
To allow DLA to access OSS and ApsaraDB RDS, you must grant DLA read-only permissions on OSS and ApsaraDB RDS. Authorization is required only once.
- Configure the parameters as prompted.
Note: You can synchronize data to OSS from either ApsaraDB RDS or a self-managed database hosted on an ECS instance, depending on where your business data is stored.
  - Cloud RDS tab:
    - Type: The ApsaraDB RDS data source. Click the option button next to an ApsaraDB RDS instance to add the instance to the Source Data section.
    - Instance custom name: The name of the ApsaraDB RDS instance.
    - Instance ID: The ID of the ApsaraDB RDS instance. The system automatically lists the ApsaraDB RDS instances that reside in the same region as DLA. Fuzzy search for ApsaraDB RDS instances is supported.
  - Self Built Database tab:
    - ECS ID: The ID of the ECS instance that hosts your self-managed database.
      Note: For a self-managed database hosted on an ECS instance, you must add the reverse access CIDR block 100.104.0.0/16 to a whitelist of the ECS instance.
    - VPC ID: The ID of the VPC to which the ECS instance belongs.
    - Engine: The type of the self-managed database hosted on the ECS instance.
  - Source Data section:
    - Server: The server of the ApsaraDB RDS instance or self-managed database that is used as the data source for one-click data warehousing.
    - Port: The port that is used to connect to the ApsaraDB RDS instance or self-managed database.
    - User Name: The database username that is used to connect to the ApsaraDB RDS instance or self-managed database.
    - Password: The password of the username.
    - Schema Name: The name of the database in the ApsaraDB RDS instance or self-managed database from which data is synchronized.
    After you configure the data source, click Test Connection to test connectivity.
  - Position Opening Configuration section:
    - Schema Name: The name of the schema that represents the source database in the DLA console.
    - Location: The OSS path where data from the ApsaraDB RDS instance or self-managed database is stored when you create the data warehouse. The system automatically lists the OSS buckets that reside in the same region as DLA. Click the Location field. In the Select OSS Path panel, select a bucket or object based on your business requirements.
      When you use the one-click data warehousing feature, DLA must have the permissions to delete OSS data so that it can perform the extract, transform, and load (ETL) operations that write the source data to OSS. For more information, see Authorize DLA to delete OSS files.
    - Scheduling time: The time at which the system starts to synchronize data from the ApsaraDB RDS instance or self-managed database to OSS. Default value: 00:30. To prevent data synchronization from affecting your business, we recommend that you set this parameter to an off-peak time.
    - Advanced Options: Custom parameters, such as filter fields.
- After you configure the preceding parameters, click Create to create the OSS data warehouse.
Note: After the data warehouse is created, DLA automatically synchronizes data from the ApsaraDB RDS database or self-managed database hosted on an ECS instance to OSS at the specified time. DLA also creates in OSS a table schema that is the same as that of the source database, and creates in DLA the tables that correspond to the data in OSS.
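The whitelist requirement for self-managed databases and the Test Connection step can be partly checked before you run the wizard. The following Python sketch is a hypothetical preflight helper, not part of DLA. `whitelist_covers_dla` assumes that you have copied the whitelist entries of the ECS instance into a list of CIDR strings, and checks whether any entry contains the whole 100.104.0.0/16 block. `port_is_reachable` only confirms that a TCP connection to the database server and port can be opened from the machine that runs the script, which does not prove that DLA can reach the server, and it does not validate the user name, password, or schema name the way Test Connection does.

```python
import ipaddress
import socket

# Reverse access CIDR block that must be whitelisted on the ECS instance,
# as stated in this topic.
DLA_REVERSE_ACCESS_CIDR = ipaddress.ip_network("100.104.0.0/16")


def whitelist_covers_dla(whitelist_entries):
    """Return True if any whitelist entry contains the whole DLA reverse access block.

    whitelist_entries is a hypothetical list of strings such as "10.0.0.0/8",
    copied by hand from the whitelist settings of the ECS instance.
    """
    for entry in whitelist_entries:
        network = ipaddress.ip_network(entry.strip(), strict=False)
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so skip other versions.
        if network.version != DLA_REVERSE_ACCESS_CIDR.version:
            continue
        if DLA_REVERSE_ACCESS_CIDR.subnet_of(network):
            return True
    return False


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened from this machine.

    This checks network reachability only; credentials are not verified.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(whitelist_covers_dla(["192.168.0.0/16", "100.104.0.0/16"]))  # True
    print(whitelist_covers_dla(["100.104.1.0/24"]))  # False: covers only part of the block
```

A /24 entry inside 100.104.0.0/16 is reported as insufficient because the sketch requires the entire reverse access block to be whitelisted, which matches the wording of the note on the ECS ID parameter.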
When you create an OSS data warehouse, you must specify the synchronization time in advance, and DLA synchronizes data at that time. To synchronize data immediately, you can trigger data synchronization manually.
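To estimate when the first scheduled synchronization starts, the following Python sketch computes the next start time for a given Scheduling time value. It assumes, which this topic does not state, that synchronization runs once a day at the configured HH:MM in the same time zone as the supplied clock. `next_sync_time` is a hypothetical helper for illustration, not a DLA API.

```python
from datetime import datetime, time, timedelta


def next_sync_time(now, scheduling_time="00:30"):
    """Return the next start time of a sync scheduled daily at HH:MM (hypothetical helper).

    Assumes one run per day in the time zone of `now`; this is not DLA's scheduler.
    """
    hour, minute = (int(part) for part in scheduling_time.split(":"))
    candidate = datetime.combine(now.date(), time(hour, minute), tzinfo=now.tzinfo)
    if candidate <= now:
        # Today's slot has already started or passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    created_at = datetime(2024, 5, 1, 10, 0)
    print(next_sync_time(created_at))  # 2024-05-02 00:30:00
```

Under this assumption, a warehouse created at 10:00 with the default value of 00:30 would not synchronize until the following night, which is when triggering synchronization manually is useful.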