As a hub for querying data in Alibaba Cloud, Data Lake Analytics (DLA) has been upgraded to query data in ApsaraDB for RDS instances (
Postgres). This tutorial tells you how to use DLA to query data in ApsaraDB for RDS instances. We use
MySQL in the examples and point out the differences in
Postgres at the end of the tutorial.
The following describes the syntax used to create an external table whose underlying layer is mapped to
MySQL in DLA:
CREATE SCHEMA hello_mysql_vpc_rds WITH DBPROPERTIES ( CATALOG = 'mysql', LOCATION = 'jdbc:mysql://rm-2zer0vg58mfofake.mysql.rds.aliyuncs.com:3306/dla_test', USER = 'dla_test', PASSWORD = 'the-fake-password', VPC_ID = 'vpc-2zeij924vxd303kwifake', INSTANCE_ID = 'rm-2zer0vg58mfo5fake' );
Different from creating a common table, an external table requires two more attributes:
INSTANCE_ID. VPC_ID indicates the ID of the VPC instance where your ApsaraDB for RDS instance is located, as shown in the following figure.
INSTANCE_ID indicates the ID of your ApsaraDB for RDS instance, which is displayed on the details page of your ApsaraDB for RDS instance.
These attributes allow DLA to access the data in your ApsaraDB for RDS instance, which is inaccessible by default because your databases are stored in your VPC instance, by using Alibaba Cloud VPC reserve access technology.
Permission statements: If you create a database using the preceding method, you agree that Alibaba Cloud uses the VPC reserve access technology to read or write data from or to your ApsaraDB for RDS instance.
You also have to add the IP address segment
100.104.0.0/16 to your ApsaraDB for RDS whitelist. This IP address segment is used for VPC reserve access, as shown in the following figure.
After creating a database, create the table person in your ApsaraDB for RDS instance for test:
create table person ( id int, name varchar(1023), age int );
Insert the following test data:
insert into person values (1, 'james', 10), (2, 'bond', 20), (3, 'jack', 30), (4, 'lucy', 40);
You can create the corresponding mapping table in the DLA database:
create external table person ( id int, name varchar(1023), age int ) tblproperties ( table_mapping = "person" );
After connecting the MySQL client to the DLA database, you can query data in the MySQL database:
mysql> select * from person; +------+-------+------+ | id | name | age | +------+-------+------+ | 1 | james | 10 | | 2 | bond | 20 | | 3 | jack | 30 | | 4 | lucy | 40 | +------+-------+------+ 4 rows in set (0.35 sec)
DLA is not typically used to read data from an ApsaraDB for RDS instance for analysis because an ApsaraDB for RDS instance contains a limited amount of data and is not applicable for massive data analysis. Instead, DLA is often used to analyze massive data in OSS and Table Store instances and write back the result data to ApsaraDB for RDS instances for frontend business. This is easily implemented in DLA. The table
person is used as an example in the following statement to convert 10 records of
oss_db and insert them into the table
mysql> insert into hello_mysql_vpc_rds.person -> select c_custkey, c_name, c_custkey + 20 from oss_db.customer limit 10; +------+ | rows | +------+ | 10 | +------+ 1 row in set (4.57 sec) mysql> select * from person; +------+--------------------+------+ | id | name | age | +------+--------------------+------+ | 1 | james | 10 | | 2 | bond | 20 | | 3 | jack | 30 | | 4 | lucy | 40 | | 1 | Customer#000000001 | 21 | | 3 | Customer#000000003 | 23 | | 5 | Customer#000000005 | 25 | | 2 | Customer#000000002 | 22 | | 4 | Customer#000000004 | 24 | | 7 | Customer#000000007 | 27 | | 6 | Customer#000000006 | 26 | | 9 | Customer#000000009 | 29 | | 8 | Customer#000000008 | 28 | | 10 | Customer#000000010 | 30 | +------+--------------------+------+ 14 rows in set (0.26 sec)
PostgreSQL is similar to that of MySQL. As
MySQL are equivalent, they are at the same layer. However,
PostgreSQL are at two layers. Therefore, the methods for creating a database and a table are different:
CREATE SCHEMA `hello_sqlserver_vpc_rds` WITH DBPROPERTIES ( CATALOG = 'sqlserver', LOCATION = 'jdbc:sqlserver://rm-bp15g1r5jf90hfake.sqlserver.rds.aliyuncs.com:3433;DatabaseName=dla_test', USER='dla_test1', PASSWORD='this-is-not-a-real-password', INSTANCE_ID = 'rm-bp15g1r5jf90fake', VPC_ID = 'vpc-bp1adypqlcn535yrdfake' );
LOCATION. Different from the database name for
MySQL, the database name for SQLServer is specified using
DatabaseName=dla_test. This is defined by the SQLServer JDBC URL rather than DLA.
CREATE SCHEMA `hello_postgresql_vpc_rds` WITH DBPROPERTIES ( CATALOG = 'postgresql', LOCATION = 'jdbc:postgresql://rm-bp1oo49r6j3hvfake.pg.rds.aliyuncs.com:3433/dla_test', USER='dla_test', PASSWORD='this-is-not-a-real-password', INSTANCE_ID = 'rm-bp1oo49r6j3hfake', VPC_ID = 'vpc-bp1adypqlcn535yrfake' );
The method for creating a database for PostgreSQL is similar to that for MySQL, except setting
The difference for creating a table is in the
MySQL only contains the table name (
create external table person1 ( id int, name varchar(1023), age int ) tblproperties( table_mapping = 'person' );
PostgreSQL contains the names of the schema and table:
create external table person ( id int, name varchar(1023), age int ) tblproperties( table_mapping = 'public.person' );
That's it! To learn more about Alibaba Cloud Data Lake Analytics at www.alibabacloud.com/products/data-lake-analytics
Alibaba Clouder - July 5, 2019
Alibaba Clouder - November 12, 2018
Alibaba EMR - October 12, 2021
Alibaba Clouder - July 4, 2019
Alibaba Clouder - February 7, 2018
ApsaraDB - February 16, 2021
A Big Data service that uses Apache Hadoop and Spark to process and analyze dataLearn More
A fully managed NoSQL cloud database service that enables storage of massive amount of structured and semi-structured dataLearn More
An encrypted and secure cloud storage service which stores, processes and accesses massive amounts of data from anywhere in the worldLearn More
More Posts by Alibaba Clouder