×
Community Blog Querying ApsaraDB for RDS with Data Lake Analytics

Querying ApsaraDB for RDS with Data Lake Analytics

In this tutorial, we will discover how to use Alibaba Cloud Data Lake Analytics to read/write data from/to ApsaraDB for RDS instances.

As a hub for querying data in Alibaba Cloud, Data Lake Analytics (DLA) has been upgraded to query data in ApsaraDB for RDS instances (using MySQL, SQLServer, and Postgres). This tutorial tells you how to use DLA to query data in ApsaraDB for RDS instances. We use MySQL in the examples and point out the differences in SQLServer and Postgres at the end of the tutorial.

Create a Database

The following describes the syntax used to create an external table whose underlying layer is mapped to MySQL in DLA:

CREATE SCHEMA hello_mysql_vpc_rds WITH DBPROPERTIES (
  CATALOG = 'mysql', 
  LOCATION = 'jdbc:mysql://rm-2zer0vg58mfofake.mysql.rds.aliyuncs.com:3306/dla_test',
  USER = 'dla_test',
  PASSWORD = 'the-fake-password',
  VPC_ID = 'vpc-2zeij924vxd303kwifake',
  INSTANCE_ID = 'rm-2zer0vg58mfo5fake'
);

Different from creating a common table, an external table requires two more attributes: VPC_ID and INSTANCE_ID. VPC_ID indicates the ID of the VPC instance where your ApsaraDB for RDS instance is located, as shown in the following figure.

1

INSTANCE_ID indicates the ID of your ApsaraDB for RDS instance, which is displayed on the details page of your ApsaraDB for RDS instance.

2

These attributes allow DLA to access the data in your ApsaraDB for RDS instance, which is inaccessible by default because your databases are stored in your VPC instance, by using Alibaba Cloud VPC reserve access technology.

Permission statements: If you create a database using the preceding method, you agree that Alibaba Cloud uses the VPC reserve access technology to read or write data from or to your ApsaraDB for RDS instance.

You also have to add the IP address segment 100.104.0.0/16 to your ApsaraDB for RDS whitelist. This IP address segment is used for VPC reserve access, as shown in the following figure.

3

Create a Table

After creating a database, create the table person in your ApsaraDB for RDS instance for test:

create table person (
       id int,
       name varchar(1023),
       age int
);

Insert the following test data:

insert into person 
  values (1, 'james', 10), 
         (2, 'bond', 20), 
         (3, 'jack', 30), 
         (4, 'lucy', 40);

You can create the corresponding mapping table in the DLA database:

create external table person (
       id int,
       name varchar(1023),
       age int
) tblproperties (
  table_mapping = "person"
);

After connecting the MySQL client to the DLA database, you can query data in the MySQL database:

mysql> select * from person;
+------+-------+------+
| id   | name  | age  |
+------+-------+------+
|    1 | james |   10 |
|    2 | bond  |   20 |
|    3 | jack  |   30 |
|    4 | lucy  |   40 |
+------+-------+------+
4 rows in set (0.35 sec)

ETL: Clean Data from OSS and Write it to ApsaraDB for RDS

DLA is not typically used to read data from an ApsaraDB for RDS instance for analysis because an ApsaraDB for RDS instance contains a limited amount of data and is not applicable for massive data analysis. Instead, DLA is often used to analyze massive data in OSS and Table Store instances and write back the result data to ApsaraDB for RDS instances for frontend business. This is easily implemented in DLA. The table person is used as an example in the following statement to convert 10 records of customer in oss_db and insert them into the table hello_mysql_vps_rds.person:

mysql> insert into hello_mysql_vpc_rds.person
    -> select c_custkey, c_name, c_custkey + 20 from oss_db.customer limit 10;

+------+
| rows |
+------+
|   10 |
+------+
1 row in set (4.57 sec)

mysql> select * from person;
+------+--------------------+------+
| id   | name               | age  |
+------+--------------------+------+
|    1 | james              |   10 |
|    2 | bond               |   20 |
|    3 | jack               |   30 |
|    4 | lucy               |   40 |
|    1 | Customer#000000001 |   21 |
|    3 | Customer#000000003 |   23 |
|    5 | Customer#000000005 |   25 |
|    2 | Customer#000000002 |   22 |
|    4 | Customer#000000004 |   24 |
|    7 | Customer#000000007 |   27 |
|    6 | Customer#000000006 |   26 |
|    9 | Customer#000000009 |   29 |
|    8 | Customer#000000008 |   28 |
|   10 | Customer#000000010 |   30 |
+------+--------------------+------+
14 rows in set (0.26 sec)

SQL Server and PostgreSQL

Usage of SQLServer and PostgreSQL is similar to that of MySQL. As database and schema in MySQL are equivalent, they are at the same layer. However, database and schema in SQLServer and PostgreSQL are at two layers. Therefore, the methods for creating a database and a table are different:

Create a Database

SQL Server

CREATE SCHEMA `hello_sqlserver_vpc_rds` WITH DBPROPERTIES 
( 
  CATALOG = 'sqlserver', 
  LOCATION = 'jdbc:sqlserver://rm-bp15g1r5jf90hfake.sqlserver.rds.aliyuncs.com:3433;DatabaseName=dla_test',
  USER='dla_test1',
  PASSWORD='this-is-not-a-real-password',
  INSTANCE_ID = 'rm-bp15g1r5jf90fake',
  VPC_ID = 'vpc-bp1adypqlcn535yrdfake'
);

  1. Set CATALOG to sqlserver.
  2. Specify the database name for LOCATION. Different from the database name for MySQL, the database name for SQLServer is specified using DatabaseName=dla_test. This is defined by the SQLServer JDBC URL rather than DLA.

PostgreSQL

CREATE SCHEMA `hello_postgresql_vpc_rds` WITH DBPROPERTIES 
( 
  CATALOG = 'postgresql', 
  LOCATION = 'jdbc:postgresql://rm-bp1oo49r6j3hvfake.pg.rds.aliyuncs.com:3433/dla_test',
  USER='dla_test',
  PASSWORD='this-is-not-a-real-password',
  INSTANCE_ID = 'rm-bp1oo49r6j3hfake',
  VPC_ID = 'vpc-bp1adypqlcn535yrfake'
);

The method for creating a database for PostgreSQL is similar to that for MySQL, except setting CATALOG to postgresql.

Create a Table

The difference for creating a table is in the table_mapping field. table_mapping of MySQL only contains the table name (person):

create external table person1 (
       id int,
       name varchar(1023),
       age int
) tblproperties(
  table_mapping = 'person'
);

While table_mapping of SQLServer or PostgreSQL contains the names of the schema and table:

create external table person (
       id int,
       name varchar(1023),
       age int
) tblproperties(
  table_mapping = 'public.person'
);

That's it! To learn more about Alibaba Cloud Data Lake Analytics at www.alibabacloud.com/products/data-lake-analytics

0 0 0
Share on

Alibaba Clouder

2,605 posts | 747 followers

You may also like

Comments