×
Community Blog Using Data Lake Analytics to Analyze Data in Table Store Instances

Using Data Lake Analytics to Analyze Data in Table Store Instances

In this article, we will learn how to analyze data stored in Alibaba Cloud Table Store with the newly launched Data Lake Analytics (DLA) service.

Data lake is a popular concept in the big data industry that involves storing files in its native format. With a data lake, you can implement big data association analysis across various heterogeneous data sources without any pre-processes such as ETL and data migration, significantly reducing costs and improving user experience.

Alibaba Cloud has just launched its own data lake analysis service, which is called Data Lake Analytics (DLA).

One of the biggest advantages of using Alibaba Cloud Data Lake Analytics (DLA) is that you can use it to analyze the data in your Table Store instance. We will show you how in this article.

Data Lake Analytics vs. Table Store

Alibaba Cloud Table Store is a fully managed NoSQL cloud database service that enables storage of a massive amount of structured and semi-structured data. Although conceptually similar, Table Store differs from DLA in the following aspects:

  1. DLA is an SQL execution engine in the upper-layer MPP architecture. DLA APIs are implemented by using the standard MySQL syntax, to meet the real-time OLAP analysis requirement.
  2. Table Store is an underlying data storage engine implemented based on LSM. It has the design and implementation that are similar to those of HBase and BigTable.
  3. DLA supports real-time analysis of data in one or more supported storage engines such as Table Store, OSS, ADS, and RDS.
  4. When performing a query task, DLA queries the underlying data and participates in the upper-layer analysis through the Table Store core API.
  5. DLA is case insensitive, whereas Table Store is case sensitive.

Metadata Mapping Logic between Table Store and DLA

Mapping of concepts such as the database and table

1

Field mapping

2

Setting Up a Table Store Instance

To enable Table Store, perform the following operations:

  1. Activate the Table Store service and buy a Table Store instance. For more information, visit https://www.alibabacloud.com/help/doc-detail/27293.htm
  2. Go to the Table Store console, select an appropriate region, and create an instance and a table.
  3. Alternatively, you can use an SDK to create a table and write data. For more information, visit https://www.alibabacloud.com/help/doc-detail/43005.htm

Obtaining Table Store Key Information

The following takes our test data as an example to explain the entire process (with the detailed application procedure omitted):

  1. Check the current regions where DLA has been activated, and ensure that DLA has been activated in the region of your Table Store instance. For this example, we have set up our Table Store instances in Hangzhou region.
  2. Go to the Table Store console, select China (Hangzhou), and view the instances. These instances are the standard test set generated by TPC-H and contain eight tables. The database tables have been created in advance, and the SDK has been used to write data into these tables.

    3

  3. View the instance information. Related endpoints are displayed. DLA currently does not support public network access. Therefore, you need to select the private network endpoint. In this example, the hz-tpch-1x-vol instance is selected for the test.

    4

    5

  4. View the definitions (including the table name, primary key name, primary key type, and primary key sequence) and data of the nation table, and use the information for subsequent comparison:

    6


Activating the DLA Account

Go to the product page at www.alibabacloud.com/products/data-lake-analytics, and click Activate Now to submit a public Beta test application. Alibaba Cloud will review your application as soon as possible.

7

8

  1. After your account is activated, you will receive a short message, an insite email, or an email that contains the account information. (The content template may be upgraded.)

    9

  2. Obtain the following connection information on the Data Lake Analytics console:

    10

  3. The following provides the methods for connecting to the DLA Hangzhou cluster using MySQL and JDBC:
    MySQL command line:
    mysql -hservice.cn-hangzhou.datalakeanalytics.aliyuncs.com -P10000 -u<dla_username> -p<dla_password> -c -A
    
    JDBC URL:
    jdbc:mysql://service.cn-hangzhou.datalakeanalytics.aliyuncs.com:10000/
    username=<dla_username>
    password=<dla_password>


Network Connectivity between DLA and Table Store

Currently, VPC-related policies have been configured and allow you to directly connect DLA to Table Store. Therefore, you do not need to worry about network connectivity between DLA and Table Store. However, as DLA currently does not support public network access, you must use the private network endpoint of Table Store.

Using DLA to Connect to Table Store

Note: DLA is a multi-tenant scenario. Therefore, a new user cannot see any database table when accessing the DLA console for the first time.

  1. Create your own DLA database based on the information obtained in the preceding steps:
    mysql> create database hangzhou_ots_test with dbproperties (
      catalog = 'ots',
      location = 'https://hz-tpch-1x-vol.cn-hangzhou.ots-internal.aliyuncs.com',
      instance = 'hz-tpch-1x-vol'
    );
    
    Query OK, 0 rows affected (0.23 sec)
    
    #hangzhou_ots_test             ---Pay attention to the database name, which allows only letters, digits, and underlines
    #catalog = 'ots',              ---Set this parameter to ots to distinguish it from other data sources, such as OSS and RDS
    #location = 'https://xxx'      ---Set this parameter to the Table Store endpoint, which can be learned from the instance information
    #instance = 'hz-tpch-1x-vol'   ---Specify the instance name because the endpoint may not contain the instance name. The instance name is mapped to a schema on DLA

  2. View the created database.
    mysql> show databases;
    +------------------------------+
    | Database                     |
    +------------------------------+
    | hangzhou_ots_test            |
    +------------------------------+
    1 rows in set (0.22 sec)
    
    mysql> show create database hangzhou_ots_test;
    +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Database          | Create Database                                                                                                                                                                  |
    +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | hangzhou_ots_test | CREATE DATABASE `hangzhou_ots_test`
    WITH DBPROPERTIES (
        CATALOG = 'ots',
        LOCATION = 'https://hz-tpch-1x-vol.cn-hangzhou.ots-internal.aliyuncs.com',
        INSTANCE = 'hz-tpch-1x-vol'
    ) |
    +-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (0.31 sec)

  3. View your own DLA tables.
    mysql> use hangzhou_ots_test;
    Database changed
    
    mysql> show tables;
    Empty set (0.30 sec)

  4. Create a DLA table, and map it to a table in the Table Store instance.
    mysql> CREATE EXTERNAL TABLE `nation` (
      `N_NATIONKEY` int not NULL ,
      `N_COMMENT` varchar(100) NULL ,
      `N_NAME` varchar(100) NULL ,
      `N_REGIONKEY` int NULL ,
      PRIMARY KEY (`N_NATIONKEY`)
    );
    Query OK, 0 rows affected (0.36 sec)
    
    ## `N_NATIONKEY` int not NULL   ---- You must set this parameter to not NULL for a primary key
    ## PRIMARY KEY (`N_NATIONKEY`)  ---- The setting must be the same as the primary key sequence in Table Store, and the name must also be consistent

  5. View the table created by yourself and the related DDL statement.
    mysql> show tables;
    +------------+
    | Table_Name |
    +------------+
    | nation     |
    +------------+
    1 row in set (0.35 sec)
    
    mysql> show create table nation;
    +--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Table  | Create Table                                                                                                                                                                                                                                                                                                                                                         |
    +--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | nation | CREATE EXTERNAL TABLE `nation` (
      `n_nationkey` int NULL COMMENT '',
      `n_comment` varchar(100) NULL COMMENT '',
      `n_name` varchar(100) NULL COMMENT '',
      `n_regionkey` int NULL COMMENT '',
      PRIMARY KEY (`n_nationkey`)
    )
    TBLPROPERTIES (COLUMN_MAPPING = 'n_nationkey,N_NATIONKEY; n_comment,N_COMMENT; n_name,N_NAME; n_regionkey,N_REGIONKEY; ')
    COMMENT '' |
    +--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (0.30 sec)

  6. Start to query and analyze data. The query in this example is not complex. You can analyze your own data by observing the MySQL syntax.
    mysql> select count(*) from nation;
    +-------+
    | _col0 |
    +-------+
    |    25 |
    +-------+
    1 row in set (1.19 sec)
    
    mysql> select * from nation;
    +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
    | n_nationkey | n_comment                                                                                                          | n_name         | n_regionkey |
    +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
    |           0 |  haggle. carefully final deposits detect slyly agai                                                                | ALGERIA        |           0 |
    |           1 | al foxes promise slyly according to the regular accounts. bold requests alon                                       | ARGENTINA      |           1 |
    |           2 | y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special         | BRAZIL         |           1 |
    |           3 | eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold              | CANADA         |           1 |
    |           4 | y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d                | EGYPT          |           4 |
    |           5 | ven packages wake quickly. regu                                                                                    | ETHIOPIA       |           0 |
    |           6 | refully final requests. regular, ironi                                                                             | FRANCE         |           3 |
    |           7 | l platelets. regular accounts x-ray: unusual, regular acco                                                         | GERMANY        |           3 |
    |           8 | ss excuses cajole slyly across the packages. deposits print aroun                                                  | INDIA          |           2 |
    |           9 |  slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull | INDONESIA      |           2 |
    |          10 | efully alongside of the slyly final dependencies.                                                                  | IRAN           |           4 |
    |          11 | nic deposits boost atop the quickly final requests? quickly regula                                                 | IRAQ           |           4 |
    |          12 | ously. final, express gifts cajole a                                                                               | JAPAN          |           2 |
    |          13 | ic deposits are blithely about the carefully regular pa                                                            | JORDAN         |           4 |
    |          14 |  pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                      | KENYA          |           0 |
    |          15 | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                         | MOROCCO        |           0 |
    |          16 | s. ironic, unusual asymptotes wake blithely r                                                                      | MOZAMBIQUE     |           0 |
    |          17 | platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun         | PERU           |           1 |
    |          18 | c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos                        | CHINA          |           2 |
    |          19 | ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account    | ROMANIA        |           3 |
    |          20 | ts. silent requests haggle. closely express packages sleep across the blithely                                     | SAUDI ARABIA   |           4 |
    |          21 | hely enticingly express accounts. even, final                                                                      | VIETNAM        |           2 |
    |          22 |  requests against the platelets use never according to the quickly regular pint                                    | RUSSIA         |           3 |
    |          23 | eans boost carefully special requests. accounts are. carefull                                                      | UNITED KINGDOM |           3 |
    |          24 | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be     | UNITED STATES  |           1 |
    +-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
    25 rows in set (1.63 sec)


Based on IDs in the following figure, you can see that the query result is the same as data in Table Store.

11

References


  1. Data Lake Analytics application scenarios: https://www.alibabacloud.com/help/doc-detail/70380.htm
  2. Use Data Lake Analytics to analyze data in OSS
  3. Database connection method of Data Lake Analytics
1 1 1
Share on

Alibaba Clouder

2,605 posts | 747 followers

You may also like

Comments

Raja_KT March 1, 2019 at 8:01 am

Interesting one.... "DLA is case insensitive, whereas Table Store is case sensitive."