This topic describes how to establish a seamless connection between Tablestore and MaxCompute within the same Alibaba Cloud account.
Background information
MaxCompute is a cost-effective and fully managed platform for petabytes of data warehousing. You can use this service to process and analyze large amounts of data in a fast and efficient manner. You can execute a simple Data Definition Language (DDL) statement to create an external table in MaxCompute. Then, you can use this table to associate MaxCompute with external data sources. This allows access to and output of data in various formats. MaxCompute tables can contain only structured data. However, external tables can contain either structured or non-structured data.
Tablestore | MaxCompute |
---|---|
STRING | STRING |
INTEGER | BIGINT |
DOUBLE | DOUBLE |
BOOLEAN | BOOLEAN |
BINARY | BINARY |
Preparations
Before you use MaxCompute to access Tablestore, make sure that the following preparations are made:
Step 1: Install the client
Step 2. Create an external table
Create a MaxCompute data table ots_vehicle_track and associate it with a Tablestore table vehicle_track.
Information of the associated Tablestore table:
- Instance name: cap1
- Table name: vehicle_track
- Primary key information: vid(int) and gt(int)
- Endpoint:
https://cap1.cn-hangzhou.ots-internal.aliyuncs.com
CREATE EXTERNAL TABLE IF NOT EXISTS ots_vehicle_track
(
vid bigint,
gt bigint,
longitude double,
latitude double,
distance double ,
speed double,
oil_consumption double
)
STORED BY 'com.aliyun.odps.TableStoreStorageHandler' -- (1)
WITH SERDEPROPERTIES ( -- (2)
'tablestore.columns.mapping'=':vid, :gt, longitude, latitude, distance, speed, oil_consumption', -- (3)
'tablestore.table.name'='vehicle_track' -- (4)
)
LOCATION 'tablestore://cap1.cn-hangzhou.ots-internal.aliyuncs.com'; -- (5)
The following table describes the parameters.
Label | Parameter | Description |
---|---|---|
(1) | com.aliyun.odps.TableStoreStorageHandler | StorageHandler built in MaxCompute. StorageHandler processes Tablestore data. StorageHandler defines the interaction between MaxCompute and Tablestore. MaxCompute implements related logic. |
(2) | SERDEPROPERITES | The operation that provides parameter options. The tablestore.columns.mapping and tablestore.table.name parameters are required for TableStoreStorageHandler. |
(3) | tablestore.columns.mapping | Required. The columns of the Tablestore table to be accessed by MaxCompute. The columns
include primary key columns and attribute columns. Columns that contain colons (: ) are the primary key columns of Tablestore. Examples: :vid and :gt . Other columns in the example are attribute columns. You must specify all primary
key columns of the table in Tablestore when you specify the mapping. You need only
to specify the attribute columns that MaxCompute accesses instead of specifying all
attribute columns.
|
(4) | tablestore.table.name | The name of the Tablestore table to be accessed. If the specified name of the Tablestore table does not exist, an error occurs. MaxCompute does not create Tablestore tables proactively. |
(5) | LOCATION | The information of the Tablestore instance to be accessed. The information includes the name of the instance and endpoint. |
Step 3: Use an external table to access Tablestore data
After you create an external table, Tablestore data is imported to MaxCompute. Then, you can use MaxCompute SQL commands to access Tablestore data.
// Collect statistics for the average speed and fuel consumption of vehicles whose VIDs are smaller than 4 before the timestamp 1469171387.
select vid,count(*),avg(speed),avg(oil_consumption) from ots_vehicle_track where vid <4 and gt<1469171387 group by vid;
A similar output is returned.
