Create a MaxCompute external table to query and analyze data in OSS - Object Storage Service

MaxCompute's external table feature lets you run SQL queries directly against CSV files stored in Object Storage Service (OSS) — no data migration required. MaxCompute can analyze large volumes of OSS data within several minutes, making it a low-cost option for structured analysis on IoT logs and other large text datasets.

In this tutorial:

Prerequisites

Before you begin, ensure that you have:

An OSS bucket. See Create a bucket
MaxCompute authorized to access OSS. Log in to the RAM console with your Alibaba Cloud account, then go to the Cloud Resource Access Authorization page to grant the AliyunODPSDefaultRole role
A MaxCompute project. See Create a MaxCompute project
The MaxCompute client installed and configured. See Install and configure the MaxCompute client

How it works

Upload source data (CSV files) to an OSS bucket.
Create an external table in MaxCompute that maps to the OSS directory containing those files. The LOCATION clause points to a directory in OSS — MaxCompute scans all files in that directory when the table is queried.
Run SQL queries against the external table. MaxCompute reads the data from OSS at query time.

Query OSS data with an external table

The following steps use a sample IoT vehicle dataset to walk through the end-to-end process.

Step 1: Upload data to OSS

Create a file named vehicle.csv on your local computer with the following content:

   1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
   1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
   1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
   1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
   1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
   1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
   1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
   1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
   1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
   1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N

Each row contains eight comma-separated fields: vehicleId, recordId, patientId, calls, locationLatitute, locationLongtitue, recordTime, direction.

Upload vehicle.csv to the demo/ directory of the examplebucket bucket in the China (Hangzhou) region. See Upload objects.

Step 2: Start the MaxCompute client

Start the MaxCompute client. See Run the MaxCompute client.

Step 3: Create an external table

Create a non-partitioned external table named data_csv_external that maps to the demo/ directory in OSS:

CREATE EXTERNAL TABLE IF NOT EXISTS data_csv_external
(
    vehicleId int,
    recordId int,
    patientId int,
    calls int,
    locationLatitute double,
    locationLongtitue double,
    recordTime string,
    direction string
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/examplebucket/demo/';

Step 4: Query the external table

Run the following SQL statement to retrieve records where patientId is greater than 25:

select recordId, patientId, direction from data_csv_external where patientId > 25;

Expected output:

+------------+------------+-----------+
| recordId   | patientId  | direction |
+------------+------------+-----------+
| 1          | 51         | S         |
| 3          | 48         | NE        |
| 4          | 30         | W         |
| 5          | 47         | S         |
| 7          | 53         | N         |
| 8          | 63         | SW        |
| 10         | 31         | N         |
+------------+------------+-----------+

FAQ

I get "You don't exist in project `<projectname>`" when connecting

The Alibaba Cloud account or RAM user whose AccessKey pair is used hasn't been added to the MaxCompute project. Contact the project owner to add the account. See Add an Alibaba Cloud account (project-level) and Add a RAM user (project-level).

The full error code is: ODPS-0420095: Access Denied - Authorization Failed [4002].

What's next

Explore more file formats and table types: Create an OSS external table that maps to other formats such as ORC. See Create an OSS external table.
Export query results: Download query results from MaxCompute to your local computer. See Execute SQL statements and export the result data.
Clean up resources: Delete the external table or the MaxCompute project when no longer needed to avoid unnecessary charges. See Delete a table or a MaxCompute project.