MaxCompute's external table feature lets you run SQL queries directly against CSV files stored in Object Storage Service (OSS) — no data migration required. MaxCompute can analyze large volumes of OSS data within several minutes, making it a low-cost option for structured analysis on IoT logs and other large text datasets.
In this tutorial:
Prerequisites
Before you begin, ensure that you have:
An OSS bucket. See Create a bucket
MaxCompute authorized to access OSS. Log in to the RAM console with your Alibaba Cloud account, then go to the Cloud Resource Access Authorization page to grant the
AliyunODPSDefaultRoleroleA MaxCompute project. See Create a MaxCompute project
The MaxCompute client installed and configured. See Install and configure the MaxCompute client
How it works
Upload source data (CSV files) to an OSS bucket.
Create an external table in MaxCompute that maps to the OSS directory containing those files. The
LOCATIONclause points to a directory in OSS — MaxCompute scans all files in that directory when the table is queried.Run SQL queries against the external table. MaxCompute reads the data from OSS at query time.
Query OSS data with an external table
The following steps use a sample IoT vehicle dataset to walk through the end-to-end process.
Step 1: Upload data to OSS
Create a file named
vehicle.csvon your local computer with the following content:1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S 1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE 1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE 1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W 1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S 1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S 1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N 1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW 1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE 1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,NEach row contains eight comma-separated fields:
vehicleId,recordId,patientId,calls,locationLatitute,locationLongtitue,recordTime,direction.Upload
vehicle.csvto thedemo/directory of theexamplebucketbucket in the China (Hangzhou) region. See Upload objects.
Step 2: Start the MaxCompute client
Start the MaxCompute client. See Run the MaxCompute client.
Step 3: Create an external table
Create a non-partitioned external table named data_csv_external that maps to the demo/ directory in OSS:
CREATE EXTERNAL TABLE IF NOT EXISTS data_csv_external
(
vehicleId int,
recordId int,
patientId int,
calls int,
locationLatitute double,
locationLongtitue double,
recordTime string,
direction string
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/examplebucket/demo/';Step 4: Query the external table
Run the following SQL statement to retrieve records where patientId is greater than 25:
select recordId, patientId, direction from data_csv_external where patientId > 25;Expected output:
+------------+------------+-----------+
| recordId | patientId | direction |
+------------+------------+-----------+
| 1 | 51 | S |
| 3 | 48 | NE |
| 4 | 30 | W |
| 5 | 47 | S |
| 7 | 53 | N |
| 8 | 63 | SW |
| 10 | 31 | N |
+------------+------------+-----------+FAQ
I get "You don't exist in project <projectname>" when connecting
The Alibaba Cloud account or RAM user whose AccessKey pair is used hasn't been added to the MaxCompute project. Contact the project owner to add the account. See Add an Alibaba Cloud account (project-level) and Add a RAM user (project-level).
The full error code is: ODPS-0420095: Access Denied - Authorization Failed [4002].
What's next
Explore more file formats and table types: Create an OSS external table that maps to other formats such as ORC. See Create an OSS external table.
Export query results: Download query results from MaxCompute to your local computer. See Execute SQL statements and export the result data.
Clean up resources: Delete the external table or the MaxCompute project when no longer needed to avoid unnecessary charges. See Delete a table or a MaxCompute project.