Tables are the fundamental unit for storing data in MaxCompute. This tutorial walks you through designing and creating two types of tables—non-partitioned and partitioned—using sample banking data, then verifying the results.
What you'll learn:
-
How to analyze a data file and determine the right table structure
-
How to create non-partitioned and partitioned tables using the MaxCompute client
-
How to add partitions to a partitioned table
-
How to verify that tables were created with the correct schema
Steps overview: Step 1: Design the tables → Step 2: Create the tables → Step 3: Verify the tables
Prerequisites
Before you begin, make sure you have:
-
The MaxCompute client installed and running. See Install and start the MaxCompute client.
-
A CSV or TXT data file to import. This tutorial uses the following sample files:
-
Non-partitioned table: banking.txt
-
Partitioned table (three files): banking_nocreditcard.csv, banking_uncreditcard.csv, and banking_yescreditcard.csv
-
-
The CREATE TABLE permission on your MaxCompute project. This is required for RAM users. Alibaba Cloud account holders have this permission by default.
-
To check your permissions, see Query permissions by using MaxCompute SQL.
-
To grant permissions to a RAM user, see MaxCompute permissions.
-
Usage notes
The column data types in your table must match the data types in your source files. Type mismatches prevent data from being imported.
Step 1: Design the tables
Before writing any SQL, examine your source data to determine the number of columns, field names, and data types for each column.
The sample files in this tutorial contain mortgagor data. Fields are comma-separated. The following preview shows a few rows from each file:
-- Non-partitioned table data (21 columns)
44,blue-collar,married,basic.4y,unknown,yes,no,cellular,aug,thu,210,1,999,0,nonexistent,1.4,93.444,-36.1,4.963,5228.1,0
53,technician,married,unknown,no,no,no,cellular,nov,fri,138,1,999,0,nonexistent,-0.1,93.2,-42,4.021,5195.8,0
28,management,single,university.degree,no,yes,no,cellular,jun,thu,339,3,6,2,success,-1.7,94.055,-39.8,0.729,4991.6,1
-- Partitioned table data (20 columns per file; credit column is used as partition key)
53,technician,married,unknown,no,no,cellular,nov,fri,138,1,999,0,nonexistent,-0.1,93.2,-42,4.021,5195.8,0
28,management,single,university.degree,yes,no,cellular,jun,thu,339,3,6,2,success,-1.7,94.055,-39.8,0.729,4991.6,1
39,services,married,high.school,no,no,cellular,apr,fri,185,2,999,0,nonexistent,-1.8,93.075,-47.1,1.405,5099.1,0
Count the columns, identify the data type of each field, and choose meaningful column names. The following table summarizes the analysis for the sample files.
<table> <thead> <tr> <td><p><b>Item</b></p></td> <td><p><b>Description</b></p></td> </tr> </thead> <colgroup></colgroup> <colgroup></colgroup> <tbody> <tr> <td><p>Number of columns</p></td> <td><p>Count the number of columns in each sample file and determine the number of columns in the table that you want to create:</p> <ul> <li><p>The sample file that is used to create a non-partitioned table contains 21 columns. You can create a non-partitioned table that contains 21 columns based on the sample file. </p> <div> <div> <i></i> </div> <div> <strong>Note </strong> <p>If the number of columns in the table exceeds the number of columns in the sample file, the additional columns in the table are filled with NULL. If the number of columns in the table is less than the number of columns in the sample file, the excessive data in the sample file is discarded. </p> </div> </div></li> <li><p>Each of the sample files that are used to create a partitioned table contains 20 columns. You can create a partitioned table that contains 21 columns with one column as the partition key based on the sample files. You must define partitions when you create the table. </p></li> </ul></td> </tr> <tr> <td><p>Fields of the table</p></td> <td><p>Identify the fields in each sample file and determine the fields in the table that you want to create:</p> <ul> <li><p>Fields of the non-partitioned table:</p> <ul> <li><p>age: age</p></li> <li><p>job: job type</p></li> <li><p>marital: marital status</p></li> <li><p>education: educational level</p></li> <li><p>credit: have a credit card</p></li> <li><p>housing: mortgagors</p></li> <li><p>loan: have loans</p></li> <li><p>contact: contact information</p></li> <li><p>month: month</p></li> <li><p>day_of_week: day of the week</p></li> <li><p>duration: duration</p></li> <li><p>campaign: number of contacts for the current campaign</p></li> <li><p>pdays: time elapsed after the last contact</p></li> <li><p>previous: number of previous contacts</p></li> <li><p>poutcome: results of previous marketing activities</p></li> <li><p>emp_var_rate: employment variation rate</p></li> <li><p>cons_price_idx: consumer price index</p></li> <li><p>cons_conf_idx: consumer confidence index</p></li> <li><p>euribor 3m: 3-month Euro Interbank Offered Rate (Euribor)</p></li> <li><p>nr_employed: number of employees</p></li> <li><p>fixed_deposit: have a time deposit</p></li> </ul></li> <li><p>Fields of the partitioned table:</p> <ul> <li><p>age: age</p></li> <li><p>job: job type</p></li> <li><p>marital: marital status</p></li> <li><p>education: educational level</p></li> <li><p>housing: mortgagors</p></li> <li><p>loan: have loans</p></li> <li><p>contact: contact information</p></li> <li><p>month: month</p></li> <li><p>day_of_week: day of the week</p></li> <li><p>duration: duration</p></li> <li><p>campaign: number of contacts for the current campaign</p></li> <li><p>pdays: time elapsed after the last contact</p></li> <li><p>previous: number of previous contacts</p></li> <li><p>poutcome: results of previous marketing activities</p></li> <li><p>emp_var_rate: employment variation rate</p></li> <li><p>cons_price_idx: consumer price index</p></li> <li><p>cons_conf_idx: consumer confidence index</p></li> <li><p>euribor 3m: 3-month Euribor</p></li> <li><p>nr_employed: number of employees</p></li> <li><p>fixed_deposit: have a time deposit</p></li> <li><p>credit: have a credit card (partition key) </p></li> </ul></li> </ul></td> </tr> <tr> <td><p>Data types</p></td> <td><p>Identify the data type of each column in the sample files:</p> <ul> <li><p>age: BIGINT</p></li> <li><p>job: STRING</p></li> <li><p>marital: STRING</p></li> <li><p>education: STRING</p></li> <li><p>credit: STRING</p></li> <li><p>housing: STRING</p></li> <li><p>loan: STRING</p></li> <li><p>contact: STRING</p></li> <li><p>month: STRING</p></li> <li><p>day_of_week: STRING</p></li> <li><p>duration: STRING</p></li> <li><p>campaign: BIGINT</p></li> <li><p>pdays: DOUBLE</p></li> <li><p>previous: DOUBLE</p></li> <li><p>poutcome: STRING</p></li> <li><p>emp_var_rate: DOUBLE</p></li> <li><p>cons_price_idx: DOUBLE</p></li> <li><p>cons_conf_idx: DOUBLE</p></li> <li><p>euribor3m: DOUBLE</p></li> <li><p>nr_employed: DOUBLE</p></li> <li><p>fixed_deposit: BIGINT</p></li> </ul></td> </tr> </tbody> </table>
Step 2: Create the tables
This tutorial creates four tables in total:
-
bank_data: non-partitioned table for importing the full dataset -
bank_data_pt: partitioned table, withcreditas the partition key -
result_table1: non-partitioned table for storing query results -
result_table2: non-partitioned table for storing query results
For details about CREATE TABLE syntax, see Create a table. For details about table and partition management, see Table operations and Partition operations.
-
Optional: Switch to your target MaxCompute project. Skip this step if the project was already selected when you started the client.
use doc_test_dev;The prompt updates to confirm the switch:
doc_test_dev.default> -
Create the four tables. The column order and data types in each CREATE TABLE statement correspond directly to the fields identified in Step 1.
-
Create the non-partitioned table
bank_data:create table if not exists bank_data ( age BIGINT comment 'age', job STRING comment 'job type', marital STRING comment 'marital status', education STRING comment 'education level', credit STRING comment 'have a credit card', housing STRING comment 'mortgagors', loan STRING comment 'have loans', contact STRING comment 'contact information', month STRING comment 'month', day_of_week STRING comment 'day of the week', duration STRING comment 'duration', campaign BIGINT comment 'number of contacts for the current campaign', pdays DOUBLE comment 'time elapsed after the last contact', previous DOUBLE comment 'number of previous contacts', poutcome STRING comment 'results of previous marketing activities', emp_var_rate DOUBLE comment 'employment variance rate', cons_price_idx DOUBLE comment 'consumer price index', cons_conf_idx DOUBLE comment 'consumer confidence index', euribor3m DOUBLE comment '3-month Euribor', nr_employed DOUBLE comment 'number of employees', fixed_deposit BIGINT comment 'have a time deposit' ); -
Create the partitioned table
bank_data_ptand add partitions. Thecreditcolumn is used as the partition key instead of a regular data column, so the table has 20 data columns plus one partition key.create table if not exists bank_data_pt ( age BIGINT comment 'age', job STRING comment 'job type', marital STRING comment 'marital status', education STRING comment 'education level', housing STRING comment 'mortgagors', loan STRING comment 'have loans', contact STRING comment 'contact information', month STRING comment 'month', day_of_week STRING comment 'day of the week', duration STRING comment 'duration', campaign BIGINT comment 'number of contacts for the current campaign', pdays DOUBLE comment 'time elapsed after the last contact', previous DOUBLE comment 'number of previous contacts', poutcome STRING comment 'results of previous marketing activities', emp_var_rate DOUBLE comment 'employment variance rate', cons_price_idx DOUBLE comment 'consumer price index', cons_conf_idx DOUBLE comment 'consumer confidence index', euribor3m DOUBLE comment '3-month Euribor', nr_employed DOUBLE comment 'number of employees', fixed_deposit BIGINT comment 'have a time deposit' )partitioned by (credit STRING comment 'have a credit card'); alter table bank_data_pt add if not exists partition (credit='yes') partition (credit='no') partition (credit='unknown'); -
Create the non-partitioned table
result_table1:create table if not exists result_table1 ( education STRING comment 'education level', num BIGINT comment 'number of persons' ); -
Create the non-partitioned table
result_table2:create table if not exists result_table2 ( education STRING comment 'education level', num BIGINT comment 'number of persons', credit STRING comment 'have a credit card' );
-
Step 3: Verify the tables
Check that all tables were created and that the schemas are correct.
-
List all tables in the project:
show tables;All four tables appear in the output:
ALIYUN$****:bank_data ALIYUN$****:bank_data_pt ALIYUN$****:result_table1 ALIYUN$****:result_table2 -
Inspect each table's schema and partitions:
The
Size: 0field is expected—the table has no data yet. The size increases after you import data.-- View the schema of bank_data. desc bank_data; -- View the schema of bank_data_pt. desc bank_data_pt; -- View the partitions of bank_data_pt. show partitions bank_data_pt; -- View the schema of result_table1. desc result_table1; -- View the schema of result_table2. desc result_table2;The
desc bank_data_pt;output looks similar to:+------------------------------------------------------------------------------------+ | Owner: ALIYUN$**** | Project: doc_test_dev | | TableComment: | +------------------------------------------------------------------------------------+ | CreateTime: 2021-05-17 14:02:21 | | LastDDLTime: 2021-05-17 14:02:21 | | LastModifiedTime: 2021-05-17 14:02:21 | +------------------------------------------------------------------------------------+ | InternalTable: YES | Size: 0 | +------------------------------------------------------------------------------------+ | Native Columns: | +------------------------------------------------------------------------------------+ | Field | Type | Label | Comment | +------------------------------------------------------------------------------------+ | age | bigint | | age | | job | string | | job type | | marital | string | | marital status | | education | string | | educational level | | housing | string | | mortgagors | | loan | string | | have loans | | contact | string | | contact information | | month | string | | month | | day_of_week | string | | day of the week | | duration | string | | duration | | campaign | bigint | | number of contacts for the current campaign | | pdays | double | | time elapsed after the last contact | | previous | double | | number of previous contacts | | poutcome | string | | results of previous marketing activities | | emp_var_rate | double | | employment variation rate | | cons_price_idx | double | | consumer price index | | cons_conf_idx | double | | consumer confidence index | | euribor3m | double | | 3-month Euribor | | nr_employed | double | | number of employees | | fixed_deposit | bigint | | have a time deposit | +------------------------------------------------------------------------------------+ | Partition Columns: | +------------------------------------------------------------------------------------+ | credit | string | have a credit card | +------------------------------------------------------------------------------------+The
show partitions bank_data_pt;output confirms all three partitions were created:credit=no credit=unknown credit=yes
What's next
With all four tables created and the schemas verified, import data from the sample files. See Import data to tables.