Use the LOAD statement to load data from external data stores (e.g., OSS, Hologres, Amazon Redshift, or BigQuery) into a MaxCompute table or partition.
How it works
LOAD OVERWRITE clears the target table or partition before writing. LOAD INTO appends data to an existing table or partition without removing existing rows. Both commands read all files in the specified directory by default.
Data from Amazon Redshift and BigQuery must be staged in OSS before you can load it into MaxCompute.
MaxCompute uses dynamic partitioning to load data into partitioned tables when OSS subdirectories follow the key=value naming convention (for example, ds=20200909/).
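As a rough illustration of the key=value convention (a hypothetical helper, not part of MaxCompute), the partition values implied by a directory path can be inferred like this:

```python
# Illustrative only: mimics how dynamic partitioning reads partition
# values from key=value subdirectory names such as ds=20200909/.
def infer_partition(path: str) -> dict:
    """Extract key=value partition pairs from an OSS-style directory path."""
    parts = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

print(infer_partition("mc-test/data_location/ds=20200909/"))
# {'ds': '20200909'}
```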
Prerequisites
Before you begin, make sure you have:
A MaxCompute project with CreateTable and Alter permissions. For details, see MaxCompute permissions.
For OSS sources: MaxCompute authorized to read and list OSS objects. For enhanced security, use STS mode. See Authorize access in STS mode.
For Hologres sources: A RAM role with the required permissions added to the Hologres instance. See Create a Hologres external table (STS mode).
Limitations
The LOAD command does not support allowlist or denylist parameters.
The external storage and target MaxCompute project must be in the same region.
OSS: The target partitioned table schema (excluding partition columns) must match the external data schema. The external data must not include partition columns.
Hologres: Cannot load data from a Hologres partitioned table. Cannot load from a Hologres external table that uses dual-signature authorization. MaxCompute connects to Hologres only through a classic network domain name — VPC domain names are not supported.
Open-source formats: A single file cannot exceed 3 GB. Split large files before loading.
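One way to stay under the cap is to pre-split source files by line before uploading them to OSS. The sketch below uses illustrative names (it is not a MaxCompute tool) and keeps whole lines intact; in practice you would pass a limit just under 3 GB:

```python
# Sketch: split a large text file into chunks under max_bytes each,
# never breaking a line across chunks.
def split_file(src: str, max_bytes: int) -> list[str]:
    chunks = []          # paths of the chunk files written so far
    out, size, n = None, 0, 0
    with open(src, "rb") as f:
        for line in f:
            # Start a new chunk when the current one would overflow.
            if out is None or size + len(line) > max_bytes:
                if out:
                    out.close()
                n += 1
                name = f"{src}.part{n}"
                out = open(name, "wb")
                chunks.append(name)
                size = 0
            out.write(line)
            size += len(line)
    if out:
        out.close()
    return chunks
```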
Load data from OSS
Syntax
CSV and TSV files:
{LOAD OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <oss_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>', ...)];

Open-source formats (ORC, Parquet, TextFile, etc.):
{LOAD OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <oss_location>
[ROW FORMAT SERDE '<serde_class>'
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>', ...)]
]
STORED AS <file_format>;

Parameters
| Parameter | Required | Description |
|---|---|---|
table_name | Yes | Target table name. The table must exist before loading. Its schema (excluding partition columns) must match the external data. |
pt_spec | No | Partition specification. Format: (partition_col1 = partition_col_value1, ...). Omit the value when using dynamic partitions. |
oss_location | Yes | OSS directory path. Format: 'oss://<oss_endpoint>/<object>'. MaxCompute reads all files in this directory. For endpoint details, see Access OSS by using endpoints and bucket domain names. |
StorageHandler | Yes (CSV/TSV) | Built-in storage handler class. Use 'com.aliyun.odps.CsvStorageHandler' for CSV files. |
serde_class | No | SerDe class for open-source formats. For example, use 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' for Parquet files. |
file_format | Yes (open-source) | File format. Supported values: ORC, PARQUET, RCFILE, SEQUENCEFILE, TEXTFILE, JSON, AVRO. |
SERDEPROPERTIES | Varies | Properties for the SerDe. Supports the same properties as MaxCompute external tables. See Create an OSS external table. |
Common SERDEPROPERTIES for OSS:
| Property | Required | Description |
|---|---|---|
odps.properties.rolearn | Yes | ARN of the RAM role for OSS access. To view a role ARN, see View a RAM role. |
odps.text.option.delimiter | No | Column delimiter for CSV files. Defaults to comma (,). |
field.delim | No | Field delimiter for LazySimpleSerDe. |
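Before running LOAD, it can save a failed job to check that a sample of the source CSV parses to the target table's column count with the delimiter you intend to set in odps.text.option.delimiter. A minimal local pre-check in Python (illustrative, not a MaxCompute API):

```python
import csv
import io

# Illustrative helper: verify every sample row yields the expected
# number of columns under the chosen delimiter.
def check_columns(sample_text: str, expected_cols: int, delimiter: str = ",") -> bool:
    reader = csv.reader(io.StringIO(sample_text), delimiter=delimiter)
    return all(len(row) == expected_cols for row in reader if row)

sample = "1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S\n"
print(check_columns(sample, 8))
# True for the 8-column vehicle schema
```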
Load a CSV file from OSS
This example loads vehicle.csv from OSS into the MaxCompute table ambulance_data_csv_load. MaxCompute and OSS are owned by the same Alibaba Cloud account, and data transfers over the Alibaba Cloud internal network.
1. Authorize MaxCompute to access OSS.
Click here for one-click authorization.
2. Upload the source file to OSS.
Save vehicle.csv to the mc-test/data_location/ directory in an OSS bucket in the oss-cn-hangzhou region. For details on creating a bucket, see Create a bucket.
The resulting OSS directory path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/

3. Create the target table and load data.
Log on to the MaxCompute client (odpscmd) and run:
-- Create the target table.
CREATE TABLE ambulance_data_csv_load (
vehicleId INT,
recordId INT,
patientId INT,
calls INT,
locationLatitude DOUBLE,
locationLongitude DOUBLE,
recordTime STRING,
direction STRING
);
-- Load the CSV file into the table.
LOAD OVERWRITE TABLE ambulance_data_csv_load
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
-- ARN of AliyunODPSDefaultRole. Obtain it from the RAM role management page.
'odps.properties.rolearn'='acs:ram::xxxxx:role/aliyunodpsdefaultrole',
'odps.text.option.delimiter'=','
);

4. Verify the result.
-- Enable a full table scan (effective for the current session only).
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load;
-- Expected output:
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitude | locationlongitude | recordtime | direction |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE |
| 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE |
| 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | W |
| 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S |
| 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S |
| 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N |
| 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | SW |
| 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE |
| 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+

Load a TextFile from OSS into a partitioned table
This example loads vehicle.textfile from OSS into a specific partition of ambulance_data_textfile_load_pt.
1. Authorize MaxCompute to access OSS.
Click here for one-click authorization.
If MaxCompute and OSS are owned by different Alibaba Cloud accounts, see Authorization in STS mode before proceeding.
2. Upload the source file to OSS.
Save vehicle.textfile to the mc-test/data_location/ directory in an OSS bucket in the oss-cn-hangzhou region.
The resulting OSS directory path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/

3. Create the target table and load data.
-- Create the target partitioned table.
CREATE TABLE ambulance_data_textfile_load_pt (
vehicleId STRING,
recordId STRING,
patientId STRING,
calls STRING,
locationLatitude STRING,
locationLongitude STRING,
recordTime STRING,
direction STRING
)
PARTITIONED BY (ds STRING);
-- Load data into the ds='20200910' partition.
LOAD OVERWRITE TABLE ambulance_data_textfile_load_pt PARTITION(ds='20200910')
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS TEXTFILE;

4. Verify the result.
-- Enable a full table scan (effective for the current session only).
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_textfile_load_pt;
-- Expected output:
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+
| vehicleid | recordid | patientid | calls | locationlatitude | locationlongitude | recordtime | direction | ds |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | W | 20200910 |
| 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200910 |
| 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | SW | 20200910 |
| 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200910 |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+

Load data with dynamic partitions
Use dynamic partitions when OSS subdirectories are named with partition key-value pairs (for example, ds=20200909/). MaxCompute infers partition values from the directory names, so you specify only the partition key in the PARTITION clause.
1. Authorize MaxCompute to access OSS.
Click here for one-click authorization.
2. Upload the source files to OSS.
Save vehicle1.csv to mc-test/data_location/ds=20200909/ and vehicle2.csv to mc-test/data_location/ds=20200910/ in an OSS bucket in the oss-cn-hangzhou region.
The resulting OSS directory paths are:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200909/
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200910/

3. Create the target table and load data.
-- Create the target partitioned table.
CREATE TABLE ambulance_data_csv_load_dynpt (
vehicleId STRING,
recordId STRING,
patientId STRING,
calls STRING,
locationLatitude STRING,
locationLongitude STRING,
recordTime STRING,
direction STRING
)
PARTITIONED BY (ds STRING);
-- Load data using dynamic partitions.
-- MaxCompute reads the partition value from each subdirectory name.
LOAD OVERWRITE TABLE ambulance_data_csv_load_dynpt PARTITION(ds)
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;

4. Verify the result.
-- Enable a full table scan (effective for the current session only).
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load_dynpt;
-- Expected output:
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitude | locationlongitude | recordtime | direction | ds |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200909 |
| 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | SW | 20200909 |
| 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200909 |
| 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200909 |
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | W | 20200910 |
| 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+

Load data from Hologres
Syntax
{LOAD OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <hologres_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>', ...)];

Parameters
| Parameter | Required | Description |
|---|---|---|
table_name | Yes | Target MaxCompute internal table name. The table must exist before loading. |
pt_spec | No | Partition specification. Format: (partition_col1 = partition_col_value1, ...). |
hologres_location | Yes | JDBC connection URL for the Hologres source table. Format: 'jdbc:postgresql://<endpoint>:<port>/<database>?ApplicationName=MaxCompute&[currentSchema=<schema>&]useSSL={true|false}&table=<holo_table_name>/'. |
StorageHandler | Yes | Fixed value: 'com.aliyun.odps.jdbc.JdbcStorageHandler'. |
Hologres location URL sub-parameters:
| Sub-parameter | Required | Description |
|---|---|---|
endpoint | Yes | Classic network domain name of the Hologres instance. To get it, see Instance Details. VPC domain names are not supported. |
port | Yes | Network port of the Hologres instance. |
database | Yes | Hologres database name. See CREATE DATABASE. |
ApplicationName | Yes | Fixed value: MaxCompute. |
schema | No | Hologres schema name. Omit if the table name is unique in the database or if the source table is in the default schema. See CREATE SCHEMA. |
holo_table_name | Yes | Hologres source table name. See CREATE TABLE. |
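Assembling the connection URL from these sub-parameters is mechanical; the sketch below shows the expected shape (the endpoint and names are placeholders, not a real instance):

```python
# Illustrative builder for a hologres_location string from the
# sub-parameters documented above.
def hologres_location(endpoint, port, database, table, schema=None, use_ssl=False):
    query = ["ApplicationName=MaxCompute"]
    if schema:
        query.append(f"currentSchema={schema}")
    query.append(f"useSSL={'true' if use_ssl else 'false'}")
    query.append(f"table={table}/")  # note the trailing slash after the table name
    return f"jdbc:postgresql://{endpoint}:{port}/{database}?" + "&".join(query)

url = hologres_location("holo-endpoint.example.internal", 80, "my_db", "my_table",
                        schema="public")
print(url)
# jdbc:postgresql://holo-endpoint.example.internal:80/my_db?ApplicationName=MaxCompute&currentSchema=public&useSSL=false&table=my_table/
```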
Required SERDEPROPERTIES for Hologres:
| Property | Required | Description |
|---|---|---|
odps.properties.rolearn | Yes | ARN of the RAM role for Hologres access. |
mcfed.mapreduce.jdbc.driver.class | Yes | Fixed value: 'org.postgresql.Driver'. |
odps.federation.jdbc.target.db.type | Yes | Fixed value: 'holo'. |
odps.federation.jdbc.colmapping | No | Column mapping between MaxCompute and Hologres columns. Default: map by name. Format: MaxCompute_column1:"Hologres_column1"[,MaxCompute_column2:"Hologres_column2",...]. Enclose Hologres column names in double quotes if they contain uppercase letters. |
mcfed.mapreduce.jdbc.input.query | No | Custom SQL query to read from Hologres. The result set schema (column names and types) must match the MaxCompute external table. Format: SELECT xxx FROM <holo_database_name>.<holo_schema_name>.<holo_table_name>. |
Column mapping behavior:
If odps.federation.jdbc.colmapping is not set, columns are mapped by name. If it is set for only a subset of columns, the remaining columns are mapped by name. An error occurs if any unmapped columns have mismatched names or types.
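The rule can be modeled as "explicit entries win, everything else falls back to name matching." The sketch below is an illustrative model of that behavior, not MaxCompute internals:

```python
# Illustrative: resolve each MaxCompute column to a Hologres column.
# Explicit colmapping entries take precedence; remaining columns must
# match a Hologres column by name, otherwise it is an error.
def resolve_mapping(mc_cols, holo_cols, explicit=None):
    explicit = explicit or {}
    mapping = {}
    for col in mc_cols:
        if col in explicit:
            mapping[col] = explicit[col]
        elif col in holo_cols:
            mapping[col] = col
        else:
            raise ValueError(f"no Hologres column matches {col!r}")
    return mapping

print(resolve_mapping(["id", "name"], ["id", "Name"], explicit={"name": "Name"}))
# {'id': 'id', 'name': 'Name'}
```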
Load data from a Hologres table
This example loads data from the Hologres table holo_ext into the MaxCompute internal table from_holo.
Source data in Hologres:
-- Query the Hologres external table:
SELECT * FROM holo_ext;
-- Result:
+------------+------+
| id | name |
+------------+------+
| 1 | abc |
| 2 | ereg |
+------------+------+

1. Create the MaxCompute target table.
CREATE TABLE from_holo (id BIGINT, name STRING);

2. Load data from Hologres.
LOAD INTO TABLE from_holo
FROM LOCATION 'jdbc:postgresql://hgprecn-cn-wwo3ft0l****-cn-beijing-internal.hologres.aliyuncs.com:80/<YOUR_HOLO_DB_NAME>?ApplicationName=MaxCompute&currentSchema=public&useSSL=false&table=<YOUR_HOLOGRES_TABLE_NAME>/'
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
WITH SERDEPROPERTIES (
'odps.properties.rolearn'='acs:ram::18927322887*****:role/hologresrole',
'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver',
'odps.federation.jdbc.target.db.type'='holo'
);

3. Verify the result.
SELECT * FROM from_holo;
-- Expected output:
+------------+------+
| id | name |
+------------+------+
| 2 | ereg |
| 1 | abc |
+------------+------+

Billing
All LOAD tasks consume compute resources from their resource group. Under the pay-as-you-go model, you are not charged for the volume of imported data. Under the subscription model, individual LOAD tasks are not charged separately.
What's next
To export data from MaxCompute to OSS or Hologres, see UNLOAD.