Use the LOAD statement to load data from external data stores (e.g., OSS, Hologres, Amazon Redshift, or BigQuery) into a MaxCompute table or partition.
How it works
LOAD OVERWRITE clears the target table or partition before writing. LOAD INTO appends data to an existing table or partition without removing existing rows. Both commands read all files in the specified directory by default.
Data from Amazon Redshift and BigQuery must be staged in OSS before you can load it into MaxCompute.
MaxCompute uses dynamic partitioning to load data into partitioned tables when OSS subdirectories follow the key=value naming convention (for example, ds=20200909/).
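As a rough illustration of the key=value convention (a hypothetical helper, not part of MaxCompute), the partition values implied by a directory path can be inferred like this:

```python
# Illustrative only: mimics how dynamic partitioning reads partition
# values from key=value subdirectory names such as ds=20200909/.
def infer_partition(path: str) -> dict:
    """Extract key=value partition pairs from an OSS-style directory path."""
    parts = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

print(infer_partition("mc-test/data_location/ds=20200909/"))
# {'ds': '20200909'}
```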
Prerequisites
Before you begin, make sure you have:
A MaxCompute project with CreateTable and Alter permissions. For details, see MaxCompute permissions.
For OSS sources: MaxCompute authorized to read and list OSS objects. For enhanced security, use STS mode. See Authorize access in STS mode.
For Hologres sources: A RAM role with the required permissions added to the Hologres instance. See Create a Hologres external table (STS mode).
Limitations
The LOAD command does not support allowlist or denylist parameters.
The external storage and target MaxCompute project must be in the same region.
OSS: The target partitioned table schema (excluding partition columns) must match the external data schema. The external data must not include partition columns.
Hologres: Cannot load data from a Hologres partitioned table. Cannot load from a Hologres external table that uses dual-signature authorization. MaxCompute connects to Hologres only through a classic network domain name — VPC domain names are not supported.
Open-source formats: A single file cannot exceed 3 GB. Split large files before loading.
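One way to stay under the cap is to pre-split source files by line before uploading them to OSS. The sketch below uses illustrative names (it is not a MaxCompute tool) and keeps whole lines intact; in practice you would pass a limit just under 3 GB:

```python
# Sketch: split a large text file into chunks under max_bytes each,
# never breaking a line across chunks.
def split_file(src: str, max_bytes: int) -> list[str]:
    chunks = []          # paths of the chunk files written so far
    out, size, n = None, 0, 0
    with open(src, "rb") as f:
        for line in f:
            # Start a new chunk when the current one would overflow.
            if out is None or size + len(line) > max_bytes:
                if out:
                    out.close()
                n += 1
                name = f"{src}.part{n}"
                out = open(name, "wb")
                chunks.append(name)
                size = 0
            out.write(line)
            size += len(line)
    if out:
        out.close()
    return chunks
```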
Load data from OSS
Syntax
CSV and TSV files:
{LOAD OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <oss_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>', ...)];

Open-source formats (ORC, Parquet, TextFile, etc.):
{LOAD OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <oss_location>
[ROW FORMAT SERDE '<serde_class>'
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>', ...)]
]
STORED AS <file_format>;

Parameters
| Parameter | Required | Description |
|---|---|---|
table_name | Yes | Target table name. The table must exist before loading. Its schema (excluding partition columns) must match the external data. |
pt_spec | No | Partition specification. Format: (partition_col1 = partition_col_value1, ...). Omit the value when using dynamic partitions. |
oss_location | Yes | OSS directory path. Format: 'oss://<oss_endpoint>/<object>'. MaxCompute reads all files in this directory. For endpoint details, see Access OSS by using endpoints and bucket domain names. |
StorageHandler | Yes (CSV/TSV) | Built-in storage handler class. Use 'com.aliyun.odps.CsvStorageHandler' for CSV files. |
serde_class | No | SerDe class for open-source formats. For example, use 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' for Parquet files. |
file_format | Yes (open-source) | File format. Supported values: ORC, PARQUET, RCFILE, SEQUENCEFILE, TEXTFILE, JSON, AVRO. |
SERDEPROPERTIES | Varies | Properties for the SerDe. Supports the same properties as MaxCompute external tables. See Create an OSS external table. |
Common SERDEPROPERTIES for OSS:
| Property | Required | Description |
|---|---|---|
odps.properties.rolearn | Yes | ARN of the RAM role for OSS access. To view a role ARN, see View a RAM role. |
odps.text.option.delimiter | No | Column delimiter for CSV files. Defaults to comma (,). |
field.delim | No | Field delimiter for LazySimpleSerDe. |
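Before running LOAD, it can save a failed job to check that a sample of the source CSV parses to the target table's column count with the delimiter you intend to set in odps.text.option.delimiter. A minimal local pre-check in Python (illustrative, not a MaxCompute API):

```python
import csv
import io

# Illustrative helper: verify every sample row yields the expected
# number of columns under the chosen delimiter.
def check_columns(sample_text: str, expected_cols: int, delimiter: str = ",") -> bool:
    reader = csv.reader(io.StringIO(sample_text), delimiter=delimiter)
    return all(len(row) == expected_cols for row in reader if row)

sample = "1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S\n"
print(check_columns(sample, 8))
# True for the 8-column vehicle schema
```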
Load a CSV file from OSS
This example loads vehicle.csv from OSS into the MaxCompute table ambulance_data_csv_load. MaxCompute and OSS are owned by the same Alibaba Cloud account, and data transfers over the Alibaba Cloud internal network.
1. Authorize MaxCompute to access OSS.
Click here for one-click authorization.
2. Upload the source file to OSS.
Save vehicle.csv to the mc-test/data_location/ directory in an OSS bucket in the oss-cn-hangzhou region. For details on creating a bucket, see Create a bucket.
The resulting OSS directory path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/

3. Create the target table and load data.
Log on to the MaxCompute client (odpscmd) and run:
-- Create the target table.
CREATE TABLE ambulance_data_csv_load (
vehicleId INT,
recordId INT,
patientId INT,
calls INT,
locationLatitude DOUBLE,
locationLongitude DOUBLE,
recordTime STRING,
direction STRING
);
-- Load the CSV file into the table.
LOAD OVERWRITE TABLE ambulance_data_csv_load
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
-- ARN of AliyunODPSDefaultRole. Obtain it from the RAM role management page.
'odps.properties.rolearn'='acs:ram::xxxxx:role/aliyunodpsdefaultrole',
'odps.text.option.delimiter'=','
);

4. Verify the result.
-- Enable a full table scan (effective for the current session only).
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load;
-- Expected output:
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitude | locationlongitude | recordtime | direction |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE |
| 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE |
| 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | W |
| 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S |
| 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S |
| 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N |
| 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | SW |
| 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE |
| 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+

Load a TextFile from OSS into a partitioned table
This example loads vehicle.textfile from OSS into a specific partition of ambulance_data_textfile_load_pt.
1. Authorize MaxCompute to access OSS.
Click here for one-click authorization.
If MaxCompute and OSS are owned by different Alibaba Cloud accounts, see Authorization in STS mode before proceeding.
2. Upload the source file to OSS.
Save vehicle.textfile to the mc-test/data_location/ directory in an OSS bucket in the oss-cn-hangzhou region.
The resulting OSS directory path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/

3. Create the target table and load data.
-- Create the target partitioned table.
CREATE TABLE ambulance_data_textfile_load_pt (
vehicleId STRING,
recordId STRING,
patientId STRING,
calls STRING,
locationLatitude STRING,
locationLongitude STRING,
recordTime STRING,
direction STRING
)
PARTITIONED BY (ds STRING);
-- Load data into the ds='20200910' partition.
LOAD OVERWRITE TABLE ambulance_data_textfile_load_pt PARTITION(ds='20200910')
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS TEXTFILE;

4. Verify the result.
-- Enable a full table scan (effective for the current session only).
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_textfile_load_pt;
-- Expected output:
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+
| vehicleid | recordid | patientid | calls | locationlatitude | locationlongitude | recordtime | direction | ds |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | W | 20200910 |
| 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200910 |
| 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | SW | 20200910 |
| 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200910 |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+

Load data with dynamic partitions
Use dynamic partitions when OSS subdirectories are named with partition key-value pairs (for example, ds=20200909/). MaxCompute infers partition values from the directory names, so you specify only the partition key in the PARTITION clause.
1. Authorize MaxCompute to access OSS.
Click here for one-click authorization.
2. Upload the source files to OSS.
Save vehicle1.csv to mc-test/data_location/ds=20200909/ and vehicle2.csv to mc-test/data_location/ds=20200910/ in an OSS bucket in the oss-cn-hangzhou region.
The resulting OSS directory paths are:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200909/
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200910/

3. Create the target table and load data.
-- Create the target partitioned table.
CREATE TABLE ambulance_data_csv_load_dynpt (
vehicleId STRING,
recordId STRING,
patientId STRING,
calls STRING,
locationLatitude STRING,
locationLongitude STRING,
recordTime STRING,
direction STRING
)
PARTITIONED BY (ds STRING);
-- Load data using dynamic partitions.
-- MaxCompute reads the partition value from each subdirectory name.
LOAD OVERWRITE TABLE ambulance_data_csv_load_dynpt PARTITION(ds)
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;

4. Verify the result.
-- Enable a full table scan (effective for the current session only).
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load_dynpt;
-- Expected output:
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitude | locationlongitude | recordtime | direction | ds |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200909 |
| 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | SW | 20200909 |
| 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200909 |
| 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200909 |
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | W | 20200910 |
| 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+

Load data from Hologres
Syntax
{LOAD OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <hologres_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>', ...)];

Parameters
| Parameter | Required | Description |
|---|---|---|
table_name | Yes | Target MaxCompute internal table name. The table must exist before loading. |
pt_spec | No | Partition specification. Format: (partition_col1 = partition_col_value1, ...). |
hologres_location | Yes | JDBC connection URL for the Hologres source table. Format: 'jdbc:postgresql://<endpoint>:<port>/<database>?ApplicationName=MaxCompute&[currentSchema=<schema>&]useSSL={true|false}&table=<holo_table_name>/'. |
StorageHandler | Yes | Fixed value: 'com.aliyun.odps.jdbc.JdbcStorageHandler'. |
Hologres location URL sub-parameters:
| Sub-parameter | Required | Description |
|---|---|---|
endpoint | Yes | Classic network domain name of the Hologres instance. To get it, see Instance Details. VPC domain names are not supported. |
port | Yes | Network port of the Hologres instance. |
database | Yes | Hologres database name. See CREATE DATABASE. |
ApplicationName | Yes | Fixed value: MaxCompute. |
schema | No | Hologres schema name. Omit if the table name is unique in the database or if the source table is in the default schema. See CREATE SCHEMA. |
holo_table_name | Yes | Hologres source table name. See CREATE TABLE. |
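Assembling the connection URL from these sub-parameters is mechanical; the sketch below shows the expected shape (the endpoint and names are placeholders, not a real instance):

```python
# Illustrative builder for a hologres_location string from the
# sub-parameters documented above.
def hologres_location(endpoint, port, database, table, schema=None, use_ssl=False):
    query = ["ApplicationName=MaxCompute"]
    if schema:
        query.append(f"currentSchema={schema}")
    query.append(f"useSSL={'true' if use_ssl else 'false'}")
    query.append(f"table={table}/")  # note the trailing slash after the table name
    return f"jdbc:postgresql://{endpoint}:{port}/{database}?" + "&".join(query)

url = hologres_location("holo-endpoint.example.internal", 80, "my_db", "my_table",
                        schema="public")
print(url)
# jdbc:postgresql://holo-endpoint.example.internal:80/my_db?ApplicationName=MaxCompute&currentSchema=public&useSSL=false&table=my_table/
```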
Required SERDEPROPERTIES for Hologres:
| Property | Required | Description |
|---|---|---|
odps.properties.rolearn | Yes | ARN of the RAM role for Hologres access. |
mcfed.mapreduce.jdbc.driver.class | Yes | Fixed value: 'org.postgresql.Driver'. |
odps.federation.jdbc.target.db.type | Yes | Fixed value: 'holo'. |
odps.federation.jdbc.colmapping | No | Column mapping between MaxCompute and Hologres columns. Default: map by name. Format: MaxCompute_column1:"Hologres_column1"[,MaxCompute_column2:"Hologres_column2",...]. Enclose Hologres column names in double quotes if they contain uppercase letters. |
mcfed.mapreduce.jdbc.input.query | No | Custom SQL query to read from Hologres. The result set schema (column names and types) must match the MaxCompute external table. Format: SELECT xxx FROM <holo_database_name>.<holo_schema_name>.<holo_table_name>. |
Column mapping behavior:
If odps.federation.jdbc.colmapping is not set, columns are mapped by name. If it is set for only a subset of columns, the remaining columns are mapped by name. An error occurs if any unmapped columns have mismatched names or types.
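The rule can be modeled as "explicit entries win, everything else falls back to name matching." The sketch below is an illustrative model of that behavior, not MaxCompute internals:

```python
# Illustrative: resolve each MaxCompute column to a Hologres column.
# Explicit colmapping entries take precedence; remaining columns must
# match a Hologres column by name, otherwise it is an error.
def resolve_mapping(mc_cols, holo_cols, explicit=None):
    explicit = explicit or {}
    mapping = {}
    for col in mc_cols:
        if col in explicit:
            mapping[col] = explicit[col]
        elif col in holo_cols:
            mapping[col] = col
        else:
            raise ValueError(f"no Hologres column matches {col!r}")
    return mapping

print(resolve_mapping(["id", "name"], ["id", "Name"], explicit={"name": "Name"}))
# {'id': 'id', 'name': 'Name'}
```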
Load data from a Hologres table
This example loads data from the Hologres table holo_ext into the MaxCompute internal table from_holo.
Source data in Hologres:
-- Query the Hologres external table:
SELECT * FROM holo_ext;
-- Result:
+------------+------+
| id | name |
+------------+------+
| 1 | abc |
| 2 | ereg |
+------------+------+

1. Create the MaxCompute target table.
CREATE TABLE from_holo (id BIGINT, name STRING);

2. Load data from Hologres.
LOAD INTO TABLE from_holo
FROM LOCATION 'jdbc:postgresql://hgprecn-cn-wwo3ft0l****-cn-beijing-internal.hologres.aliyuncs.com:80/<YOUR_HOLO_DB_NAME>?ApplicationName=MaxCompute&currentSchema=public&useSSL=false&table=<YOUR_HOLOGRES_TABLE_NAME>/'
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
WITH SERDEPROPERTIES (
'odps.properties.rolearn'='acs:ram::18927322887*****:role/hologresrole',
'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver',
'odps.federation.jdbc.target.db.type'='holo'
);

3. Verify the result.
SELECT * FROM from_holo;
-- Expected output:
+------------+------+
| id | name |
+------------+------+
| 2 | ereg |
| 1 | abc |
+------------+------+

Billing
All LOAD tasks consume compute resources from their resource group. Under the pay-as-you-go model, you are not charged for the volume of imported data. Under the subscription model, individual LOAD tasks are not charged separately.
What's next
To export data from MaxCompute to OSS or Hologres, see UNLOAD.