The LOAD command imports data from external storage (OSS, Hologres, Amazon Redshift, BigQuery) into a MaxCompute table or partition.
Overview
MaxCompute supports the load overwrite and load into commands, which import CSV-format or other open-source-format data from external storage (the real-time data warehouse Hologres, Object Storage Service, Amazon Redshift, or BigQuery) into a MaxCompute table or a partition of a table. Note that:
Amazon Redshift and BigQuery data must first be exported to OSS before it can be imported into MaxCompute through OSS.
MaxCompute supports importing data into the partitions of a partitioned table in dynamic partition mode.
The load into command appends data directly to the table or partition. The load overwrite command first clears the existing data in the table or partition, then inserts the new data.
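As an illustration of the difference between the two modes, consider this sketch (plain Python, not MaxCompute code; the function name is hypothetical):

```python
# Sketch (plain Python, not MaxCompute code): how LOAD INTO appends while
# LOAD OVERWRITE replaces the existing rows of a table or partition.
def load(existing_rows, new_rows, overwrite):
    # OVERWRITE discards existing rows first; INTO keeps them.
    return list(new_rows) if overwrite else list(existing_rows) + list(new_rows)

partition = [("a", 1)]
assert load(partition, [("b", 2)], overwrite=False) == [("a", 1), ("b", 2)]  # LOAD INTO
assert load(partition, [("b", 2)], overwrite=True) == [("b", 2)]             # LOAD OVERWRITE
```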
Limits
Permission requirements
MaxCompute permissions
CreateTable and Alter permissions: required to create tables or modify table data in the MaxCompute project. For authorization steps, see MaxCompute permissions.
External storage permissions
OSS permissions: MaxCompute must be authorized to access OSS (read/list objects). STS-mode authorization is recommended for better security. For details, see STS-mode authorization.
Hologres permissions: create a RAM role, grant it the permissions that allow MaxCompute access, and add the role to the Hologres instance to complete authorization. For details, see Create a Hologres foreign table (STS mode).
Other limits
The LOAD command does not support blacklist or whitelist parameters.
Currently, data in external storage can only be imported into a MaxCompute project in the same region.
When importing data through OSS:
When importing into a partitioned MaxCompute table, the target table's schema (excluding partition columns) must match the external data format, and the external data's schema must not include the partition columns.
When importing data through Hologres:
Importing Hologres partitioned-table data into MaxCompute is not supported.
Importing data into MaxCompute through a Hologres foreign table that uses the dual-signature authorization mode is not supported.
Importing data
Importing data from external storage: OSS or Hologres
Command syntax
{LOAD OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <external_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES (<Options>)];
Parameter description
External storage: OSS
External storage: Hologres
Usage example
External storage: OSS
Import data through the built-in extractor (StorageHandler). Assume MaxCompute and OSS have the same owner account, and import the data of the vehicle.csv file into MaxCompute over the Alibaba Cloud internal network.
Store the vehicle.csv file in the OSS bucket directory mc-test/data_location/, region oss-cn-hangzhou, and organize the OSS directory path. For details about creating an OSS bucket, see Create a bucket. Based on the bucket, region, and endpoint information, the OSS directory path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/
Log on to the MaxCompute client (odpscmd) and create the target table ambulance_data_csv_load. Example:
CREATE TABLE ambulance_data_csv_load (
  vehicleId INT,
  recordId INT,
  patientId INT,
  calls INT,
  locationLatitute DOUBLE,
  locationLongtitue DOUBLE,
  recordTime STRING,
  direction STRING
);
Run the load overwrite command to import the vehicle.csv file on OSS into the target table. Example:
LOAD OVERWRITE TABLE ambulance_data_csv_load
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
  -- ARN of AliyunODPSDefaultRole; available on the RAM role management page.
  'odps.properties.rolearn'='acs:ram::xxxxx:role/aliyunodpsdefaultrole',
  'odps.text.option.delimiter'=','
);
For how to view a role's ARN, see View RAM roles.
Check the import result of the target table ambulance_data_csv_load. Example:
-- Enable full table scan; effective for this session only.
set odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load;
-- Result:
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime     | direction |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| 1         | 1        | 51        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         |
| 1         | 2        | 13        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        |
| 1         | 3        | 48        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        |
| 1         | 4        | 30        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | W         |
| 1         | 5        | 47        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         |
| 1         | 6        | 9         | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         |
| 1         | 7        | 53        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | N         |
| 1         | 8        | 63        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | SW        |
| 1         | 9        | 4         | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        |
| 1         | 10       | 31        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | N         |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
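The path-organizing step above can be sketched as a small helper (hypothetical name; it just concatenates endpoint, bucket, and directory in the documented oss:// layout):

```python
# Hypothetical helper: build the oss:// LOCATION from endpoint, bucket, and
# object directory, matching the layout organized in the step above.
def oss_location(endpoint: str, bucket: str, directory: str) -> str:
    return f"oss://{endpoint}/{bucket}/{directory.strip('/')}/"

loc = oss_location("oss-cn-hangzhou-internal.aliyuncs.com", "mc-test", "data_location")
# -> 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
```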
External storage: Hologres
Example scenario
A Hologres instance and database have been created, along with a table.
A Hologres foreign table has been created in MaxCompute; querying the Hologres table through the foreign table returns the data below.
-- Query the Hologres foreign table:
SELECT * FROM holo_ext;
-- Result:
+------------+------+
| id         | name |
+------------+------+
| 1          | abc  |
| 2          | ereg |
+------------+------+
The following steps import this Hologres table's data into a MaxCompute internal table with the LOAD command.
Create a MaxCompute internal table.
-- Create the internal table
CREATE TABLE from_holo(id BIGINT, name STRING);
Import the data into MaxCompute with the LOAD command.
-- Load the Hologres table data into the MaxCompute internal table
LOAD INTO TABLE from_holo
FROM LOCATION 'jdbc:postgresql://hgprecn-cn-wwo3ft0l****-cn-beijing-internal.hologres.aliyuncs.com:80/<YOUR_HOLO_DB_NAME>?application_name=MaxCompute&currentSchema=public&useSSL=false&table=<YOUR_HOLOGRES_TABLE_NAME>/'
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::18927322887*****:role/hologressrole',
  'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver',
  'odps.federation.jdbc.target.db.type'='holo'
);
Query the import result.
SELECT * FROM from_holo;
-- Result:
+------------+------+
| id         | name |
+------------+------+
| 2          | ereg |
| 1          | abc  |
+------------+------+
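The JDBC LOCATION string used above can be assembled as in this sketch (hypothetical helper; it follows the format shown in the example, including the trailing slash after the table name):

```python
# Hypothetical helper: assemble the Hologres JDBC LOCATION in the format used
# above, including the trailing slash after the table name.
def holo_location(host: str, port: int, database: str, table: str,
                  schema: str = "public") -> str:
    return (f"jdbc:postgresql://{host}:{port}/{database}"
            f"?application_name=MaxCompute&currentSchema={schema}"
            f"&useSSL=false&table={table}/")

url = holo_location("example-internal.hologres.aliyuncs.com", 80, "mydb", "mytable")
```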
Importing data in other open-source formats
A single imported file cannot exceed 3 GB; if a file is larger, split it before importing.
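Because a single file must stay under 3 GB, a pre-split step like the following sketch can help (hypothetical helper; it breaks only at line boundaries so each chunk remains valid delimited text):

```python
# Sketch: split a large text file into chunks below a byte limit, breaking only
# at line boundaries so every chunk stays a valid delimited-text file.
# (A single line longer than the limit still gets its own, oversized chunk.)
def split_file(path, limit_bytes, chunk_prefix):
    chunks, out, size, idx = [], None, 0, 0
    with open(path, "rb") as f:
        for line in f:
            if out is None or size + len(line) > limit_bytes:
                if out is not None:
                    out.close()
                idx += 1
                name = f"{chunk_prefix}{idx}"
                out, size = open(name, "wb"), 0
                chunks.append(name)
            out.write(line)
            size += len(line)
    if out is not None:
        out.close()
    return chunks
```

Each resulting chunk can then be uploaded to the same OSS directory and imported by one LOAD command, since LOCATION points at a directory rather than a single file.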
Command syntax
{LOAD OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <external_location>
[ROW FORMAT SERDE '<serde_class>'
[WITH SERDEPROPERTIES (<Options>)]
]
STORED AS <file_format>;
Parameter description
Usage example
MaxCompute and OSS owned by the same account
Import the data of the vehicle.textfile file into MaxCompute over the Alibaba Cloud internal network.
If MaxCompute and OSS do not have the same owner account, see STS-mode authorization for the authorization method.
Store the vehicle.textfile file in the OSS bucket directory mc-test/data_location/, region oss-cn-hangzhou, and organize the OSS directory path. For details about creating an OSS bucket, see Create a bucket. Based on the bucket, region, and endpoint information, the OSS directory path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/
Log on to the local MaxCompute client (odpscmd) and create the target table ambulance_data_textfile_load_pt. Example:
CREATE TABLE ambulance_data_textfile_load_pt (
  vehicleId STRING,
  recordId STRING,
  patientId STRING,
  calls STRING,
  locationLatitute STRING,
  locationLongtitue STRING,
  recordTime STRING,
  direction STRING
) PARTITIONED BY (ds STRING);
Run the load overwrite command to import the vehicle.textfile file on OSS into the target table. Example:
LOAD OVERWRITE TABLE ambulance_data_textfile_load_pt PARTITION(ds='20200910')
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS TEXTFILE;
Check the import result of the target table ambulance_data_textfile_load_pt. Example:
-- Enable full table scan; effective for this session only.
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_textfile_load_pt;
-- Result:
+-----------------------------------------------+----------+-----------+-------+------------------+-------------------+------------+-----------+----------+
| vehicleid                                     | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime | direction | ds       |
+-----------------------------------------------+----------+-----------+-------+------------------+-------------------+------------+-----------+----------+
| 1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S  | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W  | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S  | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S   | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N  | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE  | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
+-----------------------------------------------+----------+-----------+-------+------------------+-------------------+------------+-----------+----------+
Importing data into the target table in dynamic partition mode
If the subdirectories under the OSS directory are organized by partition name, the data can be imported into a partitioned table in dynamic partition mode.
Store the vehicle1.csv and vehicle2.csv files in the OSS bucket directories mc-test/data_location/ds=20200909/ and mc-test/data_location/ds=20200910/ respectively, region oss-cn-hangzhou, and organize the OSS directory paths. For details about creating an OSS bucket, see Create a bucket. Based on the bucket, region, and endpoint information, the OSS directory paths are:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200909/
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200910/
Log on to the MaxCompute client (odpscmd) and create the target table ambulance_data_csv_load_dynpt. Example:
CREATE TABLE ambulance_data_csv_load_dynpt (
  vehicleId STRING,
  recordId STRING,
  patientId STRING,
  calls STRING,
  locationLatitute STRING,
  locationLongtitue STRING,
  recordTime STRING,
  direction STRING
) PARTITIONED BY (ds STRING);
Run the load overwrite command to import the files on OSS into the target table. Example:
LOAD OVERWRITE TABLE ambulance_data_csv_load_dynpt PARTITION(ds)
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
Check the import result of the target table ambulance_data_csv_load_dynpt. Example:
-- Enable full table scan; effective for this session only.
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load_dynpt;
-- Result:
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime     | direction | ds       |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+
| 1         | 7        | 53        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | N         | 20200909 |
| 1         | 8        | 63        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | SW        | 20200909 |
| 1         | 9        | 4         | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        | 20200909 |
| 1         | 10       | 31        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | N         | 20200909 |
| 1         | 1        | 51        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         | 20200910 |
| 1         | 2        | 13        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        | 20200910 |
| 1         | 3        | 48        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        | 20200910 |
| 1         | 4        | 30        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | W         | 20200910 |
| 1         | 5        | 47        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         | 20200910 |
| 1         | 6        | 9         | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         | 20200910 |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+
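The mapping from subdirectory names such as ds=20200909/ to partition values can be sketched as follows (hypothetical helper):

```python
# Hypothetical helper: read dynamic-partition values out of OSS subdirectory
# names organized as <column>=<value>/, e.g. ds=20200909/.
def partition_spec(subdir: str) -> dict:
    spec = {}
    for segment in subdir.strip("/").split("/"):
        if "=" in segment:
            column, value = segment.split("=", 1)
            spec[column] = value
    return spec

assert partition_spec("data_location/ds=20200909/") == {"ds": "20200909"}
```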
Billing
The LOAD command loads external data into the warehouse. Under pay-as-you-go billing, the volume of the external input data is not metered; under subscription billing there is no per-task metering either, but the LOAD job does consume the compute resources of the resource group it runs in.
Related documentation
To export data from a MaxCompute project to external storage (OSS, Hologres) for use by other compute engines, see UNLOAD.
Importing into MaxCompute
Amazon Redshift and BigQuery data must first be exported to OSS, then imported into MaxCompute through OSS.
Importing OSS data
When importing into a partitioned MaxCompute table, the target table's schema (excluding partition columns) must match the external data format, and the external data's schema must not include the partition columns.
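The schema restriction above can be checked mechanically, as in this sketch (hypothetical helper; it compares only column counts, not types):

```python
# Hypothetical helper: check the restriction above -- the external data must
# supply exactly the target table's non-partition columns (count check only;
# column types still have to match on your side).
def schema_compatible(table_columns, partition_columns, external_columns):
    data_columns = [c for c in table_columns if c not in partition_columns]
    return len(data_columns) == len(external_columns)

assert schema_compatible(["vehicleId", "recordId", "ds"], ["ds"], ["v", "r"])
```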
Importing text formats (CSV / TSV)
Use the built-in StorageHandler to import CSV- or TSV-format data.
Command syntax
LOAD {OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <oss_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)];
Parameter description
Parameter | Description
--- | ---
oss_location | Target OSS path, in the format oss://<oss_endpoint>/<bucket_name>/<directory>/.
StorageHandler | Built-in handler: com.aliyun.odps.CsvStorageHandler for CSV data, com.aliyun.odps.TsvStorageHandler for TSV data.
SERDEPROPERTIES | Property settings, such as odps.properties.rolearn (the ARN of the RAM role used for authorization) and odps.text.option.delimiter (the field delimiter).
Usage examples
Example 1: Import CSV file data into MaxCompute
Import the data of the vehicle.csv file stored in OSS into MaxCompute. The vehicle.csv data is:
1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
Steps:
OSS side: log on to the OSS console and store the vehicle.csv file in the OSS bucket directory mc-load/data_location/, region oss-cn-hangzhou, then construct the OSS path. For more information about creating an OSS bucket, see Create a bucket in the console. Based on the bucket, region, and endpoint information, the OSS path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/
MaxCompute side: log on to the MaxCompute client, create the target table, and run the LOAD command.
-- Create the target table
CREATE TABLE IF NOT EXISTS ambulance_data_csv_load (
  vehicleId INT,
  recordId INT,
  patientId INT,
  calls INT,
  locationLatitute DOUBLE,
  locationLongtitue DOUBLE,
  recordTime STRING,
  direction STRING
);
-- Run the LOAD command
LOAD OVERWRITE TABLE ambulance_data_csv_load
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::123456789:role/AliyunODPSDefaultRole',
  'odps.text.option.delimiter'=','
);
Verify the result:
-- Enable full table scan (effective for the current session only)
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load;
Result:
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime     | direction |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| 1         | 1        | 51        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         |
| 1         | 2        | 13        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
Example 2: Import data into the target table in dynamic partition mode
Import the data of the vehicle1.csv and vehicle2.csv files stored in OSS into the corresponding MaxCompute partitions. The file contents are:
--vehicle1.csv
1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
--vehicle2.csv
1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
Steps:
OSS side: log on to the OSS console and store the vehicle1.csv and vehicle2.csv files in the OSS bucket directories mc-load/data_location/ds=20200909/ and mc-load/data_location/ds=20200910/ respectively, region oss-cn-hangzhou, then construct the OSS paths. For more information about creating an OSS bucket, see Create a bucket in the console. Based on the bucket, region, and endpoint information, the OSS paths are:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/ds=20200909/
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/ds=20200910/
MaxCompute side: log on to the MaxCompute client, create the target table, and run the LOAD command.
-- Create the target table
CREATE TABLE ambulance_data_csv_load_dynpt (
  vehicleId STRING,
  recordId STRING,
  patientId STRING,
  calls STRING,
  locationLatitute STRING,
  locationLongtitue STRING,
  recordTime STRING,
  direction STRING
) PARTITIONED BY (ds STRING);
-- Run the LOAD command
LOAD OVERWRITE TABLE ambulance_data_csv_load_dynpt PARTITION(ds)
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
Verify the result: check the import result of the target table.
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load_dynpt;
Result:
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime     | direction | ds       |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+
| 1         | 7        | 53        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | N         | 20200909 |
| 1         | 8        | 63        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | SW        | 20200909 |
| 1         | 9        | 4         | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        | 20200909 |
| 1         | 10       | 31        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | N         | 20200909 |
| 1         | 1        | 51        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         | 20200910 |
| 1         | 2        | 13        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        | 20200910 |
| 1         | 3        | 48        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE        | 20200910 |
| 1         | 4        | 30        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | W         | 20200910 |
| 1         | 5        | 47        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         | 20200910 |
| 1         | 6        | 9         | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         | 20200910 |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+----------+
Importing open-source formats (Parquet / ORC, etc.)
Supported formats include Parquet, ORC, JSON, AVRO, RCFILE, SEQUENCEFILE, and TEXTFILE.
Syntax
LOAD {OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <oss_location>
ROW FORMAT SERDE '<serde_class>'
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)]
STORED AS <file_format>;
Core parameters
Parameter | Description
--- | ---
oss_location | Target OSS path, in the format oss://<oss_endpoint>/<bucket_name>/<directory>/.
file_format | File format. Supports PARQUET, ORC, JSON, AVRO, RCFILE, SEQUENCEFILE, and TEXTFILE.
serde_class | Serialization class, for example org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe for Parquet.
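The format-to-SerDe pairings can be kept in a small lookup, as in this sketch (these are common Hive SerDe class names; confirm them against your engine version before relying on them):

```python
# Common Hive SerDe class names paired with the formats above (a reference
# sketch; confirm each class name against your engine version before use).
SERDE_FOR_FORMAT = {
    "TEXTFILE": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "PARQUET": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
    "ORC": "org.apache.hadoop.hive.ql.io.orc.OrcSerde",
}
```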
Usage example
Import the data of the vehicle.textfile file stored in OSS into MaxCompute. The vehicle.textfile data is:
1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
Steps:
OSS side: log on to the OSS console and store the vehicle.textfile file in the OSS bucket directory mc-load/data_location/, region oss-cn-hangzhou, then construct the OSS path. For more information about creating an OSS bucket, see Create a bucket in the console. Based on the bucket, region, and endpoint information, the OSS path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/
MaxCompute side: log on to the MaxCompute client, create the target table, and run the LOAD command.
-- Create the target table
CREATE TABLE IF NOT EXISTS ambulance_data_textfile_load_pt (
  vehicleId STRING,
  recordId STRING,
  patientId STRING,
  calls STRING,
  locationLatitute STRING,
  locationLongtitue STRING,
  recordTime STRING,
  direction STRING
) PARTITIONED BY (ds STRING);
-- Run the LOAD command
LOAD OVERWRITE TABLE ambulance_data_textfile_load_pt PARTITION(ds='20200910')
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS TEXTFILE;
Verify the result: check the import result of the target table.
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_textfile_load_pt;
Result:
+-----------------------------------------------+----------+-----------+-------+------------------+-------------------+------------+-----------+----------+
| vehicleid                                     | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime | direction | ds       |
+-----------------------------------------------+----------+-----------+-------+------------------+-------------------+------------+-----------+----------+
| 1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S  | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W  | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S  | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S   | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N  | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE  | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
| 1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N | NULL     | NULL      | NULL  | NULL             | NULL              | NULL       | NULL      | 20200910 |
+-----------------------------------------------+----------+-----------+-------+------------------+-------------------+------------+-----------+----------+
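The single-column result is expected here: LazySimpleSerDe's default field delimiter is Ctrl-A ('\x01'), not a comma, so each comma-separated line is parsed as one field and the remaining columns come back NULL. A quick check:

```python
# Why every row landed in the first column: LazySimpleSerDe's default field
# delimiter is Ctrl-A ('\x01'), not a comma, so each comma-separated line is
# parsed as a single field and the remaining columns come back NULL.
line = "1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S"
assert line.split("\x01") == [line]   # default delimiter: the whole line is one field
assert len(line.split(",")) == 8      # a comma delimiter would yield all 8 fields
```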
Importing Hologres data
Data can be imported directly from a Hologres foreign table into a MaxCompute internal table.
No dual signature: the dual-signature authorization mode is not supported.
No partitioned tables: importing Hologres partitioned-table data is not supported.
Syntax
LOAD {OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <hologres_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)];
Core parameters
Parameter | Description
--- | ---
hologres_location | Hologres JDBC connection string, in the format jdbc:postgresql://<endpoint>:<port>/<database>?application_name=MaxCompute&currentSchema=<schema>&useSSL=false&table=<table>/.
StorageHandler | Built-in handler. Fixed value: com.aliyun.odps.jdbc.JdbcStorageHandler.
SERDEPROPERTIES | Must include: odps.properties.rolearn (the RAM role ARN), mcfed.mapreduce.jdbc.driver.class (set to org.postgresql.Driver), and odps.federation.jdbc.target.db.type (set to holo).
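The three required properties can be validated before submitting the command, as in this sketch (hypothetical helper):

```python
# Hypothetical helper: verify the three required SERDEPROPERTIES listed above
# before submitting the LOAD command.
REQUIRED_PROPERTIES = {
    "odps.properties.rolearn",
    "mcfed.mapreduce.jdbc.driver.class",
    "odps.federation.jdbc.target.db.type",
}

def missing_properties(props: dict) -> set:
    return REQUIRED_PROPERTIES - props.keys()

assert missing_properties({"odps.properties.rolearn": "acs:ram::1:role/r"}) == {
    "mcfed.mapreduce.jdbc.driver.class",
    "odps.federation.jdbc.target.db.type",
}
```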
Usage example
Import the data of the holo_2_mc table in Hologres into MaxCompute. The data is:
id | name
---|-----
1  | abc
2  | ereg
Steps:
Hologres side: create the source table holo_2_mc in Hologres and load data into it.
MaxCompute side: log on to the MaxCompute client, create the target table, and run the LOAD command.
-- Create the target table
CREATE TABLE IF NOT EXISTS mf_from_holo (
  id BIGINT,
  name STRING
);
-- Run the LOAD command
LOAD INTO TABLE mf_from_holo
FROM LOCATION 'jdbc:postgresql://hgprecn-cn-xxx-cn-beijing-internal.hologres.aliyuncs.com:80/mf_db?application_name=MaxCompute&currentSchema=public&useSSL=false&table=holo_2_mc/'
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::189273xxx:role/hologressrole',
  'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver',
  'odps.federation.jdbc.target.db.type'='holo'
);
Verify the result: query the imported data in MaxCompute.
SELECT * FROM mf_from_holo;
Result:
+------------+------+
| id         | name |
+------------+------+
| 1          | abc  |
| 2          | ereg |
+------------+------+