The LOAD command imports data from external storage (OSS, Hologres, Amazon Redshift, BigQuery) into MaxCompute tables or partitions.
Overview
MaxCompute supports the LOAD OVERWRITE and LOAD INTO commands, which import CSV-format or other open-source-format data from external storage (the Hologres real-time data warehouse, Object Storage Service (OSS), Amazon Redshift, or BigQuery) into MaxCompute tables or table partitions. Take note of the following:
Data in Amazon Redshift and BigQuery must first be exported to OSS before it can be imported into MaxCompute through OSS.
MaxCompute supports importing data into the partitions of a partitioned table in dynamic partition mode.
LOAD INTO directly appends data to the table or partition. LOAD OVERWRITE first clears the existing data in the table or partition and then inserts the new data.
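A minimal sketch of the difference, assuming a hypothetical target table my_table and an OSS location placeholder <external_location> (authorization properties omitted):
-- Appends the external data to whatever my_table already contains:
LOAD INTO TABLE my_table
FROM LOCATION '<external_location>'
STORED BY 'com.aliyun.odps.CsvStorageHandler';
-- Clears my_table (or the target partition) first, then loads the same data:
LOAD OVERWRITE TABLE my_table
FROM LOCATION '<external_location>'
STORED BY 'com.aliyun.odps.CsvStorageHandler';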
Limits
Permission requirements
MaxCompute permissions
CreateTable and Alter permissions: required to create tables or modify table data in the MaxCompute project. For authorization operations, see MaxCompute permissions.
External storage permissions
OSS permissions: MaxCompute must be authorized to access OSS (read and list objects). STS-mode authorization is recommended because it is more secure. For details, see STS-mode authorization.
Hologres permissions: create a RAM role, grant it the permissions that allow MaxCompute access, and add the role to the Hologres instance to complete authorization. For details, see Create a Hologres foreign table (STS mode).
Other limits
The LOAD command does not support blacklist or whitelist parameters.
Data in external storage can currently be imported only into a MaxCompute project in the same region.
When importing data through OSS:
When importing into a partitioned MaxCompute target table, the schema of the target table (excluding partition columns) must match the format of the external data, and the schema of the external data must not contain the partition columns.
When importing data through Hologres:
Data in Hologres partitioned tables cannot be imported into MaxCompute.
Hologres foreign tables that use the dual-signature authorization mode cannot be used to import data into MaxCompute.
Import data
Import data from OSS or Hologres external storage
Syntax
{LOAD OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <external_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES (<Options>)];
Parameters
The parameters differ between OSS and Hologres external storage; see the annotated sketch and the examples below, as well as the detailed parameter tables in the Import into MaxCompute section.
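As an orientation aid, the following annotated sketch summarizes what each clause takes; the handler class names and location formats in the comments are taken from the examples later in this topic:
{LOAD OVERWRITE|INTO} TABLE <table_name>   -- target MaxCompute table
  [PARTITION (<pt_spec>)]                  -- target partition, for example ds='20200910'
FROM LOCATION <external_location>          -- OSS: 'oss://<oss_endpoint>/<bucket>/<directory>/'
                                           -- Hologres: a JDBC connection string (see the Hologres example)
STORED BY <StorageHandler>                 -- OSS CSV: 'com.aliyun.odps.CsvStorageHandler'
                                           -- Hologres: 'com.aliyun.odps.jdbc.JdbcStorageHandler'
[WITH SERDEPROPERTIES (<Options>)];        -- for example 'odps.properties.rolearn' for STS authorization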
Examples
External storage: OSS
Import data by using the built-in extractor (StorageHandler). Assume that MaxCompute and OSS are owned by the same account, and the data of the vehicle.csv file is imported into MaxCompute over the Alibaba Cloud internal network.
Save the vehicle.csv file to the OSS bucket directory mc-test/data_location/ in the oss-cn-hangzhou region, and construct the OSS directory path. For details about creating an OSS bucket, see Create buckets. Based on the bucket, region, and endpoint information, the OSS directory path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/
Log on to the MaxCompute client (odpscmd) and create the target table
ambulance_data_csv_load. Example:
CREATE TABLE ambulance_data_csv_load (
  vehicleId INT,
  recordId INT,
  patientId INT,
  calls INT,
  locationLatitute DOUBLE,
  locationLongtitue DOUBLE,
  recordTime STRING,
  direction STRING
);
Run the LOAD OVERWRITE command to import the vehicle.csv file on OSS into the target table. Example:
LOAD OVERWRITE TABLE ambulance_data_csv_load
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
  -- ARN of AliyunODPSDefaultRole; you can obtain it on the RAM role management page.
  'odps.properties.rolearn'='acs:ram::xxxxx:role/aliyunodpsdefaultrole',
  'odps.text.option.delimiter'=','
);
For information about how to view the ARN of a RAM role, see View RAM roles.
Query the import result of the target table ambulance_data_csv_load. Example:
-- Enable a full table scan for the current session only.
set odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load;
-- The following result is returned:
+------------+------------+------------+------------+------------------+-------------------+------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime | direction |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE |
| 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE |
| 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | W |
| 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S |
| 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S |
| 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N |
| 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | SW |
| 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE |
| 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+
External storage: Hologres
Scenario
A Hologres instance and database have been created, and a table has been created in the database.
A Hologres foreign table has been created in MaxCompute; querying the Hologres table through this foreign table returns the following data.
-- Query the Hologres foreign table:
SELECT * FROM holo_ext;
-- The following result is returned:
+------------+------+
| id | name |
+------------+------+
| 1 | abc |
| 2 | ereg |
+------------+------+
The following example imports the data of this Hologres table into a MaxCompute internal table by using the LOAD command.
Create a MaxCompute internal table.
-- Create an internal table.
CREATE TABLE from_holo(id BIGINT, name STRING);
Use the LOAD command to import the data into MaxCompute.
-- Load the Hologres table data into the MaxCompute internal table.
LOAD INTO TABLE from_holo
FROM LOCATION 'jdbc:postgresql://hgprecn-cn-wwo3ft0l****-cn-beijing-internal.hologres.aliyuncs.com:80/<YOUR_HOLO_DB_NAME>?application_name=MaxCompute&currentSchema=public&useSSL=false&table=<YOUR_HOLOGRES_TABLE_NAME>/'
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::18927322887*****:role/hologressrole',
  'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver',
  'odps.federation.jdbc.target.db.type'='holo'
);
Query the import result.
SELECT * FROM from_holo;
-- The following result is returned:
+------------+------+
| id | name |
+------------+------+
| 2 | ereg |
| 1 | abc |
+------------+------+
Import data in other open-source formats
The size of a single imported file cannot exceed 3 GB. If a file is larger than that, we recommend that you split it before importing.
Syntax
{LOAD OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <external_location>
[ROW FORMAT SERDE '<serde_class>'
[WITH SERDEPROPERTIES (<Options>)]
]
STORED AS <file_format>;
Parameters: see the configuration table for open-source formats in the Import into MaxCompute section below.
Examples
MaxCompute and OSS are owned by the same account
Import the data of the vehicle.textfile file into MaxCompute over the Alibaba Cloud internal network.
If MaxCompute and OSS are owned by different accounts, see STS-mode authorization for the authorization method.
Save the vehicle.textfile file to the OSS bucket directory mc-test/data_location/ in the oss-cn-hangzhou region, and construct the OSS directory path. For details about creating an OSS bucket, see Create buckets. Based on the bucket, region, and endpoint information, the OSS directory path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/
Log on to the MaxCompute client (odpscmd) and create the target table
ambulance_data_textfile_load_pt. Example:
CREATE TABLE ambulance_data_textfile_load_pt (
  vehicleId STRING,
  recordId STRING,
  patientId STRING,
  calls STRING,
  locationLatitute STRING,
  locationLongtitue STRING,
  recordTime STRING,
  direction STRING
)
PARTITIONED BY (ds STRING);
Run the LOAD OVERWRITE command to import the vehicle.textfile file on OSS into the target table. Example:
LOAD OVERWRITE TABLE ambulance_data_textfile_load_pt PARTITION(ds='20200910')
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS TEXTFILE;
Query the import result of the target table ambulance_data_textfile_load_pt. Example:
-- Enable a full table scan for the current session only.
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_textfile_load_pt;
-- The following result is returned:
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime | direction | ds |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
| 1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
Import data into the target table in dynamic partition mode
If the subdirectories under the OSS directory are organized by partition name, the data can be imported into a partitioned table in dynamic partition mode.
Save the vehicle1.csv and vehicle2.csv files to the OSS bucket directories mc-test/data_location/ds=20200909/ and mc-test/data_location/ds=20200910/ in the oss-cn-hangzhou region, and construct the OSS directory paths. For details about creating an OSS bucket, see Create buckets. Based on the bucket, region, and endpoint information, the OSS directory paths are:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200909/
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/ds=20200910/
Log on to the MaxCompute client (odpscmd) and create the target table
ambulance_data_csv_load_dynpt. Example:
CREATE TABLE ambulance_data_csv_load_dynpt (
  vehicleId STRING,
  recordId STRING,
  patientId STRING,
  calls STRING,
  locationLatitute STRING,
  locationLongtitue STRING,
  recordTime STRING,
  direction STRING
)
PARTITIONED BY (ds STRING);
Run the LOAD OVERWRITE command to import the files on OSS into the target table. Example:
LOAD OVERWRITE TABLE ambulance_data_csv_load_dynpt PARTITION(ds)
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
Query the import result of the target table
ambulance_data_csv_load_dynpt. Example:
-- Enable a full table scan for the current session only.
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load_dynpt;
-- The following result is returned:
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime | direction | ds |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
| 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200909 |
| 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | SW | 20200909 |
| 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200909 |
| 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200909 |
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | W | 20200910 |
| 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
Billing
The LOAD command loads external data into the data warehouse. In pay-as-you-go mode, the amount of external input data is not metered, and in subscription mode the job is not metered either, but the LOAD job consumes the computing resources of the resource group in which it runs.
References
To export data from a MaxCompute project to external storage (OSS or Hologres) for use by other compute engines, see UNLOAD.
Import into MaxCompute
Data in Amazon Redshift and BigQuery must first be exported to OSS and then imported from OSS into MaxCompute.
Import OSS data
When importing into a partitioned MaxCompute target table, the schema of the target table (excluding partition columns) must match the format of the external data, and the schema of the external data must not contain the partition columns.
Import text-format data (CSV / TSV)
Use the built-in StorageHandler to import CSV- or TSV-format data.
Syntax
LOAD {OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <oss_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)];
Parameters
Parameter | Description
oss_location | Target OSS path, in the format oss://<oss_endpoint>/<bucket_name>/<directory>/ (see the examples below).
StorageHandler | Built-in storage handler. Use com.aliyun.odps.CsvStorageHandler for CSV files and com.aliyun.odps.TsvStorageHandler for TSV files.
SERDEPROPERTIES | Property settings, for example 'odps.properties.rolearn' (ARN of the RAM role used for STS-mode authorization) and 'odps.text.option.delimiter' (column delimiter).
Examples
Example 1: Import CSV file data into MaxCompute
Import the data of the vehicle.csv file stored in OSS into MaxCompute. The vehicle.csv file contains the following data:
1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
Procedure:
On the OSS side: Log on to the OSS console, save the vehicle.csv file to the OSS bucket directory mc-load/data_location/ in the oss-cn-hangzhou region, and construct the OSS path. For more information about creating an OSS bucket, see Create buckets in the console. Based on the bucket, region, and endpoint information, the OSS path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/
On the MaxCompute side: Log on to the MaxCompute client, create the target table, and run the LOAD command.
-- Create the target table.
CREATE TABLE IF NOT EXISTS ambulance_data_csv_load (
  vehicleId INT,
  recordId INT,
  patientId INT,
  calls INT,
  locationLatitute DOUBLE,
  locationLongtitue DOUBLE,
  recordTime STRING,
  direction STRING
);
-- Run the LOAD command.
LOAD OVERWRITE TABLE ambulance_data_csv_load
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/'
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::123456789:role/AliyunODPSDefaultRole',
  'odps.text.option.delimiter'=','
);
Verify the result:
-- Enable a full table scan (effective for the current session only).
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load;
The following result is returned:
+------------+------------+------------+------------+------------------+-------------------+------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime | direction |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+
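The TSV variant is expected to follow the same pattern with the built-in TSV handler; the following is a hedged sketch that assumes a hypothetical OSS path mc-load/tsv_location/ containing TSV files with the same columns, reusing the table and RAM role from Example 1:
-- Hypothetical example: import TSV files with the built-in TSV storage handler.
LOAD OVERWRITE TABLE ambulance_data_csv_load
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/tsv_location/'
STORED BY 'com.aliyun.odps.TsvStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::123456789:role/AliyunODPSDefaultRole'
);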
Example 2: Import data in dynamic partition mode
Import the data of the vehicle1.csv and vehicle2.csv files stored in OSS into the corresponding MaxCompute partitions. The vehicle1.csv and vehicle2.csv files contain the following data:
--vehicle1.csv
1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
--vehicle2.csv
1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
Procedure:
On the OSS side: Log on to the OSS console, save the vehicle1.csv and vehicle2.csv files to the OSS bucket directories mc-load/data_location/ds=20200909/ and mc-load/data_location/ds=20200910/ in the oss-cn-hangzhou region, and construct the OSS paths. For more information about creating an OSS bucket, see Create buckets in the console. Based on the bucket, region, and endpoint information, the OSS paths are:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/ds=20200909/
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/ds=20200910/
On the MaxCompute side: Log on to the MaxCompute client, create the target table, and run the LOAD command.
-- Create the target table.
CREATE TABLE ambulance_data_csv_load_dynpt (
  vehicleId STRING,
  recordId STRING,
  patientId STRING,
  calls STRING,
  locationLatitute STRING,
  locationLongtitue STRING,
  recordTime STRING,
  direction STRING
)
PARTITIONED BY (ds STRING);
-- Run the LOAD command.
LOAD OVERWRITE TABLE ambulance_data_csv_load_dynpt PARTITION(ds)
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
Verify the result: query the import result of the target table.
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_csv_load_dynpt;
The following result is returned:
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime | direction | ds |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
| 1 | 7 | 53 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200909 |
| 1 | 8 | 63 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | SW | 20200909 |
| 1 | 9 | 4 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200909 |
| 1 | 10 | 31 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | N | 20200909 |
| 1 | 1 | 51 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 2 | 13 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 3 | 48 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | NE | 20200910 |
| 1 | 4 | 30 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | W | 20200910 |
| 1 | 5 | 47 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
| 1 | 6 | 9 | 1 | 46.81006 | -92.08174 | 9/14/2014 0:00 | S | 20200910 |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
Import open-source-format data (Parquet, ORC, and more)
Formats such as Parquet, ORC, JSON, AVRO, RCFILE, SEQUENCEFILE, and TEXTFILE are supported.
Syntax
LOAD {OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <oss_location>
ROW FORMAT SERDE '<serde_class>'
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)]
STORED AS <file_format>;
Core configuration
Parameter | Description
oss_location | Target OSS path, in the format oss://<oss_endpoint>/<bucket_name>/<directory>/ (see the example below).
file_format | File format. PARQUET, ORC, JSON, AVRO, RCFILE, SEQUENCEFILE, and TEXTFILE are supported.
serde_class | Serialization/deserialization class, for example org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe for Parquet.
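As an illustration, the following is a minimal sketch of a Parquet import, assuming a hypothetical partitioned target table my_parquet_target (partitioned by ds) whose schema matches the Parquet files, and a hypothetical OSS path; the ROW FORMAT SERDE clause is optional in the syntax above:
-- Hypothetical example: import Parquet files from an OSS directory into one partition.
LOAD OVERWRITE TABLE my_parquet_target PARTITION(ds='20200910')
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/parquet_location/'
STORED AS PARQUET;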
Example
Import the data of the vehicle.textfile file stored in OSS into MaxCompute. The vehicle.textfile file contains the following data:
1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S
1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N
1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW
1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N
Procedure:
On the OSS side: Log on to the OSS console, save the vehicle.textfile file to the OSS bucket directory mc-load/data_location/ in the oss-cn-hangzhou region, and construct the OSS path. For more information about creating an OSS bucket, see Create buckets in the console. Based on the bucket, region, and endpoint information, the OSS path is:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/
On the MaxCompute side: Log on to the MaxCompute client, create the target table, and run the LOAD command.
-- Create the target table.
CREATE TABLE IF NOT EXISTS ambulance_data_textfile_load_pt (
  vehicleId STRING,
  recordId STRING,
  patientId STRING,
  calls STRING,
  locationLatitute STRING,
  locationLongtitue STRING,
  recordTime STRING,
  direction STRING
)
PARTITIONED BY (ds STRING);
-- Run the LOAD command.
LOAD OVERWRITE TABLE ambulance_data_textfile_load_pt PARTITION(ds='20200910')
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-load/data_location/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS TEXTFILE;
Verify the result: query the import result of the target table.
SET odps.sql.allow.fullscan=true;
SELECT * FROM ambulance_data_textfile_load_pt;
The following result is returned:
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
| vehicleid | recordid | patientid | calls | locationlatitute | locationlongtitue | recordtime | direction | ds |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
| 1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,6,9,1,46.81006,-92.08174,9/14/2014 0:00,S | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,7,53,1,46.81006,-92.08174,9/14/2014 0:00,N | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,8,63,1,46.81006,-92.08174,9/14/2014 0:00,SW | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,9,4,1,46.81006,-92.08174,9/14/2014 0:00,NE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
| 1,10,31,1,46.81006,-92.08174,9/14/2014 0:00,N | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 20200910 |
+------------+------------+------------+------------+------------------+-------------------+------------+------------+------------+
Import Hologres data
Data can be imported from Hologres directly into a MaxCompute internal table.
Dual signature is not supported: the dual-signature authorization mode cannot be used.
Partitioned tables are not supported: data in Hologres partitioned tables cannot be imported.
Syntax
LOAD {OVERWRITE|INTO} TABLE <table_name> [PARTITION (<pt_spec>)]
FROM LOCATION <hologres_location>
STORED BY <StorageHandler>
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>',...)];
Core configuration
Parameter | Description
hologres_location | Hologres JDBC connection string, in the format jdbc:postgresql://<instance_endpoint>:<port>/<database>?application_name=MaxCompute&currentSchema=<schema>&useSSL=false&table=<table_name>/ (see the example below).
StorageHandler | Built-in storage handler. Fixed value: com.aliyun.odps.jdbc.JdbcStorageHandler.
SERDEPROPERTIES | Must include 'odps.properties.rolearn' (ARN of the RAM role authorized to access Hologres), 'mcfed.mapreduce.jdbc.driver.class' (set to org.postgresql.Driver), and 'odps.federation.jdbc.target.db.type' (set to holo).
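For readability, the following commented breakdown shows how the hologres_location string in the example below is assembled; the endpoint, database, and table names are placeholders, and the meanings are inferred from that example rather than an exhaustive reference:
-- 'jdbc:postgresql://<hologres_instance_endpoint>:80/<database>  -- Hologres endpoint and database name
--    ?application_name=MaxCompute                                -- marks the connection as coming from MaxCompute
--    &currentSchema=public                                       -- schema that contains the source table
--    &useSSL=false
--    &table=<hologres_table_name>/'                              -- source table to read; note the trailing slash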
Example
Import the data of the Hologres table mf_holo into MaxCompute. The table contains the following data:
id | name
---|-----
1 | abc
2 | ereg
Procedure:
On the Hologres side: Create the source table mf_holo in Hologres and insert the data.
On the MaxCompute side: Log on to the MaxCompute client, create the target table, and run the LOAD command.
-- Create the target table.
CREATE TABLE IF NOT EXISTS mf_from_holo (
  id BIGINT,
  name STRING
);
-- Run the LOAD command.
LOAD INTO TABLE mf_from_holo
FROM LOCATION 'jdbc:postgresql://hgprecn-cn-xxx-cn-beijing-internal.hologres.aliyuncs.com:80/mf_db?application_name=MaxCompute&currentSchema=public&useSSL=false&table=mf_holo/'
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::189273xxx:role/hologressrole',
  'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver',
  'odps.federation.jdbc.target.db.type'='holo'
);
Verify the result: query the imported data in MaxCompute.
SELECT * FROM mf_from_holo;
The following result is returned:
+------------+------+
| id | name |
+------------+------+
| 1 | abc |
| 2 | ereg |
+------------+------+