If you want to access data stored in Hologres, MaxCompute allows you to use external tables to access the data in Hologres without the need to import the data to MaxCompute. This helps you query data in Hologres in an efficient manner. This topic describes how to execute the CREATE EXTERNAL TABLE statement to create a Hologres external table. In the statement, you must specify a Hologres data source, Security Token Service (STS) authentication information or the dual-signature mode, a Hologres source table, and a JDBC driver.
Background information
Hologres is a real-time interactive analytics data warehouse. Hologres is compatible with PostgreSQL and seamlessly integrates with MaxCompute.
You can create a Hologres external table in the MaxCompute console to query the data of a Hologres data source based on STS authentication information and a PostgreSQL JDBC driver. This method prevents redundant data storage and allows you to obtain query results at a fast speed without the need to import or export data.
Prerequisites
Before you create a Hologres external table, make sure that the following conditions are met:
A Hologres database and a Hologres table are created.
For more information about Hologres databases, see Create a database.
For more information about how to create a Hologres table, see CREATE TABLE.
Information of the sample Hologres instance in this topic:
Name of the Hologres database:
mc_test
.Schema of the Hologres database:
public
.Endpoint of the Hologres instance in the classic network:
hgprecn-cn-oew210ut****-cn-hangzhou-internal.hologres.aliyuncs.com:80
.Name of the Hologres table:
holo
. The following figure shows the data in the table.
The MaxCompute project for which you want to create a Hologres external table is created.
For more information about how to create a MaxCompute project, see Create a MaxCompute project.
The MaxCompute client is installed.
For more information about how to install the MaxCompute client, see Install and configure the MaxCompute client.
Limits
When you use Hologres external tables, take note of the following limits:
MaxCompute does not support the update or delete operation on Hologres external tables.
Partitioned tables of Hologres are not mapped to partitioned tables of MaxCompute. Hologres external tables cannot be partitioned.
If you use multiple processes to concurrently write a large amount of data to a Hologres external table, a process rewrites data to the Hologres external table in rare cases. As a result, duplicate data exists.
In a Hologres external table that you create in MaxCompute, a value of the DECIMAL type must be
a 38-digit number, in which the decimal part consists of 18 digits by default
. If the number of decimal places of a value in a column is less than 18, you can set the data type of the column to STRING when you create a Hologres external table in MaxCompute. When you use the data in the external table, you can use theCAST
function to forcefully convert the data of the STRING type into the DECIMAL type.Complex data types, such as ARRAY, MAP, and STRUCT, are not supported in Hologres external tables that you create in MaxCompute.
The JSON, JSONB, and MONEY data types that are supported in Hologres are not supported in MaxCompute. When you create a Hologres external table in MaxCompute, you cannot configure columns that match data of these types in the Hologres source table.
Hologres external tables do not support the clustering attribute.
Precautions
When you use Hologres external tables, take note of the following items:
If the IP whitelist feature is enabled in Hologres, users must use the dual-signature mode to create Hologres external tables. If you create Hologres external tables by using the STS mode, MaxCompute is intercepted by the IP whitelist when accessing Hologres.
The names of the parent and child tables in a Hologres database are specified in Hologres external tables. SQL statements can be executed on the Hologres external tables. Parent and child tables can be mapped to Hologres external tables. However, you can only read data from parent tables.
You cannot use the INSERT ON CONFLICT statement to write data to Hologres external tables. For more information, see INSERT ON CONFLICT(UPSERT). If the Hologres source table contains a primary key, you must ensure that the primary key of the data that you want to write to the Hologres external table is not the same as the primary key of the Hologres source table.
When you create an external table, the table name and field names are not case-sensitive. When you query external tables or fields, the table names and field names are not case-sensitive, and forcible uppercase and lowercase conversions are not supported.
Syntax
When you create a Hologres external table by using the CREATE EXTERNAL TABLE statement, you must specify a storage handler, STS authentication information or the dual-signature mode, and the JDBC URL of a Hologres data source in the statement. Syntax for creating a Hologres external table:
Create a Hologres external table in STS mode.
create external table [if not exists] <table_name>( <col1_name> <data_type>, <col2_name> <data_type>, ...... ) stored by '<com.aliyun.odps.jdbc.JdbcStorageHandler>' with serdeproperties ( 'odps.properties.rolearn'='<ram_arn>') location '<jdbc:postgresql://<endpoint>:<port>/<database>?ApplicationName=MaxCompute&[currentSchema=<schema>&][useSSL={true|false}&]table=<holo_table_name>/>' tblproperties ( 'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver', 'odps.federation.jdbc.target.db.type'='holo', ['odps.federation.jdbc.colmapping'='<col1:column1,col2:column2,col3:column3,...>'] );
Create a Hologres external table in dual-signature mode.
The dual-signature mode allows you to use the same RAM user to access the Hologres external table that you create in MaxCompute and the Hologres source table only if the RAM user has permissions on the tables. This way, manual authorization is not required. You can also configure an IP address whitelist to allow access to Hologres from IP addresses in the whitelist.
-- Enable the dual-signature mode. set odps.sql.common.table.planner.ext.hive.bridge=true; -- Create an external table. create external table [if not exists] <table_name>( <col1_name> <data_type>, <col2_name> <data_type>, ...... ) stored by '<com.aliyun.odps.jdbc.JdbcStorageHandler>' location'<jdbc:postgresql://<endpoint>:<port>/<database>?ApplicationName=MaxCompute&[currentSchema=<schema>&][useSSL={true|false}&]table=<holo_table_name>/>' tblproperties ( 'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver', 'odps.federation.jdbc.target.db.type'='holo', ['odps.federation.jdbc.colmapping'='<col1:column1,col2:column2,col3:column3,...>'] );
odps.sql.common.table.planner.ext.hive.bridge: Configure this parameter only when you create a Hologres external table in dual-signature mode. This parameter specifies whether to enable the dual-signature mode. If you set this parameter to
true
, a Hologres external table is created in dual signature mode.if not exists: optional. If you create a Hologres external table by using the name of an existing table but do not specify the if not exists parameter, an error is returned. If you configure the if not exists parameter, a success message is returned no matter whether a table with the same name already exists. The success message is returned even if the schema of the existing table with the same name is different from that of the table you want to create. The metadata of the existing table remains unchanged.
table_name: required. The name of the Hologres external table that you want to create in MaxCompute.
col_name: required. The names of columns in the Hologres external table that you want to create in MaxCompute.
data_type: required. The data types of columns in the Hologres external table.
stored by: required. A storage handler, which defines the method that you want to use to access the Hologres external table. Set the value to
Hologres
. The value indicates that the Hologres external table is accessed by using JdbcStorageHandler.ram_arn: Configure this parameter only when you create a Hologres external table in STS mode. This parameter specifies the Alibaba Cloud Resource Name (ARN) of the specified RAM role. The ARN is used as the STS authentication information. To obtain the ARN of the specified RAM role, you can perform the following steps: Log on to the Resource Access Management (RAM) console. On the Roles page, find the RAM role whose ARN you want to query and click the RAM role in the Role Name column. On the page that appears, view the ARN in the Basic Information section.
location: required. The JDBC URL of the Hologres instance. Description of the fields in this parameter:
endpoint: required. The endpoint of the Hologres instance in the classic network. For more information about how to obtain endpoints, see Instance configurations.
port: required. The port number of the Hologres instance. For more information about how to obtain port numbers, see Instance configurations.
database: required. The name of the Hologres database that you want to access. For more information about Hologres databases, see CREATE DATABASE.
ApplicationName: required. The default value is MaxCompute and no modification is required.
schema: optional. If the name of the source table is unique in the Hologres database or the source table is a table in the default schema, you do not need to specify this parameter. For more information about schemas, see CREATE SCHEMA.
holo_table_name: required. The name of the Hologres source table. For more information about Hologres source tables, see CREATE TABLE.
tblproperties:
mcfed.mapreduce.jdbc.driver.class: required. The JDBC driver that is used to access the Hologres database. Set the value to
org.postgresql.Driver
.odps.federation.jdbc.target.db.type: required. The type of the Hologres database that you want to access. Set the value to
holo
.odps.federation.jdbc.colmapping: optional. If you want to map some columns of the Hologres source table to the Hologres external table, you must configure this parameter. This parameter specifies the mappings between the fields of the Hologres source table and the fields of the Hologres external table. If you do not configure this parameter, the fields in the source table are mapped to the fields of the Hologres external table based on the sequence of the fields in the source table. If the name of a field in Hologres contains only uppercase letters, enclose the field name in a pair of double quotation marks ("). The value of this parameter is in the
MaxCompute field 1 : "Hologres field 1" [,MaxCompute field 2 : "Hologres field 2" ,...]
format.
Create a Hologres external table in STS mode
The following section describes how to create a Hologres external table in STS mode.
Create a RAM role.
Create a RAM role and obtain the ARN of the RAM role. The ARN is used to specify the STS authentication information when you create an external table.
Log on to the RAM console and create a RAM role.
In the Create Role panel, select Alibaba Cloud Account or IdP for Select Trusted Entity.
Alibaba Cloud Account:
A RAM user of your Alibaba Cloud account can access cloud resources by assuming a RAM role. For more information, see Create a RAM role for a trusted Alibaba Cloud account.
IdP:
You can log on to the Alibaba Cloud Management Console by using role-based single sign-on (SSO). You can log on without the need to provide a username and password. For more information, see Create a RAM role for a trusted IdP.
Edit the trust policy.
On the Roles page, click the name of the RAM role that you create.
Click the Trust Policy tab.
On the Trust Policy tab, click Edit Trust Policy.
Modify the trust policy based on the following information.
The configuration of the trust policy varies based on the type of the trusted entity that you selected.
Alibaba Cloud Account is selected for Select Trusted Entity:
{ "Statement": [ { "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "RAM": [ "acs:ram::<UID>:root" ] } }, { "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "Service": [ "<UID>@odps.aliyuncs.com" ] } } ], "Version": "1" }
IdP is selected for Select Trusted Entity:
{ "Statement": [ { "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "saml:recipient": "https://signin.aliyun.com/saml-role/sso" } }, "Effect": "Allow", "Principal": { "Federated": [ "acs:ram::<UID>:saml-provider/IDP" ] } }, { "Action": "sts:AssumeRole", "Effect": "Allow", "Principal": { "Service": [ "<UID>@odps.aliyuncs.com" ] } } ], "Version": "1" }
Note<UID> indicates the ID of your Alibaba Cloud account. You can obtain the ID of your Alibaba Cloud account on the Security Settings page of the Account Management console.
Click OK.
Add the RAM role to a Hologres instance and grant permissions to the RAM role.
Before the RAM role can use the Hologres instance, the role must obtain the required development permissions on the Hologres instance. By default, the RAM role is not granted the required permissions to view or manage instances in the Hologres console. You must grant the required permissions to the RAM role by using your Alibaba Cloud account. After you add the RAM role to a Hologres instance, you can use one of the following methods to grant the permissions to the RAM role:
Use the Hologres console to grant the required permissions to the RAM role.
Log on to the Hologres console.
In the left-side navigation pane, click Instances. On the Hologres Instances page, click the name of the Hologres instance to which you want to add the RAM role.
On the instance details page, click Accounts.
On the User management page, click Add New User to add a RAM role to the Hologres instance.
On the Database Authorization page, grant the development permissions on the instance to the RAM role.
Use an SQL statement to grant the required permissions to the RAM role.
For more information about how to use an SQL statement to grant the required permissions to a RAM user, see Overview.
By default, a RAM user is not granted the permissions to perform operations in the Hologres console. If you want a RAM user to assume the RAM role, you must attach the AliyunRAMReadOnlyAccess policy to the RAM user by using your Alibaba Cloud account. Otherwise, the RAM user cannot perform operations in the Hologres console. For more information, see Grant permissions on Hologres to RAM users.
Create a Hologres external table
Log on to the MaxCompute client and create a Hologres external table based on the prepared data. For more information about the syntax, see Syntax.
Install and start the MaxCompute client and go to the MaxCompute project for which you want to create a Hologres external table.
For more information about the statements that are used to go to a project, see Project operations.
Execute the following statement to create a Hologres external table:
Sample statement:
create external table if not exists my_table_holo_jdbc ( id bigint, name string ) stored by 'com.aliyun.odps.jdbc.JdbcStorageHandler' with serdeproperties ( 'odps.properties.rolearn'='acs:ram::139699392458****:role/aliyunodpsholorole') location 'jdbc:postgresql://hgprecn-cn-oew210ut****-cn-hangzhou-internal.hologres.aliyuncs.com:80/mc_test?ApplicationName=MaxCompute¤tSchema=public&useSSL=false&table=holo/' tblproperties ( 'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver', 'odps.federation.jdbc.target.db.type'='holo', 'odps.federation.jdbc.colmapping'='id:id,name:name' );
Execute the following statement to query the data of the Hologres source table by using the created Hologres external table:
Sample statement:
-- Add properties to access the Hologres external table. set odps.sql.split.hive.bridge=true; set odps.sql.hive.compatible=true; -- Query the data of the Hologres external table. select * from my_table_holo_jdbc limit 10;
The following result is returned:
+------------+------------+ | id | name | +------------+------------+ | 1 | kate | | 2 | mary | | 3 | bob | | 4 | tom | | 5 | lulu | | 6 | mark | | 7 | haward | | 8 | lilei | | 9 | hanmeimei | | 10 | lily | +------------+------------+
Optional:Exchange data between MaxCompute and Hologres based on the Hologres external table and perform joint data analysis.
For example, you can write data that is processed by MaxCompute to a Hologres database by using the Hologres external table. This accelerates data analysis and implements online services. Sample statement:
-- Add properties to access the Hologres external table. set odps.sql.split.hive.bridge=true; set odps.sql.hive.compatible=true; -- Insert data into the Hologres external table. insert into my_table_holo_jdbc values (12,'alice'); -- Query the data of the Hologres external table. select * from my_table_holo_jdbc;
The following result is returned:
+------------+------------+ | id | name | +------------+------------+ | 12 | alice | | 1 | kate | | 2 | mary | | 3 | bob | | 4 | tom | | 5 | lulu | | 6 | mark | | 7 | haward | | 8 | lilei | | 9 | hanmeimei | | 10 | lily | | 11 | lucy | +------------+------------+
Dimension tables that are frequently updated are stored in Hologres databases. This meets real-time data update requirements. MaxCompute uses external tables to access the dimension tables in Hologres databases. Then, association analysis is performed on the data in the dimension tables and the fact tables in MaxCompute. Sample statements:
-- Add properties to access the Hologres external table. set odps.sql.split.hive.bridge=true; set odps.sql.hive.compatible=true; -- Create a MaxCompute internal table. create table holo_test as select * from my_table_holo_jdbc; --- Perform association analysis on the data in the MaxCompute internal table and Hologres external table. select * from my_table_holo_jdbc t1 inner join holo_test t2 on t1.id=t2.id;
The following result is returned:
+------------+------------+------------+------------+ | id | name | id2 | name2 | +------------+------------+------------+------------+ | 1 | kate | 1 | kate | | 2 | mary | 2 | mary | | 3 | bob | 3 | bob | | 4 | tom | 4 | tom | | 5 | lulu | 5 | lulu | | 6 | mark | 6 | mark | | 7 | harward | 7 | harward | | 8 | lilei | 8 | lilei | | 9 | hanmeimei | 9 | hanmeimei | | 10 | lily | 10 | lily | | 11 | lucy | 11 | lucy | | 12 | alice | 12 | alice | +------------+------------+------------+------------+
Create a Hologres external table in dual-signature mode
The dual-signature mode is an authentication protocol that is developed based on MaxCompute and Hologres. After a signature is added to the account logon information of MaxCompute and authentication data is sent to Hologres, Hologres performs authentication by using the account name based on the protocol that is compatible with the underlying layer of MaxCompute. If the account name used for MaxCompute is the same as that used for Hologres, the authentication is successful. This way, you can directly access external tables by using the same account name without the need to configure additional authentication information.
Prerequisites
Hologres has an account with the same name as MaxCompute, and the account has the Read and Write permissions on the corresponding tables in Hologres.
Limits
Only Hologres V1.3 and later allow you to create Hologres external tables on MaxCompute by using the dual-signature mode. You can use the dual-signature mode to read data from a Hologres external table but cannot write data to a Hologres external table.
Sample statements
Log on to the MaxCompute client and create a Hologres external table by using the dual-signature mode. For information about how to create Hologres external tables, see Syntax.
-- Create a Hologres external table. CREATE EXTERNAL TABLE IF NOT EXISTS holo_mc_external_dbl ( id int, name string, ds string ) STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler' location 'jdbc:postgresql://hgprecn-cn-zvp2o6aq****-cn-beijing-internal.hologres.aliyuncs.com:80/mc_test?ApplicationName=MaxCompute¤tSchema=public&preferQueryMode=simple&useSSL=false&table=mf_holo_mc_up/' TBLPROPERTIES ( 'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver', 'odps.federation.jdbc.target.db.type'='holo', 'odps.federation.jdbc.colmapping'='id:id,name:name,ds:ds' ); -- Query data from the Hologres external table. You must execute the following SET statement together with the SQL statement. set odps.sql.common.table.planner.ext.hive.bridge=true; select * from holo_mc_external_dbl;
Enable the direct read feature for Hologres external tables
Background information
MaxCompute reads data from Hologres external tables in Java Database Connectivity (JDBC) mode. MaxCompute of the latest version allows you to directly read data from Hologres storage layers and provides the following benefits:
Greatly reduces the latency of table reading and significantly improves the data query speed.
Greatly reduces the number of connections to FE in Hologres. For most queries, only one connection is required.
Limits
If you enable the direct read mode of Hologres, the following limits apply. If the conditions are not met, the JDBC mode is used.
The version of the Hologres instance must be V1.3.34 or later.
If the version of your Hologres instance is earlier than V1.3.34, the direct read feature is not supported.
Due to network connectivity, only MaxCompute projects in the same region can access Hologres instances.
Currently, cross-region access is not supported. The following error message is reported for cross-region access:
FAILED: ODPS-0010000:System internal error - fuxi job failed, caused by: Pangu request failed with error code 3
.The direct read feature does not support Hologres cold storage tables.
The direct read feature does not support Hologres row-oriented tables.
If you configure primary and secondary Hologres instances for the Hologres database, you can use only the URL of the primary Hologres instance to connect to MaxCompute. You cannot use the URL of the secondary instance to connect to MaxCompute.
The direct read feature verifies the data types of columns in MaxCompute and Hologres. If the data types do not match, MaxCompute automatically reads data in JDBC mode. When you create a Hologres external table in MaxCompute, you must comply with the following limits on data types:
The following data types are not supported in JDBC mode and MaxCompute direct read mode:
Complex data types, such as ARRAY, MAP, and STRUCT
Data types, such as JSON and MONEY
The following data types are supported in JDBC mode but are not supported in MaxCompute direct read mode:
BINARY type
The following data types are supported in MaxCompute direct read mode but are not supported in JDBC mode:
JSONB type
If the TIMESTAMP data type is mapped to the TIMESTAMP WITH TIME ZONE data type of Hologres in MaxCompute direct read mode, the following time errors may occur:
If the time in the TIMESTAMP WITH TIME ZONE column of a Hologres external table is earlier than
1900-12-31 15:54:15
, the time that is obtained in direct read mode is 344 seconds later than the time in the TIMESTAMP WITH TIME ZONE column of the Hologres external table.If the time in the TIMESTAMP WITH TIME ZONE column of a Hologres external table is between
1900-12-31 15:54:16
and1969-12-31 23:59:58
, the time that is obtained in direct read mode is one second later than the time in the TIMESTAMP WITH TIME ZONE column of the Hologres external table.If the time in the TIMESTAMP WITH TIME ZONE column of a Hologres external table is later than
1969-12-31 23:59:59
, the time that is obtained in direct read mode is the same as the time in the TIMESTAMP WITH TIME ZONE column of the Hologres external table.NoteIf the TIMESTAMP data type is mapped to the TIMESTAMP WITH TIME ZONE data type of Hologres in MaxCompute direct read mode, the following time zone and time errors may occur:
For example, the time zone of MaxCompute is UTC+8. If the time in the TIMESTAMP WITH TIME ZONE column of a Hologres external table is
2000-01-01 00:00:00
, the time that is obtained in direct read mode is2000-01-01 08:00:00
.For example, the time zone of MaxCompute is UTC+8. If the time in the TIMESTAMP WITH TIME ZONE column of a Hologres external table is
1969-01-01 00:00:00
, the time that is obtained in direct read mode is1969-01-01 08:00:01
.
The following table describes the data type mappings between Hologres and MaxCompute.
Data type supported by Hologres
Data type supported by MaxCompute
Description
TEXT
STRING
VARCHAR
N/A.
SMALLINT
SMALLINT
N/A.
INT
INT4
INTEGER
INT
N/A.
INT8
BIGINT
BIGINT
N/A.
FLOAT4
REAL
FLOAT
N/A.
FLOAT
FLOAT8
DOUBLE
N/A.
BOOL
BOOLEAN
BOOLEAN
N/A.
TIMESTAMP
TIMESTAMP
Timestamp is displayed in the unit of microsecond. Time error exists due to different time zones.
TIMESTAMP WITH TIME ZONE
TIMESTAMP
The precision and scale are automatically converted between MaxCompute and Hologres at the underlying layer. MaxCompute provides data of the TIMESTAMP type without a time zone specified.
NUMERIC
DECIMAL
If no precision or scale is specified for the DECIMAL data type in MaxCompute, the default value DECIMAL
(38,18)
is used. The precision and the scale are automatically converted when you create an external table by executing theIMPORT FOREIGN SCHEMA
statement.CHAR(n)
CHAR(n)
Entries of the
CHAR(n)
data type in MaxCompute are character strings with a fixed length that cannot exceedn
characters. The maximum value of n is 255. If you insert a character string that is shorter than the required length, Hologres adds spaces to the character string to increase the length.VARCHAR(n)
VARCHAR(n)
Entries of the
VARCHAR(n)
data type in MaxCompute are character strings with a variable length that cannot exceedn
characters. Valid values of n: 1 to 65535.DATE
DATE
N/A.
In foreign server mode, you must enable the three-layer model for your MaxCompute project.
Enable direct read
When you query data from a Hologres external table in MaxCompute, add the following command before the SQL statement:
set odps.table.api.enable.holo.table=true;
Verify direct read
You can view logs in LogView to check whether the direct read mode is used for queries. For more information about how to use LogView, see Use LogView V2.0 to view job information.
On the Summary tab of LogView, find the external holo tables
field to view the attribute. The attribute is displayed in the following format:
<project_name>.<table_name >:< Access mode>[<(Rollback reason)>]
The following table describes parameters in the preceding syntax.
Parameter | Description |
project_name | The name of the project. |
table_name | The name of the table for which you want to create the mapping table. |
Access mode | The mode that is used to access external tables. Valid values:
|
Rollback reason | If
|