After you establish network connectivity between MaxCompute and Alibaba Cloud Lindorm, you can create MaxCompute external tables to associate with Lindorm tables. You can use these external tables to synchronize data from Lindorm to MaxCompute for extract, transform, and load (ETL) processing. You can also use Lindorm tables for federated computing or export data from MaxCompute to Lindorm. This topic describes how to create a Lindorm external table in MaxCompute and map it to a Lindorm data source.
Background information
Lindorm is a cloud-native, multi-model, hyper-converged database designed and optimized for scenarios such as the Internet of Things (IoT), the Internet, and the Internet of Vehicles. It supports unified access and integrated processing of various data types, including wide tables, time series, text, objects, streams, and spatial data. Lindorm is compatible with multiple standard interfaces, such as SQL, HBase, Cassandra, S3, TSDB, HDFS, Solr, and Kafka. It is a core database that supports key businesses at Alibaba. For more information about Lindorm, see Lindorm Product Overview.
Prerequisites
Before you create a Lindorm external table, make sure that the following prerequisites are met:
A MaxCompute project has been created to associate with the Lindorm table.
For more information about how to create a MaxCompute project, see Create a MaxCompute project.
The MaxCompute client is installed.
For more information about how to install the MaxCompute client, see Install and configure the MaxCompute client.
Network connectivity has been established between MaxCompute and Lindorm using a VPC connection solution.
For more information about VPC connection solutions, see Network connection process.
The Lindorm service has been activated and a Lindorm instance has been created. For more information, see Create an instance.
Lindorm Shell has been downloaded and configured. For more information, see Access LindormTable using Lindorm Shell.
Limits
You can create Lindorm external tables only in the following regions because only these regions support the solution for direct connection to a VPC over a leased line: China (Beijing), China (Shanghai), China (Zhangjiakou), China (Hangzhou), China (Shenzhen), China (Hong Kong), China (Shanghai) Finance Cloud Zone F, Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia). Lindorm external tables are not supported in other regions.
When you establish network connectivity, MaxCompute connects only to the VPC that corresponds to the specified VPC ID. To access resources across regions or in other VPCs within the same region, you must establish network connectivity between the VPC specified in the leased line connection solution and the other VPCs. You can use an existing cloud VPC connection solution to accomplish this.
Lindorm external tables do not support the cluster property.
Notes
When you read data from a Lindorm source table, if the source table has fewer columns than the number of columns defined in the external table's Data Definition Language (DDL) statement, the missing column values are filled with NULL. If the source table has more columns, the data in the extra columns is discarded.
Syntax for creating a Lindorm external table
When you create a Lindorm external table, you must specify HBaseStorageHandler in the DDL statement and configure the Lindorm instance information. Table and field names are case-insensitive during table creation. Queries on tables and fields are also case-insensitive. Case conversion cannot be forced. The syntax is as follows:
--Enable Hive-compatible mode.
SET odps.sql.hive.compatible = true;
--Set the odps.sql.hbase.version parameter to switch the HBase version to lindorm.
SET odps.sql.hbase.version=lindorm;
--Create a Lindorm external table.
CREATE EXTERNAL TABLE if NOT EXISTS <mc_lindorm_external>
(
<col1_name> <data_type>,
<col2_name> <data_type>,
......
)
--The handler for the Lindorm data source.
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES(
'hbase.columns.mapping'='<:key,col-family:col-name1,col-family:col-name2,...>')
TBLPROPERTIES(
--hbase.table.name and mcfed.hbase.zookeeper.quorum are required.
'hbase.table.name'='<namespace:lindorm_table_name>',
'hbase.zookeeper.quorum'='<VPC ip1|host1:port,VPC ip2|host2:port...>',
--Configure the hbase.client.username and hbase.client.password parameters.
'hbase.client.username'='****','hbase.client.password'='****',
--hbase.zookeeper.property.clientPort, mcfed.zookeeper.session.timeout, and hbase.client.retries.number are optional parameters.
--The default value of hbase.zookeeper.property.clientPort is 30020.
['hbase.zookeeper.property.clientPort'='{30020}',]
['mcfed.zookeeper.session.timeout'='<value>',]
['hbase.client.retries.number'='<value>',]
'networklink'='<networklink_name>'
);Parameter descriptions
mc_lindorm_external: Required. The name of the Lindorm external table that you want to create.
col_name: Required. The name of a column in the Lindorm external table.
data_type: Required. The data type of the column.
with serdeproperties:
'hbase.columns.mapping'='<:key,col-family:col-name1,col-family:col-name2,...>': Required. The column mapping between the Lindorm external table and the table in the Lindorm instance. For more information, see HBaseIntegration-ColumnMapping.
The row key (
:key1[,:key2,:keyn]) must be placed at the beginning.The column format is
col-family:col-name1,col-family:col-name2. Only one column family is allowed per job.
tblproperties:
'hbase.table.name'='<namespace:lindorm_table_name>': Required. The namespace and name of the table in the Lindorm instance that is associated with MaxCompute. If you do not specify a namespace, the default namespace is used. You must create the table in the Lindorm console.
'hbase.zookeeper.quorum'='<VPC ip1|host1:port,VPC ip2|host2:port...>': Required. The IP address or domain name of the ZooKeeper service for Lindorm. This is the endpoint for accessing the Lindorm instance using the Java API.
hbase.client.username: Required. Log on to the Lindorm console. On the Instance List page, click the name of the destination instance. In the navigation pane on the left, click Database Connection. You can obtain the Default Username on the LindormTable tab.
hbase.client.password: Required. Log in to the Lindorm console, and on the Instance List page, click the name of the destination instance. In the navigation pane on the left, click Database Connection. The Default Initial Password is available on the LindormTable tab.
'hbase.zookeeper.property.clientPort'='{2181|30020}': Optional. The ZooKeeper port for Lindorm. The default ZooKeeper port for Lindorm is 30020.
'mcfed.zookeeper.session.timeout'='<value>': Optional. The ZooKeeper session timeout period. The unit is milliseconds. Set this parameter to 30000.
'hbase.client.retries.number'='<value>': Optional. The number of retries for connecting to the Lindorm console.
'networklink'='<networklink_name>': Required. The MaxCompute network link for the VPC where the Lindorm instance resides.
NoteParameters that start with mcfed.zookeeper and hbase are ZooKeeper-related configurations.
Usage examples
Use Lindorm Shell to create a Lindorm table and insert data.
Create the mf_lindorm_test_t table.
CREATE 'mf_lindorm_test_t','f';Insert data into the mf_lindorm_test_t table.
put 'mf_lindorm_test_t', '1122', 'f:col1', 'hello'; put 'mf_lindorm_test_t', '1122', 'f:col2', 'hbase';Query data in the mf_lindorm_test_t table.
scan 'mf_lindorm_test_t';The query result is shown in the following figure:

Create a MaxCompute external table and map it to the Lindorm data source.
Configure the environment context and create the MaxCompute Lindorm external table mf_lindorm_test_t_ext.
set odps.sql.hive.compatible = true; set odps.sql.hbase.version=lindorm; CREATE EXTERNAL TABLE mf_lindorm_test_t_ext ( key int, col1 string, col2 string ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping'=':key,f:col1,f:col2') TBLPROPERTIES ( 'hbase.table.name'='mf_lindorm_test_t', 'hbase.zookeeper.quorum'='ld-2z****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020', 'hbase.client.username'='root', 'hbase.client.password'='root', 'networklink'='networklink4hbase');Query data from the mf_lindorm_test_t_ext external table.
set odps.sql.hive.compatible = true; set odps.sql.hbase.version=lindorm; select * from mf_lindorm_test_t_ext;The query result is as follows:
+------------+------------+----------+ | key | col1 | col2 | +------------+------------+----------+ | 1122 | hello | hbase | +------------+------------+----------+Insert data into the mf_lindorm_test_t_ext external table.
Run the
INSERT OVERWRITEstatement to write data to the created external table.set odps.sql.hive.compatible = true; set odps.sql.hbase.version=lindorm; INSERT OVERWRITE TABLE mf_lindorm_test_t_ext SELECT '1123', 'lindorm', 'mftestinsert';Query the data in the mf_lindorm_test_t_ext external table again.
set odps.sql.hive.compatible = true; set odps.sql.hbase.version=lindorm; select * from mf_lindorm_test_t_ext;The query result is as follows:
+------------+------------+-------------+ | key | col1 | col2 | +------------+------------+-------------+ | 1122 | hello | hbase | | 1123 | lindorm | mftestinsert| +------------+------------+-------------+Use the
scanstatement to query data in the Lindorm table mf_lindorm_test_t.scan 'mf_lindorm_test_t';The query result is as follows:
