This topic describes how to use Spark SQL to read and write Iceberg external tables in AnalyticDB for MySQL.
Prerequisites
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster is created.
The amount of reserved resources in the Enterprise Edition or Basic Edition cluster is greater than 0 AnalyticDB compute units (ACUs).
The amount of reserved storage resources in the Data Lakehouse Edition cluster is greater than 0 ACUs.
The kernel version of the cluster is 3.2.5.0 or later.
Note: To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.
A job resource group or an interactive resource group with the Spark engine is created for the AnalyticDB for MySQL cluster.
A database account is created for the AnalyticDB for MySQL cluster.
If you use an Alibaba Cloud account, you need to only create a privileged account.
If you use a Resource Access Management (RAM) user, you must create a privileged account and a standard account and associate the standard account with the RAM user.
An Object Storage Service (OSS) bucket is created in the same region as the AnalyticDB for MySQL cluster.
Data Lake Storage is activated.
Note: This condition is required only when you read and write Data Lake Storage tables.
Step 1: Go to the SQL Development page
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Job Development > SQL Development.
On the SQLConsole tab, select the Spark engine and a resource group (a job resource group or an interactive resource group with the Spark engine).
Step 2: Create an external database and an Iceberg external table
You can execute the following SQL statements in batch or interactive mode.
Data Lake Storage tables
Create a database.
CREATE DATABASE adb_external_db_iceberg WITH DBPROPERTIES ('adb_lake_bucket' = 'adb-lake-cn-shanghai-6gml****');
Parameter description:
Parameter: adb_lake_bucket
Description: Specifies the storage location of the Data Lake Storage table data. If you specify this parameter in the database creation statement, all tables in the database are stored in the specified Data Lake Storage location. If you do not want to store all tables of the database in Data Lake Storage, set this parameter when you create individual tables instead.
Create an Iceberg external table.
SET spark.adb.lakehouse.enabled=true; -- Enable Data Lake Storage
CREATE TABLE adb_external_db_iceberg.test_iceberg_tbl (
  `id` int,
  `name` string,
  `age` int
) USING iceberg
PARTITIONED BY (age)
TBLPROPERTIES ('adb_lake_bucket' = 'adb-lake-cn-shanghai-6gml****');
Parameter description:
Parameter: adb_lake_bucket
Description: Specifies the storage location of the Data Lake Storage table data.
If you specified Data Lake Storage when you created the database, all tables in the database are stored in that location. You do not need to specify this parameter again when you create a table.
If you did not specify Data Lake Storage when you created the database, you must explicitly specify this parameter when you create a table. Otherwise, an error occurs. The table data is then stored in the specified Data Lake Storage location.
If you explicitly specify Data Lake Storage both when you create the database and when you create the table, the table data is stored in the location specified at table creation, and the other tables in the database are stored in the location specified at database creation. (For an example, see the sketch after this description.)
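The following is a minimal sketch of the precedence rule. The database name adb_external_db_iceberg2 and the bucket names are placeholders, not values from your environment:
SET spark.adb.lakehouse.enabled=true; -- Enable Data Lake Storage
-- Database-level default storage location
CREATE DATABASE adb_external_db_iceberg2 WITH DBPROPERTIES ('adb_lake_bucket' = 'adb-lake-bucket-a****');
-- The table-level setting overrides the database-level setting: this table is stored in adb-lake-bucket-b****
CREATE TABLE adb_external_db_iceberg2.tbl_override (`id` int) USING iceberg TBLPROPERTIES ('adb_lake_bucket' = 'adb-lake-bucket-b****');
-- No table-level setting: this table falls back to the database-level location adb-lake-bucket-a****
CREATE TABLE adb_external_db_iceberg2.tbl_default (`id` int) USING iceberg;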
Non-Data Lake Storage tables
Check whether you can create an Iceberg external table in an existing database, or create a new database.
Execute the SHOW CREATE DATABASE statement to view the DDL statement of an existing database, as shown in the sketch after this list. You can use the existing database if one of the following conditions is met. Otherwise, you must create a new database.
No Location parameter is specified in the DDL statement.
The Location parameter is specified in the DDL statement, and the value of the Catalog parameter is mix.
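The following is a minimal sketch of this check; the database name and the returned DDL are illustrative:
SHOW CREATE DATABASE adb_external_db_iceberg;
-- Inspect the returned DDL statement. For example, output such as
--   CREATE DATABASE adb_external_db_iceberg
-- contains no Location parameter, so the database can be used for Iceberg external tables.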
Create a new database.
CREATE DATABASE adb_external_db_iceberg;
Create an Iceberg external table.
Important: If the Location parameter is specified for the database, you must make sure that the OSS path prefix specified by the spark.iceberg.warehouse parameter for creating, reading, and writing the Iceberg external table matches the Location parameter.
SET spark.adb.version=3.5; -- Specify the Spark version, which must be 3.5
SET spark.iceberg.warehouse=oss://testBucketName/iceberg/; -- The storage path of the metadata and data files of the Iceberg external table
CREATE TABLE adb_external_db_iceberg.test_iceberg_tbl (
  `id` int,
  `name` string,
  `age` int
) USING iceberg
PARTITIONED BY (age);
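The following sketch illustrates the prefix-match requirement described in the Important note; the database name and OSS paths are placeholders:
-- A hypothetical database that was created with an explicit Location
CREATE DATABASE adb_external_db_loc LOCATION 'oss://testBucketName/iceberg/';
-- The warehouse path shares the Location prefix, so the Iceberg external table can be created, read, and written
SET spark.iceberg.warehouse=oss://testBucketName/iceberg/;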
Step 3: Write data to or delete data from the Iceberg external table
Data Lake Storage tables
Write data
When you perform write operations, you must add the following parameter before the SQL statements. Otherwise, the write operations fail.
SET spark.adb.lakehouse.enabled=true; -- Enable Data Lake Storage
Execute the INSERT INTO statement to write data.
INSERT INTO adb_external_db_iceberg.test_iceberg_tbl VALUES (1, 'lisa', 10), (2, 'jams', 20);
Execute the INSERT OVERWRITE statement to write data.
INSERT OVERWRITE adb_external_db_iceberg.test_iceberg_tbl VALUES (1, 'lisa', 10), (2, 'jams', 30);
Execute the INSERT OVERWRITE statement to write data to a static partition.
INSERT OVERWRITE adb_external_db_iceberg.test_iceberg_tbl PARTITION(age=10) VALUES (1, 'anna');
Execute the INSERT OVERWRITE statement to write data to a dynamic partition.
SET spark.sql.sources.partitionOverwriteMode=dynamic; -- Overwrite only the partitions that receive new data. If this parameter is not set, all data in the table is overwritten
INSERT OVERWRITE adb_external_db_iceberg.test_iceberg_tbl PARTITION(age) VALUES (1, 'bom', 10);
Execute the UPDATE statement to write data.
UPDATE adb_external_db_iceberg.test_iceberg_tbl SET name = 'box' WHERE id = 2;
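As an illustration of the dynamic partition overwrite setting above (the inserted row is an example value, not part of the original walkthrough): with spark.sql.sources.partitionOverwriteMode=dynamic, only the partitions that receive new rows are replaced, and the other partitions keep their data.
SET spark.adb.lakehouse.enabled=true; -- Enable Data Lake Storage
SET spark.sql.sources.partitionOverwriteMode=dynamic;
-- Only the age=10 partition is rewritten; rows in the age=20 partition remain unchanged
INSERT OVERWRITE adb_external_db_iceberg.test_iceberg_tbl PARTITION(age) VALUES (3, 'carl', 10);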
Delete data
SET spark.adb.lakehouse.enabled=true; -- Enable Data Lake Storage
DELETE FROM adb_external_db_iceberg.test_iceberg_tbl WHERE id = 1;
DELETE FROM adb_external_db_iceberg.test_iceberg_tbl WHERE age = 20;
Non-Data Lake Storage tables
When you perform write and delete operations, you must add the following parameters before the SQL statements. Otherwise, the write and delete operations fail.
SET spark.adb.version=3.5; -- Specify the Spark version, which must be 3.5
SET spark.iceberg.warehouse=oss://testBucketName/iceberg/; -- The storage path of the metadata and data files of the Iceberg external table
Write data
Execute the INSERT INTO statement to write data.
INSERT INTO adb_external_db_iceberg.test_iceberg_tbl VALUES (1, 'Frank', 10), (2, 'Amy', 10);
Execute the INSERT OVERWRITE statement to write data.
INSERT OVERWRITE adb_external_db_iceberg.test_iceberg_tbl VALUES (1, 'Frank', 10), (2, 'Amy', 20);
Execute the INSERT OVERWRITE statement to write data to a static partition.
INSERT OVERWRITE adb_external_db_iceberg.test_iceberg_tbl PARTITION(age=10) VALUES (1, 'Frank');
Execute the INSERT OVERWRITE statement to write data to a dynamic partition.
SET spark.sql.sources.partitionOverwriteMode=dynamic; -- Overwrite only the partitions that receive new data. If this parameter is not set, all data in the table is overwritten
INSERT OVERWRITE adb_external_db_iceberg.test_iceberg_tbl PARTITION(age) VALUES (1, 'Bom', 10);
Execute the UPDATE statement to write data.
UPDATE adb_external_db_iceberg.test_iceberg_tbl SET name = 'box' WHERE id = 2;
Delete data
SET spark.adb.version=3.5; -- Specify the Spark version, which must be 3.5
SET spark.iceberg.warehouse=oss://testBucketName/iceberg/; -- The storage path of the metadata and data files of the Iceberg external table
DELETE FROM adb_external_db_iceberg.test_iceberg_tbl WHERE id = 1;
DELETE FROM adb_external_db_iceberg.test_iceberg_tbl WHERE age = 20;
Step 4: Query data from the Iceberg external table
Data Lake Storage tables
SET spark.adb.lakehouse.enabled=true; -- Enable Data Lake Storage
SELECT * FROM adb_external_db_iceberg.test_iceberg_tbl;
Return values:
+---+----+---+
|id |name|age|
+---+----+---+
|1 |anna|10 |
|2 |jams|20 |
+---+----+---+
Non-Data Lake Storage tables
SET spark.adb.version=3.5; -- Specify the Spark version, which must be 3.5
SET spark.iceberg.warehouse=oss://testBucketName/iceberg/; -- The storage path of the metadata and data files of the Iceberg external table
SELECT * FROM adb_external_db_iceberg.test_iceberg_tbl;
Return values:
+---+----+---+
|id |name|age|
+---+----+---+
|1 |anna|10 |
|2 |jams|20 |
+---+----+---+
Step 5: Delete the Iceberg external table
Data Lake Storage tables
Delete both the metadata and the data of the Iceberg Data Lake Storage table from AnalyticDB for MySQL:
SET spark.adb.lakehouse.enabled=true; -- Enable Data Lake Storage
DROP TABLE adb_external_db_iceberg.test_iceberg_tbl;
Non-Data Lake Storage tables
Delete the Iceberg external table from AnalyticDB for MySQL while retaining the Iceberg external table data in OSS:
DROP TABLE adb_external_db_iceberg.test_iceberg_tbl;
Delete the Iceberg external table from AnalyticDB for MySQL and delete the Iceberg external table data from OSS:
DROP TABLE adb_external_db_iceberg.test_iceberg_tbl PURGE;