The Spark engine, Data Lake Analytics (DLA) SQL engine, and data lake formation service share metadata.
Access the DLA metadata service by using the Spark engine

You can click Execute in the left-side navigation pane of the DLA console to view all the database and table schemas of data lakes, perform online analysis and queries on the tables, and manage the permissions of RAM users on the databases and tables. The following figure shows the Execute page.
Submit a Spark SQL job
"spark.sql.hive.metastore.version": "dla"
to the code when you submit a Spark SQL job. The following example demonstrates how to extract the content of the table table0
under the database 1k_tables
in the data lake metadata:{
"sqls": [
"select * from `1k_tables`.`table0` limit 100",
"insert into `1k_tables`.`table0` values(1, 'test')"
],
"name": "sql test",
"jars": [
"oss://test/hive-serde-3.1.2.jar"
],
"conf": {
"spark.dla.connectors": "oss",
"spark.driver.resourceSpec": "small",
"spark.sql.hive.metastore.version": "dla",
"spark.executor.instances": 10,
"spark.dla.job.log.oss.uri": "oss://test/spark-logs",
"spark.executor.resourceSpec": "small"
}
}
The "sqls" keyword allows you to submit SQL jobs without the need to submit JAR packages. This makes the operation easier and is suitable for developers who are familiar with SQL. The parameter value is an array, which allows you to execute multiple SQL statements in a job. In this example, multiple SQL statements are separated by commas (,).
You can submit Spark SQL jobs in the DLA console or by using APIs. The method of submitting Spark SQL jobs by using APIs is the same as the method of submitting other types of Spark jobs.
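The following Python sketch shows how such a job configuration might be submitted through an API call. It is a minimal example, not part of the original documentation: the SubmitSparkJob action, the openanalytics endpoint, API version 2018-06-19, and the VcName and ConfigJson request parameters are assumptions that you should verify against the DLA API reference before use.
import json

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

# A job configuration in the same format as the JSON example above.
job_config = {
    "sqls": ["select * from `1k_tables`.`table0` limit 100"],
    "name": "sql test",
    "conf": {
        "spark.dla.connectors": "oss",
        "spark.sql.hive.metastore.version": "dla",
        "spark.driver.resourceSpec": "small",
        "spark.executor.resourceSpec": "small",
        "spark.executor.instances": 1
    }
}

# Create a client for the target region. Replace the AccessKey pair with your own.
client = AcsClient('<access_key_id>', '<access_key_secret>', 'cn-hangzhou')

# Use a generic CommonRequest so that no product-specific SDK is required.
request = CommonRequest()
request.set_domain('openanalytics.cn-hangzhou.aliyuncs.com')  # assumed DLA endpoint
request.set_version('2018-06-19')                             # assumed API version
request.set_action_name('SubmitSparkJob')                     # assumed action name
request.set_method('POST')
request.add_query_param('VcName', '<your_virtual_cluster>')   # hypothetical cluster name
request.add_query_param('ConfigJson', json.dumps(job_config))

# Submit the job and print the raw API response.
response = client.do_action_with_exception(request)
print(response)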
You can also submit Spark SQL jobs by using the Spark-SQL toolkit. For more information, see
Use Spark SQL in the code
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # init pyspark context
    spark = SparkSession \
        .builder \
        .appName("Python SQL Test") \
        .getOrCreate()

    # create a database
    spark.sql(
        "create database if not exists dlatest comment 'c' location 'oss://test/warehouse/' WITH DBPROPERTIES(k1='v1', k2='v2')")

    # create a table in the database
    spark.sql(
        "create table dlatest.tp(col1 INT) PARTITIONED BY (p1 STRING, p2 STRING) location 'oss://test/warehouse/tp' STORED AS parquet TBLPROPERTIES ('parquet.compress'='SNAPPY')")

    # switch to the dlatest database so that the table can be referenced without a database prefix
    spark.sql("use dlatest")

    # show the table structure
    print(spark.sql("show create table tp").collect()[0])

    # insert data
    spark.sql("INSERT into tp partition(p1='a',p2='a') values(1)")

    # show data
    spark.sql("select * from tp").show()
Use the following configuration to submit the job:
{
    "name": "DLA SQL Test",
    "file": "oss://path/to/example.py",
    "conf": {
        "spark.driver.resourceSpec": "small",
        "spark.sql.hive.metastore.version": "dla",
        "spark.dla.connectors": "oss",
        "spark.executor.instances": 1,
        "spark.dla.job.log.oss.uri": "oss://path/to/spark-logs",
        "spark.executor.resourceSpec": "small"
    }
}
After the job succeeds, you can view the dlatest database and its tp table on the Execute page in the DLA console.
The DLA metadata service is not case-sensitive: case is ignored when database and table names are referenced. For example, DLATEST.TP and dlatest.tp refer to the same table.
Precautions for reading and writing tables stored in Hive formats
If a table is stored in a Hive format that requires a SerDe, such as a JSON table, include the JAR package of the Hive SerDe in the jars parameter when you submit the job. We recommend that you download the JAR package of the Hive SerDe from the official Maven repository, such as hive-serde. The following example shows the job configuration:
{
    "name": "DLA Meta Test",
    "sqls": ["SELECT * FROM HiveDB.JsonTable"],
    "jars": [
        "oss://test/hive-serde-3.1.2.jar"
    ],
    "conf": {
        "spark.driver.resourceSpec": "small",
        "spark.sql.hive.metastore.version": "dla",
        "spark.dla.connectors": "oss",
        "spark.executor.instances": 1,
        "spark.dla.job.log.oss.uri": "oss://test/spark-logs",
        "spark.executor.resourceSpec": "small"
    }
}
Limits and precautions
1. The Spark engine supports only external tables that are stored in OSS.
The Spark engine allows you to use the data lake metadata service to read and write external tables that are stored in OSS. You must explicitly specify LOCATION when you create a database or a table.
You must specify LOCATION for the database. The following SQL statement is an example:
CREATE DATABASE db1 LOCATION 'oss://test/db1/'
You must specify LOCATION for the table. The following SQL statement is an example:
CREATE TABLE table1(col1 INT) LOCATION 'oss://test/db1/table1/'
When you execute the DROP statement to delete a table or a partition of a table in the Spark engine, the files are not deleted from the OSS bucket where the table is stored.
The LOCATION that you specify when you create a table must be a subfolder of the LOCATION of the database.
The PARTITION LOCATION that you specify when you add a partition to a table must be a subfolder of the LOCATION of the table.
The RENAME PARTITION statement does not change the directory structure of the partition in OSS.
2. The DEFAULT database is not supported.
Do not create or perform operations on a database named DEFAULT in the Spark engine. Before you perform operations on a table, you must execute the USE DatabaseName statement to switch to the required database. You can also explicitly specify the database to which a table belongs, such as SELECT * FROM db1.table1.
3. The Spark engine has limits on ALTER statements, such as ALTER DATABASE. The Spark engine has the following limits on ALTER statements:
The ALTER statements can be executed only to modify the comment of a database. They cannot be executed to modify the location or properties of a database.
The ALTER statements can be executed only to modify the columns or properties of a table. For example, you can add a column or modify properties. The column cannot be a partition key column. For an illustration of these rules, see the sketch after this list.
4. You can create temporary functions in the Spark engine. The following SQL statement is an example:
CREATE TEMPORARY FUNCTION IF NOT EXISTS func1 as 'com.aliyun.dla.SimpleUDFExample' USING JAR '/path/to/jar1', JAR '/path/to/jar2'
5. The databases that you create in the Spark engine are also displayed on the Execute page in the DLA console. You can delete such a database by executing a statement such as DROP DATABASE IF EXISTS db1.
6. The Spark engine does not support GRANT statements. To manage the permissions of RAM users on databases and tables, use the Execute page in the DLA console.
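The following is a minimal PySpark sketch of the USE DatabaseName requirement and of ALTER operations that stay within the limits above. It is an illustration only: the db1 database and table1 table are the hypothetical names from the earlier SQL examples and are assumed to have been created with explicit OSS LOCATIONs.
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Metadata Limits Example") \
    .getOrCreate()

# Switch to the target database first, or fully qualify names such as db1.table1.
spark.sql("USE db1")

# Allowed: modify the columns of a table, for example by adding a non-partition column.
spark.sql("ALTER TABLE table1 ADD COLUMNS (col2 STRING)")

# Allowed: modify the properties of a table.
spark.sql("ALTER TABLE table1 SET TBLPROPERTIES ('comment'='updated by Spark')")

# Not allowed by the limits above (do not run):
# - modifying the location or properties of a database
# - modifying a partition key column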