This topic describes how to access an Object Storage Service (OSS) data source by using the serverless Spark engine. After you are granted the permissions to access OSS, you can execute SQL statements or submit Spark code to access OSS.

Grant permissions to access OSS

Before you access OSS, make sure that you are authorized to access OSS by using DLA.

If you use an Alibaba Cloud account, you have the permissions to access all OSS data under your account and all OSS tables stored in DLA by default. You can directly access OSS without additional configurations.

If you use a RAM user to access OSS by submitting code, you must grant the RAM user the required permissions. For more information, see Grant permissions to a RAM user.

If you use the serverless Spark engine to access OSS tables stored in DLA, make sure that your RAM user is bound with a DLA child account. For more information, see Bind a DLA child account with a RAM user. In addition, make sure that the DLA child account has the required permissions. To check or manage these permissions, log on to the DLA console and execute SQL statements that use the GRANT or REVOKE syntax, which is compatible with the MySQL protocol.

Configure spark.dla.connectors

After you are granted the required permissions, you can use the serverless Spark engine to access OSS. Before you access OSS, you must set spark.dla.connectors to oss in the configuration of your Spark job, because the OSS access function of DLA is disabled by default. If you do not want to use this function, omit the parameter, submit your JAR package, and add the required configurations yourself.
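The parameter is set in the conf section of the job configuration, as in the following minimal fragment (the resource and instance values here are illustrative placeholders):

```json
{
    "conf": {
        "spark.dla.connectors": "oss",
        "spark.driver.resourceSpec": "small",
        "spark.executor.resourceSpec": "small",
        "spark.executor.instances": 2
    }
}
```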

Execute SQL statements to access OSS data

The serverless Spark engine of DLA allows you to execute SQL statements to access OSS data in DLA. If you use this method, you do not need to submit code. For more information about how to execute SQL statements, see Spark SQL. Sample statements in a Spark job:
{
    "sqls": [
        "select * from `1k_tables`.`table0` limit 100",
        "insert into `1k_tables`.`table0` values(1, 'test')"
    ],
    "name": "sql oss test",
    "conf": {
        "spark.dla.connectors": "oss",
        "spark.driver.resourceSpec": "small",
        "spark.sql.hive.metastore.version": "dla",
        "spark.executor.instances": 10,
        "spark.dla.job.log.oss.uri": "oss://test/spark-logs",
        "spark.executor.resourceSpec": "small"
    }
}

Submit Spark code to access OSS data

You can submit Java, Scala, or Python code to access OSS data. The following sample job configuration submits a Scala program:
{  
  "args": ["oss://${oss-buck-name}/data/test/test.csv"],
  "name": "spark-oss-test",
  "file": "oss://${oss-buck-name}/jars/test/spark-examples-0.0.1-SNAPSHOT.jar",
  "className": "com.aliyun.spark.oss.SparkReadOss",
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2,
    "spark.dla.connectors": "oss"
  }
}
Note For the source code of SparkReadOss in the main class, see DLA Spark OSS demo.
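For reference, the main class submitted above might look like the following minimal Scala sketch. This is an illustrative sketch only, not the actual demo source: the object name SparkReadOss and the argument handling match the job configuration above, but everything else is an assumption.

```scala
package com.aliyun.spark.oss

import org.apache.spark.sql.SparkSession

// Sketch of a main class that reads a CSV file from OSS.
// The OSS path is passed as the first job argument ("args" in the job config).
object SparkReadOss {
  def main(args: Array[String]): Unit = {
    val ossPath = args(0) // e.g. the oss:// path passed in "args"

    val spark = SparkSession.builder()
      .appName("spark-oss-test")
      .getOrCreate()

    // With spark.dla.connectors set to oss, the oss:// scheme is handled
    // by the serverless Spark engine's built-in OSS connector.
    val df = spark.read.csv(ossPath)
    df.show()

    spark.stop()
  }
}
```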