
AnalyticDB for MySQL: Access OSS

Last Updated: Jan 08, 2024

AnalyticDB for MySQL Spark allows you to access Object Storage Service (OSS) data within an Alibaba Cloud account or across Alibaba Cloud accounts. This topic describes how to access OSS data in both scenarios.

Prerequisites

  • An AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster is created in the same region as an OSS bucket.

  • A job resource group is created in the AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster. For more information, see Create a resource group.

  • A database account is created for the AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster.

  • Authorization is complete. For more information, see Perform authorization for Alibaba Cloud accounts.

    Important

    To access OSS data within an Alibaba Cloud account, you must have the AliyunADBSparkProcessingDataRole permission. To access OSS data across Alibaba Cloud accounts, you must perform authorization for other Alibaba Cloud accounts.

Step 1: Prepare data

  1. Prepare a text file and upload it to the OSS bucket. In this example, the file is named readme.txt and contains the following content. For more information, see Upload objects. A programmatic upload sketch follows this list.

    AnalyticDB for MySQL
    Database service
  2. Compile Python code and upload the code file to the OSS bucket. In this example, the Python code file is named example.py. The code reads the readme.txt file, counts its lines, and displays the first line.

    import sys
    from pyspark.sql import SparkSession
    
    # Initialize the Spark session for the application.
    spark = SparkSession.builder.appName('OSS Example').getOrCreate()
    # Read the text file whose OSS path is passed in by the args parameter (sys.argv[1]).
    textFile = spark.sparkContext.textFile(sys.argv[1])
    # Count and display the number of lines in the text file.
    print("File total lines: " + str(textFile.count()))
    # Display the first line of the text file.
    print("First line is: " + textFile.first())
    
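If you prefer to upload the files programmatically instead of in the OSS console, the following is a minimal sketch that uses the OSS Python SDK (oss2). The endpoint, AccessKey pair, and local file paths are placeholders for this example; replace them with your own values.

    # Minimal upload sketch based on the OSS Python SDK (oss2).
    # The AccessKey pair, endpoint, and local file paths below are
    # placeholders for this example; replace them with your own values.
    import oss2
    
    auth = oss2.Auth('<yourAccessKeyId>', '<yourAccessKeySecret>')
    # Use the OSS endpoint of the region that hosts testBucketName.
    bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'testBucketName')
    
    # Upload the text file and the Spark application file to the data/ directory.
    bucket.put_object_from_file('data/readme.txt', 'readme.txt')
    bucket.put_object_from_file('data/example.py', 'example.py')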

Step 2: Access OSS data

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > Spark JAR Development.

  3. In the upper part of the editor, select the job resource group and a Spark application type. In this example, the Batch type is selected.

  4. Run the following Spark code in the editor to display the total number of lines and the content of the first line of the text file. A quick way to sanity-check the JSON configuration is shown after these steps.

    Access OSS data within an Alibaba Cloud account

    {
      "args": ["oss://testBucketName/data/readme.txt"],
      "name": "spark-oss-test",
      "file": "oss://testBucketName/data/example.py",
      "conf": {
        "spark.driver.resourceSpec": "small",
        "spark.executor.resourceSpec": "small",
        "spark.executor.instances": 1
      }
    }

    Access OSS data across Alibaba Cloud accounts

    {
      "args": ["oss://testBucketName/data/readme.txt"],
      "name": "CrossAccount",
      "file": "oss://testBucketName/data/example.py",
      "conf": {
        "spark.adb.roleArn": "acs:ram::<testAccountID>:role/<testUserName>",
        "spark.driver.resourceSpec": "c.medium",
        "spark.executor.instances": 1,
        "spark.executor.resourceSpec": "c.medium"
      }
    }

    The following list describes the parameters.

    • args: The arguments that are passed to the Spark application. Separate multiple arguments with commas (,). In this example, the OSS path of the text file is passed as the first argument (sys.argv[1]) and assigned to textFile.

    • name: The name of the Spark application.

    • file: The path of the main file of the Spark application. The main file can be a JAR package that contains the entry point or an executable file that serves as the entry point for the Python application.

      Important: You must store the main files of Spark applications in OSS.

    • spark.adb.roleArn: The RAM role that is used to access an external data source across Alibaba Cloud accounts. Separate multiple roles with commas (,). Specify the parameter in the acs:ram::<testAccountID>:role/<testUserName> format.

      Note
      • <testAccountID>: the ID of the Alibaba Cloud account that owns the external data source.
      • <testUserName>: the name of the RAM role that is created when you perform authorization across Alibaba Cloud accounts. For more information, see the "Perform authorization across Alibaba Cloud accounts" section of the Perform authorization for Alibaba Cloud accounts topic.

    • conf: The configuration parameters that are required for the Spark application, similar to the configuration parameters of Apache Spark. Specify the parameters in the key: value format. Separate multiple parameters with commas (,). For information about the configuration parameters that differ from those of Apache Spark or that are specific to AnalyticDB for MySQL, see Spark application configuration parameters.

  5. Click Run Now.

    After you run the Spark code, you can click Log in the Actions column on the Applications tab of the Spark JAR Development page to view log information. For more information, see Spark editor.
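The editor parses the application as a JSON document, so a missing comma or a trailing comma in the conf object makes the whole configuration invalid. The following minimal sketch, assuming a local Python 3 environment, validates a configuration before you paste it into the editor:

    # Minimal JSON sanity check for a Spark application configuration.
    # Paste your configuration between the triple quotes before running.
    import json
    
    app_config = """
    {
      "args": ["oss://testBucketName/data/readme.txt"],
      "name": "spark-oss-test",
      "file": "oss://testBucketName/data/example.py",
      "conf": {
        "spark.driver.resourceSpec": "small",
        "spark.executor.resourceSpec": "small",
        "spark.executor.instances": 1
      }
    }
    """
    
    # json.loads raises json.JSONDecodeError with the line and column of the problem.
    parsed = json.loads(app_config)
    print("Valid JSON. Application name:", parsed["name"])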
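As an alternative to the console, a Spark application can also be submitted programmatically. The following sketch assumes the alibabacloud-adb20211201 Python SDK and its SubmitSparkApp operation; the cluster ID, resource group name, endpoint, and credentials are placeholders, and field names may differ across SDK versions, so treat this as an outline rather than a drop-in script.

    # Sketch: submit the Spark application through the AnalyticDB for MySQL OpenAPI.
    # Assumes the alibabacloud-adb20211201 SDK; the cluster ID, resource group,
    # endpoint, and credentials are placeholders for this example.
    from alibabacloud_adb20211201.client import Client
    from alibabacloud_adb20211201 import models as adb_models
    from alibabacloud_tea_openapi import models as open_api_models
    
    config = open_api_models.Config(
        access_key_id='<yourAccessKeyId>',
        access_key_secret='<yourAccessKeySecret>',
        endpoint='adb.cn-hangzhou.aliyuncs.com',  # endpoint of your cluster's region
    )
    client = Client(config)
    
    request = adb_models.SubmitSparkAppRequest(
        dbcluster_id='<yourClusterId>',
        resource_group_name='<yourJobResourceGroup>',
        app_type='BATCH',
        data=app_config,  # the JSON configuration string from the sanity-check sketch above
    )
    response = client.submit_spark_app(request)
    print(response.body)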
