
Object Storage Service:Use Realtime Compute for Apache Flink to read data from or write data to OSS or OSS-HDFS

Last Updated:Apr 25, 2024

Realtime Compute for Apache Flink allows you to read data from and write data to OSS or OSS-HDFS. After you configure the input attributes of the OSS or OSS-HDFS connector, Realtime Compute for Apache Flink automatically reads data from the specified path and uses the data as the input stream. Then, Realtime Compute for Apache Flink writes the computing result in the specified format to the specified path in OSS or OSS-HDFS.

Prerequisites

  • Fully managed Flink is activated. For more information, see Activate Realtime Compute for Apache Flink.

    After you activate fully managed Flink, the workspace that is created appears on the Fully Managed Flink tab within 5 to 10 minutes.

  • An SQL draft is created.

    When you create an SQL draft, you must select Realtime Compute for Apache Flink whose engine version is VVR 8.0.1 or later. For more information, see Procedure.

Limits

  • You can use Realtime Compute for Apache Flink to read data from or write data to only OSS buckets or OSS-HDFS within your Alibaba Cloud account.

  • You cannot write data in row-oriented storage formats, such as Avro, CSV, JSON, and Raw, to OSS or OSS-HDFS. For more information, visit FLINK-30635.

Procedure

  1. Go to the New Draft dialog box.

    1. Log on to the Realtime Compute for Apache Flink console.

    2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

    3. In the left-side navigation pane, click SQL Editor.

  2. Write DDL and DML statements in the SQL editor.

    The following example reads data from a source table in the dir directory of the srcbucket bucket and writes the data to a result table in the test directory of the destbucket bucket.

    Note

    If you want to read data from and write data to OSS-HDFS by using the following code, make sure that OSS-HDFS is enabled for the srcbucket and destbucket buckets.

    CREATE TEMPORARY TABLE source_table (
      `file.name` STRING NOT NULL METADATA,
      `file.path` STRING NOT NULL METADATA
    ) WITH (
      'connector' = 'filesystem',
      'path' = 'oss://srcbucket/dir/',
      'format' = 'parquet'
    );
    
    CREATE TEMPORARY TABLE target_table (
      `name` STRING,
      `path` STRING
    ) WITH (
      'connector' = 'filesystem',
      'path' = 'oss://destbucket/test/',
      'format' = 'parquet'
    );
    
    INSERT INTO target_table SELECT * FROM source_table;

    For more information about the metadata columns supported by the source table, such as file.path and file.name, and how to use the WITH parameter, see OSS connector.

  3. Click Save.

  4. Click Validate.

    This step checks the SQL semantics of the draft, the network connectivity, and the metadata of the tables that are used by the draft. You can also click SQL Advice in the validation results to view SQL risks and related optimization suggestions.

  5. Click Deploy.

    After the draft development and syntax check are complete, you can deploy the draft to publish it to the production environment.

  6. Optional. Perform this step only if you need to read data from or write data to OSS-HDFS.

    Click the draft. In the Parameters section of the Configuration tab, configure the parameters, such as the OSS-HDFS AccessKey pair and endpoint, and then click Save.

    fs.oss.jindo.buckets: srcbucket;destbucket
    fs.oss.jindo.accessKeyId: LTAI********
    fs.oss.jindo.accessKeySecret: KZo1********
    fs.oss.jindo.endpoint: cn-hangzhou.oss-dls.aliyuncs.com

    The following list describes the parameters in the preceding code.

    • fs.oss.jindo.buckets: The names of the bucket in which the source table is stored and the bucket in which the result table is stored. Separate bucket names with semicolons (;). Example: srcbucket;destbucket.

    • fs.oss.jindo.accessKeyId: The AccessKey ID of your Alibaba Cloud account or a RAM user. For more information about how to obtain the AccessKey ID, see View the information about AccessKey pairs of a RAM user.

    • fs.oss.jindo.accessKeySecret: The AccessKey secret of your Alibaba Cloud account or a RAM user. For more information about how to obtain the AccessKey secret, see View the information about AccessKey pairs of a RAM user.

    • fs.oss.jindo.endpoint: The endpoint that is used to access OSS-HDFS. Example: cn-hangzhou.oss-dls.aliyuncs.com.
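    To illustrate how these parameters fit together, the following Python sketch assembles the key/value pairs, joining the bucket names with semicolons as the connector expects. This is not an official API; jindo_conf is a hypothetical helper, and the credentials are the masked placeholders from this example.

    ```python
    def jindo_conf(buckets, access_key_id, access_key_secret, endpoint):
        """Assemble the OSS-HDFS (JindoSDK) parameters for the Parameters section.

        `buckets` lists every bucket that the deployment reads from or
        writes to; the connector expects the names joined with semicolons.
        """
        return {
            "fs.oss.jindo.buckets": ";".join(buckets),
            "fs.oss.jindo.accessKeyId": access_key_id,
            "fs.oss.jindo.accessKeySecret": access_key_secret,
            "fs.oss.jindo.endpoint": endpoint,
        }

    # Placeholder credentials for illustration only.
    conf = jindo_conf(["srcbucket", "destbucket"],
                      "LTAI********", "KZo1********",
                      "cn-hangzhou.oss-dls.aliyuncs.com")
    print(conf["fs.oss.jindo.buckets"])  # srcbucket;destbucket
    ```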

  7. On the Deployments page, click Start in the Actions column and wait for the draft to enter the RUNNING state.

  8. View the written data in the specified storage path of the OSS or OSS-HDFS result table.

    If you write data to OSS, you can view the written data on the OSS Object tab of the Objects page in the OSS console. If you write data to OSS-HDFS, you can view the written data on the HDFS tab of the Objects page in the OSS console.
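    If you prefer to spot-check the result path programmatically rather than in the console, a sketch using the official oss2 Python SDK could list the Parquet files that the deployment wrote. The bucket name, path, and masked credentials below are the placeholders from this example, and list_result_files and parquet_objects are hypothetical helpers, not part of any official API.

    ```python
    def parquet_objects(keys, prefix="test/"):
        """Keep only Parquet files under the result path; Flink may also
        write marker objects such as _SUCCESS or in-progress files."""
        return [k for k in keys if k.startswith(prefix) and k.endswith(".parquet")]

    def list_result_files(access_key_id, access_key_secret, endpoint,
                          bucket_name="destbucket", prefix="test/"):
        """List the Parquet objects in the result path via the oss2 SDK."""
        import oss2  # official Alibaba Cloud OSS SDK for Python
        auth = oss2.Auth(access_key_id, access_key_secret)
        bucket = oss2.Bucket(auth, endpoint, bucket_name)
        keys = [obj.key for obj in oss2.ObjectIterator(bucket, prefix=prefix)]
        return parquet_objects(keys, prefix)
    ```

    For example, list_result_files("LTAI********", "KZo1********", "https://oss-cn-hangzhou.aliyuncs.com") would return the keys of the written Parquet objects; use the endpoint that matches your bucket's region and whether it is a plain OSS or OSS-HDFS bucket.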