Realtime Compute for Apache Flink lets you read data from and write data to Object Storage Service (OSS) and OSS-HDFS, a Hadoop Distributed File System (HDFS)-compatible service built on top of OSS. After you configure the properties of the OSS or OSS-HDFS connector, Realtime Compute for Apache Flink automatically reads data from the specified path as an input stream and writes the computation results in the specified format to the specified path in OSS or OSS-HDFS.
Prerequisites
Fully managed Flink must be activated. For more information, see Activate Realtime Compute for Apache Flink.
After you activate fully managed Flink, the created workspace appears on the Fully Managed Flink tab within 5 to 10 minutes.
An SQL job is created.
When you create the SQL job, select a Flink compute engine of Ververica Runtime (VVR) 8.0.1 or later. For more information, see Create a job.
Limits
You can read data from or write data to only OSS or OSS-HDFS services that are in the same Alibaba Cloud account.
When you write data to OSS, you cannot write data in row store formats, such as Avro, CSV, JSON, and Raw. For more information, see FLINK-30635.
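For example, a sink table that writes to an OSS path must use a bulk format such as Parquet or ORC. The following DDL is a minimal sketch for illustration only; the table name, columns, and bucket path are hypothetical placeholders and are not part of the example later in this topic.

CREATE TEMPORARY TABLE oss_sink_sketch (
  `id`  BIGINT,
  `msg` STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://yourbucket/output/',  -- hypothetical bucket and path
  'format' = 'parquet'                  -- bulk format; row formats such as csv or json cannot be written to OSS
);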
Procedure
Navigate to the SQL draft creation page.
Log on to the Realtime Compute for Apache Flink management console.
Find the target workspace and click Console in the Actions column.
The development console appears.
In the left-side navigation pane, choose Development > ETL.
In the SQL editor, write the Data Definition Language (DDL) and Data Manipulation Language (DML) code.
This example writes data from a source table in the `dir` path of the `srcbucket` bucket to a sink table in the `test` path of the `destbucket` bucket.
Note: If you want to use the following code to read data from OSS-HDFS, make sure that the OSS-HDFS service is enabled for the `srcbucket` and `destbucket` buckets.
CREATE TEMPORARY TABLE source_table (
  `file.name` STRING NOT NULL METADATA,
  `file.path` STRING NOT NULL METADATA
) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://srcbucket/dir/',
  'format' = 'parquet'
);

CREATE TEMPORARY TABLE target_table (
  `name` STRING,
  `path` STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://destbucket/test/',
  'format' = 'parquet'
);

INSERT INTO target_table
SELECT * FROM source_table;

For more information about the metadata columns supported by the source table, such as file.path and file.name, and about the use of the WITH parameters, see Object Storage Service (OSS) connector.
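If you want to expose the metadata under different column names, you can declare the columns explicitly by using the standard METADATA FROM syntax. The following DDL is a sketch for illustration only: the column names and the physical column data_col are hypothetical, and it uses only the file.name and file.path metadata keys mentioned above.

CREATE TEMPORARY TABLE source_with_meta (
  `data_col` STRING,                             -- hypothetical physical column that exists in the Parquet files
  `src_name` STRING METADATA FROM 'file.name',   -- file name of the object being read
  `src_path` STRING METADATA FROM 'file.path'    -- full path of the object being read
) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://srcbucket/dir/',
  'format' = 'parquet'
);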
Click Save.
Click Advanced Check.
The advanced check feature examines the SQL semantics of the job, network connectivity, and the metadata of the tables used by the job. You can also click SQL Optimization in the results area to view potential SQL risks and the corresponding optimization suggestions.
Click Deploy.
After you develop the job and complete the advanced check, you can deploy the job to the production environment.
(Optional) This step is required only if you read data from or write data to an OSS-HDFS service.
Click the name of the job. On the Deployment Details tab, in the Running Parameter Configuration section, configure the AccessKey pair, endpoint, and other information for the OSS-HDFS service as shown in the following example. Then click Save.
fs.oss.jindo.buckets: srcbucket;destbucket
fs.oss.jindo.accessKeyId: LTAI****************
fs.oss.jindo.accessKeySecret: yourAccessKeySecret
fs.oss.jindo.endpoint: cn-hangzhou.oss-dls.aliyuncs.com

The following table describes the configuration items.
| Configuration item | Description |
| --- | --- |
| fs.oss.jindo.buckets | The name of the bucket that stores the source table data and the name of the bucket to which the sink table data is written. Separate multiple bucket names with semicolons (;). Example: srcbucket;destbucket. |
| fs.oss.jindo.accessKeyId | The AccessKey ID of your Alibaba Cloud account or a Resource Access Management (RAM) user. For information about how to obtain an AccessKey ID, see View the AccessKey information of a RAM user. |
| fs.oss.jindo.accessKeySecret | The AccessKey secret of your Alibaba Cloud account or a RAM user. Use an existing AccessKey pair or create one. For more information, see Create an AccessKey pair. Note: To reduce the risk of leaks, the AccessKey secret is displayed only when you create the AccessKey pair and cannot be viewed later. Store it securely. |
| fs.oss.jindo.endpoint | The endpoint of the OSS-HDFS service. Example: cn-hangzhou.oss-dls.aliyuncs.com. |
On the Job O&M page, click Start, and wait for the job to enter the Running state.
View the data written to the specified storage path of the OSS or OSS-HDFS sink table.
If data is written to OSS, you can view it on the OSS tab of the file list in the OSS console. If data is written to OSS-HDFS, you can view it on the HDFS tab of the file list in the OSS console.
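To spot-check the written files from the SQL editor, you can also define a temporary table on the sink path and query it. The following statements are a sketch that reuses the destbucket/test path from the example in this topic; adjust the schema and path to match your own job.

CREATE TEMPORARY TABLE result_check (
  `name` STRING,
  `path` STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://destbucket/test/',  -- the sink path written by the example job
  'format' = 'parquet'
);

SELECT * FROM result_check LIMIT 10;  -- preview a few rows of the written data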