×
Community Blog ProjectName Automatically Completed under ODPS SQL Development and Production Environment of DataWorks

ProjectName Automatically Completed under ODPS SQL Development and Production Environment of DataWorks

This article discusses a specific scenario where you can use the DataWorks process parameters feature without frequently modifying the project name.

By Jianwei Liu

1. Scenario Description

The development environment is isolated from the production environment in the standard mode of DataWorks. Thus, the naming formats of database tables differ in the two environments. If you need to access the database tables of the production environment from the development environment or access tables of project space B from project space A, you are required to distinguish the names of database tables to prevent inappropriate operations. The following table describes the naming formats of database tables for the two environments.

1

For SQL tasks in the development environment, [Development environment.table name] projectA_dev.user_info is required to use a table. You need to manually change the development environment name to the production environment name to publish the projectA.user_info to the production environment.

2. Plan Implementation

2.1 Solution 1: Use Assignment Nodes

2.1.1 Solution Idea

Use the Python SDK get_project method to get the project name. Then, use assignment nodes to provide the task results of the upstream nodes to the downstream nodes for use.

2.1.2 Problems of Solution 1

Assignment nodes only support three assignment languages: ODPS SQL, SHELL, and Python2. You need to download the ODPS package to use Python2. In addition, it is required to specify AK and ProjectName to connect to ODPS in the code. Although this solution can obtain upstream results and assign the values to downstream nodes, it cannot see the space name by flexible switching to satisfy user scenarios.

Then, if we consider using a general node PyODPS of DataWorks to manually add upstream output parameters, is it feasible to use assignment nodes or parameter nodes for downstream?

The solution also has problems in the time test.

The output part of the Context parameter for an upstream node must be specified in the input part of the Context parameter for a downstream node, so you cannot directly pass the query results of the upstream nodes to the downstream nodes. If you want to pass the query results of the upstream nodes to the downstream nodes, you can use the assignment nodes.

2.2 Solution 2: Use Workflow Parameters

If you need to assign a value to a variable or replace the value of a parameter for all nodes in a workflow, you can adopt workflow parameters. If the value assigned to a workflow parameter is inconsistent with the value assigned to a single node parameter, the value of the workflow parameter overwrites the value of the node parameter.

2.2.1 Configure Node Parameters

This article takes the cross-project space as an example. Access the table data of the project space bigdatazjpoc in the project space bigdtata_ljw_test.

  • Set parameters in scheduling configuration parameters

projectname=bigdatazjpoc_dev

  • Reference parameter configuration in code
-- Reference in code: The current project name is bigdtata_ljw_test
select * from ${projectname}.user_info_delta;

The following figure shows the configuration:

2

2.2.2 Configure a Workflow Parameter

Parameter name: projectname

Parameter value or expression: bigdatazjpoc

⚠️ Note: The workflow parameter name must be the same as the node name.

3

3. Verification of Solutions

Parameters are set in nodes. In the development environment, single-node testing requires smoke testing or advanced running. During the advanced running, the parameter is bigdatazjpoc_dev and the cross-project development environment. It meets the needs of the development environment to access the development environment.

4

In the development environment, the entire workflow test is manually entering the parameter bigdatazjpoc_dev. The test running also accesses table data in the development environment.

5

Submitting to the production environment for running retroactive data, you can see the project in the production environment where the execution parameter is configured for the workflow parameter. (Ignore the run failure in the screenshot caused by permission issues.)

If the value assigned to a workflow parameter is inconsistent with the value assigned to a node parameter, the value of the workflow parameter overwrites the value of the node parameter. The workflow parameter is configured as the project space in the production environment corresponding to the cross-project space. This scenario is satisfied.

6

4. Summary

If you need to access table data across projects or with isolated connections between the development and production environments, you can use the DataWorks process parameters feature without frequently modifying the project name.

0 0 0
Share on

Alibaba Cloud MaxCompute

116 posts | 16 followers

You may also like

Comments