Alibaba Cloud PAI Studio Python Script Component Quick Start

Step by Step
1. Drag and drop the interface components and configure them
2. Configure the components
3. Run a test
4. View the logs


1. Drag and drop the interface components and configure them
•1.1 Create a workflow in the Designer console

•1.2 Drag the Read Data Table component and the Python Script component onto the canvas

2. Component configuration
•2.1 Create a data table and import data in the DataWorks console

script:
CREATE TABLE `lm_test_input_1` (
    `value` bigint,
    `output1` bigint
);

INSERT INTO TABLE lm_test_input_1 VALUES (1, 2);

INSERT INTO TABLE lm_test_input_1 VALUES (2, 4);

SELECT * FROM lm_test_input_1;
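Before wiring up the components, you can optionally confirm that the rows were imported by running a small PyODPS check from your own machine. This is only a minimal sketch; the credential, project, and endpoint placeholders below are assumptions to be replaced with your own values and are not part of the workflow itself.

# verify_import.py -- optional local check, not part of the workflow
from odps import ODPS

# Placeholder credentials (assumptions); replace with your own values.
o = ODPS(
    access_id="<your-access-key-id>",
    secret_access_key="<your-access-key-secret>",
    project="<your-project>",
    endpoint="<your-maxcompute-endpoint>",
)

# Print every row of the freshly created table.
with o.execute_sql("SELECT * FROM lm_test_input_1;").open_reader() as reader:
    for record in reader:
        print(record)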
•2.2 Configure the Read Data Table component

•2.3 Configure the Python script component

main.py

from odps import ODPS
from pai_running.context import Context

context = Context()

# Get the input data from the first input port of the component
input_port = context.input_artifacts.flatten()[0]
print("---input_port---", input_port)
print("---Log output test:---")

o = ODPS(
    access_id=context.access_key_id,
    secret_access_key=context.access_key_secret,
    endpoint=input_port.endpoint,
    project=input_port.project,
)

# Get the name of the table passed in from the upstream component
input_table_name = input_port.table
print("---input_table_name---", input_table_name)

# The table that this component will output
output_table_name = "demo_output_table"

o.execute_sql(
    f"drop table if exists {output_table_name};",
)

# Copy the value column of the input table into a new table
o.execute_sql(
    f"create table {output_table_name} as select value from {input_table_name};"
)

# Tell the workflow framework that the current component outputs an ODPS table
output_port = context.output_artifacts.flatten()[0]
output_port.write_table(
    table=output_table_name,
    project=o.project,
    endpoint=o.endpoint,
)
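After the workflow has run, a short PyODPS sketch (run outside the component, with your own placeholder credentials filled in) can confirm that demo_output_table was written as expected:

# check_output.py -- optional check after the workflow has finished
from odps import ODPS

# Placeholder credentials (assumptions); replace with your own values.
o = ODPS(
    access_id="<your-access-key-id>",
    secret_access_key="<your-access-key-secret>",
    project="<your-project>",
    endpoint="<your-maxcompute-endpoint>",
)

# demo_output_table should contain the value column copied from lm_test_input_1.
with o.get_table("demo_output_table").open_reader() as reader:
    for record in reader:
        print(record)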
Execution configuration script:
{
    "_comments": [
        "The Python component runs the user code in a DLC cluster of the public resource group (https://help.aliyun.com/document_detail/202277.html).",
        "It supports loading/saving data to the local file system, so user code can consume upstream inputs and produce downstream outputs by reading and writing local files.",
        "The current task is configured through this JSON file, which consists of two functional parts:",
        "1. Data load/save configuration:",
        "1.1. inputDataTunnel: one entry per input port of the component; it loads the upstream node's input data (MaxCompute table, OSS) into a local directory;",
        "1.2. outputDataTunnel: one entry per output port of the component, specifying which local files to upload and save to OSS;",
        "1.3. uploadConfig: OSS configuration for uploading data, including the target OSS bucket name, endpoint, and the root path under which data is uploaded to OSS;",
        "2. Job configuration (jobConfig), i.e. the runtime configuration of the job running on serverless DLC.",
        "Note: The following configuration items are a sample; modify them according to the actual component running scenario.",
        "Note: The log output of the user code can be viewed in the DLC console by clicking the DLC task URL printed by the component."
    ],
    "inputDataTunnel": [
    ],
    "outputDataTunnel": [
    ],
    "uploadConfig": {
        "endpoint": "oss-.aliyuncs.com",
        "bucket": "",
        "path": "python_example/",
        "_comments": [
            "Data upload configuration; currently only uploading to OSS is supported.",
            "If an individual outputDataTunnel does not define its own upload configuration (no .uploadConfig field), the global configuration here is used.",
            "Note: For each file/directory specified by an outputDataTunnel, the final upload path is uploadConfig.path/{run_id}/{node_id}/{output_tunnel_name}/"
        ]
    },
    "jobConfig": {
        "name": "example1",
        "jobType": "generalJob",
        "taskSpec": {
            "instanceType": "ecs.c6.large",
            "imageUri": "registry.cn-hangzhou.aliyuncs.com/paiflow-public/python3:v1.0.0"
        },
        "_comments": [
            "Task configuration items for DLC, including:",
            "name: prefix of the task name shown in DLC;",
            "jobType: job type; currently the default is generalJob (a single-node job) and does not need to be modified; multi-node distributed jobs will be supported in the future;",
            "taskSpec: worker node configuration, where .instanceType is the ECS instance type used by the worker and .imageUri is the image used by the worker.",
            "Workers support official images (https://help.aliyun.com/document_detail/202834.htm) as well as custom images. If you use a custom image, make sure it is publicly accessible."
        ]
    }
}
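Because the execution configuration is hand-edited JSON (the _comments arrays are ordinary string values, so the file must remain valid JSON), it can be worth linting it locally before pasting it into the component. A minimal sketch, assuming the configuration has been saved to a hypothetical local file named config.json:

# lint_config.py -- quick local sanity check of the execution configuration
import json

# "config.json" is a hypothetical file name for the configuration shown above.
with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)  # raises json.JSONDecodeError if the JSON is malformed

# Check that the two functional parts described in the comments are present.
for key in ("uploadConfig", "jobConfig"):
    assert key in config, f"missing top-level key: {key}"

print("taskSpec:", config["jobConfig"]["taskSpec"])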


3. Run the test

4. Log viewing

Because the Python script component relies on PAI-DLC as its underlying compute engine, the script actually runs in a Docker container created in the DLC cluster. To view the detailed logs of the Python script, go to the DLC console.



