After planning your data warehouse and setting up data source information, you must bring your source data, such as the product, customer, and orders tables, into the project. This topic describes how to integrate data from your data sources into the established project workspace.
Background information
The integration process for the product, customer, and orders tables is identical, differing only in the names of the pipelines. This topic demonstrates the integration process using the product table as an example.
After integrating the product table, follow the instructions in this topic to integrate the customer and orders tables into the project.
Step 1: Create pipeline development script
On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.
In the top menu bar, select the Project. If the project is in Dev-Prod mode, also select the Environment.
In the left-side navigation pane, choose Integration > Batch Pipeline. In the offline integration list on the right, click the creation icon and select Batch Pipeline.
In the Create Offline Pipeline dialog box, enter the required parameters.
Pipeline Name: Enter Product Table Integration.
Schedule Type: Select Recurring Task Node.
Description (optional): Provide a brief description of the offline pipeline, if desired.
Select Directory (optional): Keep the default directory, Batch Pipeline.
Click OK to finalize the creation of the offline pipeline.
For details on offline pipeline parameter configuration, see Create Integration Task via Single Pipeline.
Step 2: Develop offline pipeline script
On the offline single pipeline development page, click Component Library.
In the Input section, select the MySQL Input Component and drag it onto the pipeline canvas.
In the Output section, select the MaxCompute Output Component and drag it onto the pipeline canvas.
Connect the MySQL Input Component to the MaxCompute Output Component.
Click the configuration icon on the input and output components to configure the MySQL Input Component and the MaxCompute Output Component.
MySQL Input Component
Step Name: Retain the default value.
Datasource: Select dataphin_tutorial, the data source you created when setting up data source information.
Source Table Volume: Select Single Table.
Table: Select the source table product.
Shard Key (optional): A shard key is not required.
Input Filter (optional): No filter conditions are required.
Output Fields: Use the default output fields.
MaxCompute Output Component
Step Name: Retain the default value.
Datasource: Select the current project, Project > dataphin_tutorial(dataphin_tutorial).
Table: Create the target table: click One-click Generate Target Table, retain the default table creation statement in the code input box, and then click Create (see the illustrative sketch after this parameter list).
Loading Policy: Select Append Data.
Mapping: Select Same Name Mapping.
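The statement that One-click Generate Target Table produces depends on your source table's schema, so it is not reproduced here. The PyODPS sketch below only illustrates the kind of MaxCompute table that gets created; the column names, types, credentials, and endpoint are hypothetical placeholders rather than values from this tutorial.

```python
# Illustrative only: roughly what "One-click Generate Target Table" does, expressed
# with PyODPS (the MaxCompute Python SDK). Column names and types are hypothetical;
# in the dialog, keep the statement that Dataphin generates from your source schema.
from odps import ODPS

o = ODPS("<access-id>", "<access-key>",
         project="dataphin_tutorial", endpoint="<maxcompute-endpoint>")

# Create a target table whose columns mirror the MySQL `product` source table.
o.create_table(
    "product",
    "product_id bigint, product_name string, price double, gmt_create datetime",
    if_not_exists=True,
)
```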
Click OK to finalize the configuration of the input and output components.
For details on the parameter configuration of input and output components, see Configure MySQL Input Component and Configure MaxCompute Output Component.
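To make the data flow concrete, the following is a rough, hand-written Python equivalent of what the configured pipeline does: read every row of the MySQL product table and append the rows to the same-named MaxCompute table. This is only a sketch under assumed connection details (the hosts, credentials, and endpoint are placeholders); Dataphin generates and runs the actual integration job for you.

```python
# Rough, hand-rolled equivalent of the configured pipeline (for intuition only;
# Dataphin executes the real job). All connection details are placeholders.
import pymysql            # MySQL client
from odps import ODPS     # PyODPS, the MaxCompute SDK

# MySQL Input step: single table, no shard key, no input filter, default output fields.
src = pymysql.connect(host="<mysql-host>", user="<user>",
                      password="<password>", database="<source-db>")
with src.cursor() as cur:
    cur.execute("SELECT * FROM product")
    rows = cur.fetchall()
src.close()

# MaxCompute Output step: Append Data loading policy. In Dataphin, Same Name Mapping
# matches source and target columns by name; here we simply append the fetched rows
# to the target table created earlier.
o = ODPS("<access-id>", "<access-key>",
         project="dataphin_tutorial", endpoint="<maxcompute-endpoint>")
o.write_table("product", [list(r) for r in rows])  # write_table appends records
```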
Step 3: Configure scheduling parameters for the pipeline script
Click the Schedule Configuration button on the menu bar of the current offline pipeline development canvas to open the scheduling configuration panel.
In the Schedule Dependency section, configure the Upstream Dependency and leave the other parameters at their default settings.
In the Upstream Dependency section, click Add Root Node to set the project root node as the upstream dependency of the current task; the root node is used when the task has no other upstream tasks.
For details on configuring offline integration task properties, refer to Configure Offline Pipeline Task Properties.
Step 4: Submit and publish the offline single pipeline script
To submit the pipeline script, click the Submit icon in the menu bar of the current offline pipeline development canvas.
Review the Submission Content and Pre-check information in the dialog box, and provide any necessary Submission Remarks.
Click OK And Submit to proceed.
When you submit a task, Dataphin conducts lineage analysis and performs submission checks. For more information, see Integration Task Submission Instructions.