Dataphin: Step 3: Introduce data

Last Updated: Jan 21, 2025

After planning your data warehouse and configuring your data source, you must introduce the source data, such as the product, customer, and orders tables, into the project. This topic describes how to integrate data from your data sources into the project workspace you created.

Background information

The integration process for the product, customer, and orders tables is identical, differing only in the names of the pipelines. This topic demonstrates the integration process using the product table as an example.

Important

After integrating the product table, follow the instructions in this topic to integrate the customer and orders tables into the project.
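
For orientation, a sketch of what the product source table in MySQL might look like is shown below. The schema is hypothetical and is provided only so that the mapping and target-table steps later in this topic are concrete; your actual columns may differ.

    -- Hypothetical MySQL source table. Column names and types are
    -- assumptions for illustration, not the tutorial's actual schema.
    CREATE TABLE product (
        product_id   BIGINT PRIMARY KEY COMMENT 'Product ID',
        product_name VARCHAR(128)       COMMENT 'Product name',
        price        DECIMAL(10, 2)     COMMENT 'Unit price',
        gmt_modified DATETIME           COMMENT 'Last modification time'
    ) COMMENT 'Source product table read by the integration pipeline';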

Step 1: Create pipeline development script

  1. On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.

  2. In the top menu bar, select the target Project. If the project is in Prod-Dev mode, also select the Environment.

  3. In the left-side navigation pane, choose Integration > Batch Pipeline. In the offline integration list on the right, click the create icon and select Batch Pipeline.

  4. In the Create Offline Pipeline dialog box, enter the required parameters.

    Pipeline Name: Enter Product Table Integration.

    Schedule Type: Choose Recurring Task Node.

    Description (optional): Provide a brief description of the offline pipeline, if desired.

    Select Directory (optional): The default directory is Batch Pipeline.

  5. Click OK to finalize the creation of the offline pipeline.

    For details on offline pipeline parameter configuration, see Create Integration Task via Single Pipeline.

Step 2: Develop offline pipeline script

  1. On the offline single pipeline development page, click Component Library.

  2. Under Input, select the MySQL Input component and drag it onto the pipeline canvas.

  3. Under Output, select the MaxCompute Output component and drag it onto the pipeline canvas.

  4. Connect the MySQL Input component to the MaxCompute Output component.

  5. Click the configuration icon on the input and output components to configure the MySQL Input component and the MaxCompute Output component.

    • MySQL Input component

      Step Name: Retain the default value.

      Datasource: Choose the data source you created in Step 2: dataphin_tutorial.

      Source Table Volume: Select Single Table.

      Table: Choose the source table named product.

      Shard Key (optional): A shard key is not required.

      Input Filter (optional): Filter conditions are not necessary.

      Output Fields: Use the default output fields.

    • MaxCompute Output component

      Step Name: Retain the default name.

      Datasource: Select the current project, Project > dataphin_tutorial (dataphin_tutorial).

      Table: Create the target table as follows (a hedged example of the generated statement appears after this step):

      1. Click One-click Generate Target Table.

      2. In the code input box, retain the default table creation statement.

      3. Click Create.

      Loading Policy: Choose Append Data.

      Mapping: Select Same Name Mapping.

  6. Click OK to finalize the configuration of the input and output components.

    For details on the parameter configuration of the input and output components, see Configure MySQL Input Component and Configure MaxCompute Output Component.
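
For reference, One-click Generate Target Table produces a MaxCompute CREATE TABLE statement whose columns mirror the output fields of the MySQL source, which is what allows Same Name Mapping to match columns automatically. The sketch below continues the hypothetical product schema from the background section; retain the statement that Dataphin actually generates rather than this one.

    -- Hypothetical auto-generated MaxCompute target table. Keep the
    -- statement Dataphin generates for your real product table.
    CREATE TABLE IF NOT EXISTS product (
        product_id   BIGINT         COMMENT 'Product ID',
        product_name STRING         COMMENT 'Product name',
        price        DECIMAL(10, 2) COMMENT 'Unit price',
        gmt_modified DATETIME       COMMENT 'Last modification time'
    ) COMMENT 'Target table written by the integration pipeline';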

Step 3: Configure scheduling parameters for the pipeline script

  1. Click the Scheduling Configuration button on the menu bar of the current offline pipeline development canvas to open the scheduling settings.

  2. In the Schedule Dependency section, set the Upstream Dependency and leave other parameters at their default settings.

    In the Upstream Dependency section, click Add Root Node to set the root node as the upstream dependency of the current task.

    For details on configuring offline integration task properties, refer to Configure Offline Pipeline Task Properties.

Step 4: Submit and publish the offline single pipeline script

  1. To submit the pipeline script, click the Submit icon in the menu bar of the current offline pipeline development canvas.

  2. Review the Submission Content and Pre-check information in the dialog box, and provide any necessary Submission Remarks.

  3. Click OK And Submit to proceed.

    When you submit a task, Dataphin conducts lineage analysis and performs submission checks. For more information, see Integration Task Submission Instructions.
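
Once the published task has run, a quick way to confirm that the data landed in MaxCompute is to query the target table, for example from an ad hoc query. The checks below are a minimal sketch and assume the target table is named product, as created above.

    -- Sanity checks after the first successful run (table name assumed).
    SELECT COUNT(*) FROM product;      -- row count should match the source
    SELECT * FROM product LIMIT 10;    -- spot-check a few appended rows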