After you design a workflow, you need to configure each data analytics node in the workflow.

Prerequisites

A user-defined function (UDF) is registered. For more information, see Register a UDF.

Register a UDF

  1. Add resources.
    1. Download the GetAddr.jar package and ip.dat file for querying locations based on IP addresses.
      For more information about how to implement a UDF for querying locations based on IP addresses, see Implement IP address-based location query in MaxCompute.
    2. Go to the DataStudio page. On the Data Analytics tab, click the Workshop workflow and right-click MaxCompute. Choose Create > Resource > File to add a file resource, and choose Create > Resource > JAR to add a JAR resource.
      • When you add a file resource, upload the ip.dat file.
        1. In the Create Resource dialog box, set Resource Name, select Large File (over 500 KB) and Upload to MaxCompute, and then click Upload. Select the ip.dat file and click OK.
        2. On the tab that appears, click the Submit icon in the upper-left corner to commit the resource.
      • When you add a JAR resource, upload the GetAddr.jar package.
        1. In the Create Resource dialog box, set Resource Name, select Upload to MaxCompute, and then click Upload. Select the GetAddr.jar package and click OK.
        2. On the tab that appears, click the Submit icon in the upper-left corner to commit the resource.
          Note If a message indicating inconsistent data lineage appears when you commit the resource, you can ignore it.
  2. Register a UDF.
    1. Right-click MaxCompute under the Workshop workflow and choose Create > Function. In the Create Function dialog box that appears, set Function Name to getregion and click Commit.
    2. On the Register Function tab that appears, set Class Name to odps.test.GetAddr, Resources to getaddr.jar,ip.dat, and Expression Syntax to getregion(ip string). Click the Save icon in the upper-left corner to save the settings, and then click the Submit icon to commit the UDF for registration. Equivalent MaxCompute client commands are provided after this procedure for reference.
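  If you prefer to work from the MaxCompute client (odpscmd) rather than the DataWorks UI, the commands below are a minimal sketch of the same steps. They assume that GetAddr.jar and ip.dat are in your current local directory and that the resource names in the USING clause match the resources you uploaded; the IP address in the last query is only a placeholder.
    -- Upload the two resources to the current MaxCompute project.
    add jar GetAddr.jar;
    add file ip.dat;
    -- Register the UDF against the uploaded resources.
    create function getregion as 'odps.test.GetAddr' using 'getaddr.jar,ip.dat';
    -- Optional sanity check with a placeholder IP address (assumes your MaxCompute version supports SELECT without FROM).
    select getregion('1.2.3.4');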

Configure nodes

  1. Configure the zero-load node named start.
    1. On the Data Analytics tab, double-click the Workshop workflow. In the directed acyclic graph (DAG) of the workflow, double-click the start node. The node configuration tab appears.
    2. Click the Properties tab. In the Dependencies section, click Use Root Node.
    3. In the Schedule section, set Rerun to Allow Regardless of Running Status.
    4. Click the Submit icon to commit the node.
  2. Configure the ODPS SQL node named ods_user_trace_log.
    1. In the DAG of the workflow, double-click the ods_user_trace_log node. On the node configuration tab that appears, enter the following SQL statements in the SQL editor:
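      -- ts is a unix timestamp; keep only the records whose date matches the business date and write them into that day's partition.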
      insert overwrite table ods_user_trace_log partition (dt=${bdp.system.bizdate})
      select
          md5,
          uid,
          ts,
          ip,
          status,
          bytes,
          device,
          system,
          customize_event,
          use_time,
          customize_event_content
      from ots_user_trace_log
      where to_char(FROM_UNIXTIME(ts), 'yyyymmdd') = ${bdp.system.bizdate};
      Note For more information about the ${bdp.system.bizdate} parameter, see Scheduling parameters.
    2. Click the Properties tab. In the Dependencies section, set Auto Parse to No.
    3. Remove the parent nodes that do not match the workflow design.
    4. Search for each correct parent node based on the workflow design and click the Add icon to add the parent node. In this example, the parent node is the start node.
    5. In the Schedule section, set Rerun to Allow Regardless of Running Status.
    6. Click the Submit icon.
  3. Configure the ODPS SQL node named dw_user_trace_log.
    You can configure the dw_user_trace_log node in the same way as configuring the ods_user_trace_log node. Enter the following SQL statements in the SQL editor:
    INSERT OVERWRITE TABLE dw_user_trace_log PARTITION (dt=${bdp.system.bizdate})
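    -- getregion is the UDF registered earlier; it maps each IP address to a location string.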
    SELECT uid, getregion(ip) AS region
        , CASE
            WHEN TOLOWER(device) RLIKE 'xiaomi' THEN 'xiaomi'
            WHEN TOLOWER(device) RLIKE 'meizu' THEN 'meizu'
            WHEN TOLOWER(device) RLIKE 'huawei' THEN 'huawei'
            WHEN TOLOWER(device) RLIKE 'iphone' THEN 'iphone'
            WHEN TOLOWER(device) RLIKE 'vivo' THEN 'vivo'
            WHEN TOLOWER(device) RLIKE 'honor' THEN 'honor'
            WHEN TOLOWER(device) RLIKE 'samsung' THEN 'samsung'
            WHEN TOLOWER(device) RLIKE 'leeco' THEN 'leeco'
            WHEN TOLOWER(device) RLIKE 'ipad' THEN 'ipad'
            ELSE 'unknown'
        END AS device_brand, device
        , CASE
            WHEN TOLOWER(system) RLIKE 'android' THEN 'android'
            WHEN TOLOWER(system) RLIKE 'ios' THEN 'ios'
            ELSE 'unknown'
        END AS system_type, customize_event, use_time, customize_event_content
    FROM ods_user_trace_log
    WHERE dt = ${bdp.system.bizdate};
  4. Configure the ODPS SQL node named rpt_user_trace_log.
    You can configure the rpt_user_trace_log node in the same way as configuring the ods_user_trace_log node. Enter the following SQL statements in the SQL editor:
    INSERT OVERWRITE TABLE rpt_user_trace_log PARTITION (dt=${bdp.system.bizdate})
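    -- The region string returned by getregion is assumed to look like '[country, province, city]': strip the leading '[' and split on commas.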
    SELECT split_part(split_part(region, ',', 1),'[',2) AS country
        , trim(split_part(region, ',', 2)) AS province
        , trim(split_part(region, ',', 3)) AS city
        , MAX(device_brand), MAX(device)
        , MAX(system_type), MAX(customize_event)
        , FLOOR(AVG(use_time / 60))
        , MAX(customize_event_content), COUNT(uid) AS pv
        , COUNT(DISTINCT uid) AS uv
    FROM dw_user_trace_log
    WHERE dt = ${bdp.system.bizdate}
    GROUP BY uid, 
        region;
  5. Verify the configuration.
    On the Data Analytics tab, double-click the Workshop workflow. The workflow editing canvas appears on the right. Click the Run icon in the upper-left corner to run the workflow. If each node is marked with a green check mark (√), all nodes have run successfully. After the run succeeds, you can spot-check the generated data with the ad-hoc queries shown after this procedure.

    If a node fails to run, right-click the node, select View Log, and fix the issue based on the logs.
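    If you want to verify the data that the nodes produced, ad-hoc queries such as the following are a minimal sketch. The partition value 20230601 is a hypothetical business date; replace it with the business date of the run you want to inspect.
      select count(*) from ods_user_trace_log where dt = '20230601';
      select count(*) from dw_user_trace_log where dt = '20230601';
      select * from rpt_user_trace_log where dt = '20230601' limit 10;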