All Products
Search
Document Center

:Data Preprocessing

Last Updated:Jan 25, 2024

This topic describes data processing operations such as data merging, cleansing, processing, aggregation, transposition, and association.

Prerequisites

Data input is configured. For more information, see Input Data.

Merge

Merge is used to merge two tables into one table. The merged data is expanded in rows.

The merge node is used to merge the fields with the same name in the data input 1 node and the data input 2 node.

  1. Drag Merge from the left-side node section to the canvas section. 探查5

  2. Connect the nodes to be merged. 数据探查1

  3. Configure the merge node.

    In this case, the fields in the merge node are the fields with the same name in data input 1 and data input 2. 数据探查4

  4. View data details.

    You can view the Data Exploration, Data Details, and Field List parameters. 加工1

Note
  • Data exploration, you can view the data in the table:

    1. Basic characteristics (type, format)

    2. Quality: normal value, null value | abnormal value

    3. The distribution, type, and format of values in a single field.

    4. The data relationship and value distribution law in the linkage mode.

For more information, see Data exploration.

  • Data Details: displays the merged columns and data of the two tables. All fields and data in the primary table (data input 1) will be displayed. Data corresponding to the same fields in the secondary table (data input 2) will be expanded on rows. Different fields and data will not be displayed.

  • Field List: displays the fields of the primary table whose data input is 1.

Cleaning and processing

Remove unnecessary columns and rows from the data table and add required columns and rows.

  1. Drag Cleaning and Processing from the left-side node section to the canvas section. 数据探查6

  2. Connect the nodes to be cleaned and processed. 数据探查8

  3. In the Cleaning and Processing Node Configuration section, configure the following parameters.

    • Add a calculated field

      • You can follow the steps shown in the following figure to go to the Add Calculated Field page. image

      • In the Add Field-Formula Function dialog box, set New Field Name and Field Expression, select Field Type, and then click OK. image

        For more examples of calculated fields, see Create a calculated field.

    • Add a window function

      • You can follow the steps shown in the following figure to go to the Add Window Function configuration page. image

      • In the Add Field-Window Function dialog box, set New Field Name and Field Expression, select Field Type, and then click OK to save the configuration.

        Note

        you only need to enter window_fuc(args) in the field expression. you do not need to enter the sql code after over. otherwise, an error is reported.

        imageIn the window configuration, configure the following parameters: image① Group field

        • By default, no group is selected.

          Note
          • No grouping indicates that all data is calculated as a group.

          • If the group field is set, the calculation will be performed within each group.

          • You can select multiple grouping fields.

        • You can also customize the grouping fields.

          ② Sorting

          By default, no sorting is selected. You can also select a field to customize sorting settings.

          Note
          • After sorting is set, the group is sorted based on the sorting field.

          • You can set a maximum of five sorting fields.

          • If the sorting function, cume_dist(), exists, the sorting field must be included.

          ③ Window range

          By default, no range is specified. You can select By Row or By Value. After the window range is selected, the data involved in the calculation will be calculated in the window range. If the range is not set, the data will be calculated in the entire group.

          • You can select the start element, current element, and pre-offset n for the Window Start parameter.

            • Partition start element: the first row /value data in the group.

            • Current element: Current row /value.

            • Pre-offset n: Offsets the current row /value forward by n rows /values.

          • You can select the current element, the end element of the partition, and the post offset n for the Window End parameter.

            • Current element: Current row /value.

            • End of partition element: the last row /value data of the group.

            • Post offset n: The current row is offset backward by n rows /value.

              Note
              • If the sorting function and cume_dist(), ntile(), lead(), and lag() exist, the window range does not support "select by row" and "select by value".

              • If DateTime, Time, and text type exist in the sort field, the "pre-offset n" and "post-offset n" selected by value are not supported.

              • If the window range is selected by value, there must be 1 sort field (there is and can only be 1).

  • Add Group Assignment

    • You can follow the steps shown in the following figure to go to the Add Group Assignment configuration page. image

    • In the Add Field-Group Assignment dialog box, set New Field Name, select Grouping Field, and then click Group Settings. Then, click OK. image

      After you save the configurations, a dimension field is added to the dataset. You can enter group names in this column based on the configurations. 分组1For more information about grouping dimensions, see Grouping dimensions.

  • Merge Field

    Note

    This operation is supported only when the field type is Text.

    合并

  • Filter 筛选

    • The filtering method supports enumeration filtering, conditional filtering, and field comparison. image

    • For more information about filter examples, see Sample display of compound queries.

  • Modify a field type

    In the Fields view, modify the field type. 修改字段类型

    Note

    When converting to a date field, the following three date types are supported:

    • Date: Date format only, for example, yyyy-MM-dd.

    • DateTime: Date + Time format; for example, yyyy-MM-dd HH:mm:ss.

    • Time: Only the time format, for example, HH:mm:ss.

    You can also customize the date format.

  • Value Replacement

    In the Fields view, modify the value of a field.

    Note

    This operation is supported only when the field type is Text.

    值替换

  • Split Field

    In the Fields view, split fields based on their values.

    Note

    This operation is supported only when the field type is Text.

    拆分

  • Rename Field

    Move the pointer over a field and click the 编辑icon (①) or double-click the field name (②) to rename the field. 重命名

  • Edit Field

    Note

    This operation is supported by Add Calculated Field, Add Window Function, Merge Field, and Split Field.

    1编辑

  • Copy Field

    You can copy a field whose field expression and field type are the same to create a field of the same type. This improves data preparation efficiency. 1复制

  • Delete Fields

    For redundant fields, the data is cleaned and processed. 1删除

  • View data details.

    You can view the following information: Data Exploration, Data Details, and Field List. 1清洗

Aggregates

Summarize or average data on the specified measures based on the selected dimensions.

  1. Drag Aggregate from the left-side node section to the canvas. jueh

  2. The connection requires an aggregation node. 数据探查9

  3. In the Aggregate Node Configuration section, follow the instructions in the following figure to configure grouping fields and summary fields. 数据探查10

  4. View data details.

    You can view the Data Exploration, Data Details, and Field List parameters. 1聚合

Transpose

Realize row-column transposition of data tables.

  1. From the node operation area on the left, drag and drop the node to the canvas area. image

  2. Select a transpose type and configure the transpose. You can select Row to Column or Column to Row. image

    1. Row to Column

      1. Drag a column to be transposed. The enumerated values of the column are automatically displayed. By default, all are selected. image

        Note

        You can add up to one column to be transposed.

      2. Drag the fill column. image

      3. The transpose effect.

        • Before transposition image

        • After transposition image

    2. Column to row

      1. Drag the column to be transposed. image

        Note

        You can add up to 10 columns to be transposed.

      2. The transpose effect.

        • Before transposition image

        • After transposition image

  3. View data details.

    You can view the Data Exploration, Data Details, and Field List parameters. image

Association

Associate two tables into a wide table and perform required data processing. The associated data is expanded on columns.

  1. Drag Associate from the left-side node section to the canvas section. 关联

  2. Connect the nodes to be associated. 2关联

  3. Follow the instructions in the following figure to configure the associated node. 3关联

    After you create an association, select the fields that you want to associate with the two tables. You can add multiple associated fields.

    • Four association types are provided: Inner Association, Left Association, Right Association, and Outer Association.

      • Inner join: When using inner join, the resulting table will contain values that match both tables.

      • Left associative: When you use left associative, the resulting table will contain all the values in the table on the left and the corresponding matches in the table on the right. If a value in the left table does not have a corresponding match in the right table, you will see a null value in the data view.

      • Right join: When you use right join, the resulting table will contain all the values in the right table and the corresponding matches in the left table. If a value in the right table does not have a corresponding match in the left table, you will see a null value in the data view.

      • External association: When you use full external association, the resulting table will contain all the values from both tables. If there is no match for a value in either table in the other table, you will see a null value in the data view.

    • If two related tables have the same field, the field will be renamed automatically.

  4. View data details.

    You can view data detection, data details, and field list. 4关联

After the data is processed, you can export the processed data. For more information, see Output data.