
Rules configuration for ODPS data source

Last Updated: Apr 26, 2018

Rules configuration is the core function module of Data Quality. Data Quality supports two types of data sources: ODPS and DataHub. This article describes how to configure rules for an ODPS data source.

Select a data source

Click Rules Configuration in the left-side navigation pane to open the Rules configuration page. Select ODPS as the data source to display all the tables in the project you have joined.

Search for a table

Enter the name of the target table to find it. Fuzzy search is supported.

Configure monitoring rules

Click Configure monitoring rules on the right side.

NOTE:

You can also go to My subscriptions, select the ODPS data source, and click Partition expressions on the right to open the Rules configuration page.

Configure partition expressions

A partition expression is a filter condition that determines which partition of the table the verification rules are applied to.

On the Rules configuration page, click the plus sign (+) in the upper-left corner to add a partition expression.

  • Add new partition expression: Click Add expression to open the partition configuration window, and edit the partition expression as needed. For a non-partitioned table, select NOTAPARTITIONTABLE directly from the recommended partition expressions list.
  • Format of the first-level partition expressions: Partition name = partition value. Format of the multi-level partition expressions: First-level partition name = partition value / second-level partition name = partition value / ... / N-level partition name = partition value. In both formats, a partition value can be a fixed value or a built-in parameter expression.
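A minimal Python sketch of the multi-level format above, for illustration only: it splits an expression on "/" into levels and each level on the first "=" into a partition name and a partition value. The function name parse_partition_expression and the second-level partition region in the example are hypothetical; how Data Quality parses expressions internally is not described here.

```python
from __future__ import annotations


def parse_partition_expression(expr: str) -> list[tuple[str, str]]:
    """Split 'name=value/name=value/...' into ordered (name, value) pairs.

    Hypothetical helper: the separators '=' and '/' follow the documented
    format; surrounding spaces are ignored.
    """
    levels = []
    for part in expr.split("/"):
        name, sep, value = part.partition("=")  # split on the first '=' only
        if not sep or not name.strip():
            raise ValueError(f"malformed partition level: {part!r}")
        levels.append((name.strip(), value.strip()))
    return levels
```

For example, parse_partition_expression("dt=${bizdate}/region=hangzhou") returns [("dt", "${bizdate}"), ("region", "hangzhou")]; the built-in parameter is kept as text and resolved separately.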

Built-in parameter expressions

  • ${bizdate}

    The format is yyyymmdd. It represents the business date, which is the scheduled date (year-month-day) of the daily scheduled instance minus one day.

  • $[yyyymmddhh24miss]

    The format is yyyymmddhh24miss. It represents the scheduled time (year, month, day, hour, minute, and second) of the periodically scheduled instance.

Item    Description
yyyy    Four-digit year
mm      Two-digit month
dd      Two-digit day
hh24    Two-digit hour in the 24-hour format
mi      Two-digit minute
ss      Two-digit second
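To make the two built-in parameters concrete, the following Python sketch renders a scheduled time the way the table above describes, mapping yyyy, mm, dd, hh24, mi, and ss to the equivalent strftime directives. It assumes, as stated above, that ${bizdate} is the scheduled date minus one day. The helper names format_pattern, bizdate, and scheduled_time are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Format tokens from the table above mapped to Python strftime directives.
_TOKENS = {"yyyy": "%Y", "mm": "%m", "dd": "%d", "hh24": "%H", "mi": "%M", "ss": "%S"}
# Single-pass alternation so that, e.g., "mi" is never confused with "mm".
_TOKEN_RE = re.compile("|".join(_TOKENS))


def format_pattern(pattern: str, moment: datetime) -> str:
    """Render a pattern such as 'yyyymmddhh24miss' for the given time."""
    return moment.strftime(_TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], pattern))


def bizdate(scheduled: datetime) -> str:
    """${bizdate}: the scheduled date minus one day, as yyyymmdd."""
    return format_pattern("yyyymmdd", scheduled - timedelta(days=1))


def scheduled_time(scheduled: datetime) -> str:
    """$[yyyymmddhh24miss]: the scheduled time itself."""
    return format_pattern("yyyymmddhh24miss", scheduled)


if __name__ == "__main__":
    run = datetime(2018, 4, 26, 0, 5, 30)
    print(bizdate(run))         # 20180425
    print(scheduled_time(run))  # 20180426000530
```

Because the business date is computed with date arithmetic rather than by editing the day digits, month and year boundaries are handled correctly: an instance scheduled on 20180301 has the business date 20180228.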

Adding or subtracting periods (+/-)

  • N days before: ${yyyymmdd-N}
  • The 1st day of each month: ${yyyymm01-1}
  • The 1st day of N months before: ${yyyymm01-Nm}
  • The last day of each month: ${yyyymmld-1}
  • The last day of N months before: ${yyyymmld-Nm}
  • Added partition expressions: Lists the partition expressions that have already been added to the table.

  • Recommended partition expressions: Lists the partition expressions recommended by Data Quality. Find an expression that meets your requirements in this list and select it to add it. Once added, the expression appears under Added partition expressions on the left.

  • Delete partition expression: You can delete partition expressions that are no longer used. If rules have been configured for a partition expression, deleting the expression also deletes all of its rules.

NOTE:

The following examples use dt as the partition name. For dynamic partition tables, we recommend that you do not use regular-expression partition expressions.

Common partition expressions are as follows:

Partition expression    Description
ALL_PARTITIONS          Can be selected for non-partitioned tables.
dt=<[a-zA-Z0-9_-]*>     Generally used for hourly tasks. If the table is partitioned by hour, the regular expression is automatically replaced with the partition expression.
dt=${yyyymmdd-N}        N days before.
dt=${yyyymm01-1}        The 1st day of each month.
dt=${yyyymm01-Nm}       The 1st day of N months before.
dt=${yyyymmld-1}        The last day of each month.
dt=${yyyymmld-Nm}       The last day of N months before.
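The period offsets in the table above reduce to ordinary date arithmetic. The Python sketch below shows one way to compute them, assuming that the offsets are applied to a base date such as the business date, that -N counts days, and that -Nm counts calendar months (with N = 0 meaning the base month itself). How Data Quality interprets the -1 suffix in ${yyyymm01-1} and ${yyyymmld-1} is not spelled out here, so the sketch models only the explicit N-day and N-month forms. All function names are hypothetical.

```python
from __future__ import annotations

import calendar
from datetime import date, timedelta


def days_before(base: date, n: int) -> str:
    """${yyyymmdd-N}: the date n days before base, as yyyymmdd."""
    return (base - timedelta(days=n)).strftime("%Y%m%d")


def _shift_months(base: date, n: int) -> tuple[int, int]:
    """Return (year, month) of the calendar month n months before base."""
    year, month_index = divmod(base.year * 12 + (base.month - 1) - n, 12)
    return year, month_index + 1


def first_day_months_before(base: date, n: int) -> str:
    """${yyyymm01-Nm}: the 1st day of the month n months before base."""
    year, month = _shift_months(base, n)
    return date(year, month, 1).strftime("%Y%m%d")


def last_day_months_before(base: date, n: int) -> str:
    """${yyyymmld-Nm}: the last day of the month n months before base."""
    year, month = _shift_months(base, n)
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, last_day).strftime("%Y%m%d")
```

With a base date of 2018-04-25, days_before(base, 3) returns "20180422", first_day_months_before(base, 2) returns "20180201", and last_day_months_before(base, 2) returns "20180228". Working in whole months avoids the invalid dates (such as February 31) that subtracting a fixed number of days per month would produce.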

Click the expression input box to display the recommended partition expressions in a drop-down list.

  • If a suitable expression is in the list, click it to copy it to the output window automatically.

  • If none of the partition expressions meets your requirements, enter your own expression.

When you are done, click Calculate. Data Quality evaluates the partition expression against the current time (used as the scheduled time) so that you can verify that the expression is correct. Then click Confirm to complete the operation.
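The Calculate step substitutes concrete values into a partition expression. As a rough illustration only, the Python sketch below resolves ${bizdate}, $[yyyymmddhh24miss], and ${yyyymmdd-N} in an expression string for a given scheduled time and leaves any other text, such as fixed values or a regular expression, untouched. It assumes, without confirmation from this page, that ${yyyymmdd-N} counts N days back from the business date. The name resolve_partition_expression is hypothetical and not a Data Quality API.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

# Only these placeholders are recognized; everything else is kept verbatim.
_PLACEHOLDER = re.compile(r"\$\{bizdate\}|\$\{yyyymmdd-(\d+)\}|\$\[yyyymmddhh24miss\]")


def resolve_partition_expression(expr: str, scheduled: datetime) -> str:
    """Replace built-in parameters in expr with values for the scheduled time."""
    business_date = scheduled - timedelta(days=1)  # ${bizdate}: scheduled date - 1 day

    def substitute(match: re.Match) -> str:
        text = match.group(0)
        if text == "${bizdate}":
            return business_date.strftime("%Y%m%d")
        if text == "$[yyyymmddhh24miss]":
            return scheduled.strftime("%Y%m%d%H%M%S")
        # ${yyyymmdd-N}: assumed to be N days before the business date.
        return (business_date - timedelta(days=int(match.group(1)))).strftime("%Y%m%d")

    return _PLACEHOLDER.sub(substitute, expr)
```

For a scheduled time of 2018-04-26 00:05, resolve_partition_expression("dt=${bizdate}", ...) yields "dt=20180425". Changing the scheduled time changes the resolved partition, which matches how the test run partition follows the business date.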

Associated scheduling

To monitor offline data in production pipelines, you must use the associated scheduling feature of Data Quality. Currently, associated scheduling can be configured from two entry points: the Operation and maintenance center and the Rules configuration page of Data Quality.

You can associate an existing task node with the rules. After the association, the quality checks run automatically with the scheduled task. Association is optional, so you can skip this step. Associated scheduling supports fuzzy search, which lets you find specific task nodes in the Operation and maintenance center.

NOTE:

You can also configure associated scheduling for quality monitoring in the Operation and maintenance center, as follows:

  1. Click More in the corresponding task tab, and select Quality monitoring configuration.

  2. Enter the table name, and click Configuration in the tab of the corresponding partition expression to configure it. You can also add a partition expression yourself.

Create rules

Creating rules that fit the actual needs of each table is the core function of Data Quality.

Rules can be created in two ways, Template rules and Customized rules; choose based on your actual needs. Each kind of rule can be added either through Add monitoring rules or through Quick add.

After creating the rules, click Save batches to save all of them to the partition expressions that have already been created.

Template rules

  • Add monitoring rules

    • Field type: Either table-level rules or field-level rules. Field-level rules monitor specific fields in the table. In this example, table-level rules are selected, so the remaining settings on the page apply to the table-level rule.
    • Intensity: Sets the strength of the rule. For example, if Strong is selected and the red threshold is triggered while the task is running, the task is set to failed.
    • Template type: Select one of the system's built-in table-level monitoring rule templates.

    • Tendency: Depending on the selected template type, the tendency can be absolute value, increasing, or decreasing.

    • Comparison of fluctuation values: Set the orange and red thresholds for the fluctuation value. Drag the slider or enter the threshold values directly.

  • Quick add

    • Field name: Applies only to field-level rules, which monitor specific fields in the table. Select the fields for which to set field-level rules.

    • Rule type: Select Field null value or Field repetition value.
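The interplay between tendency, the orange and red thresholds, and Strong intensity can be sketched as follows. This Python example is not Data Quality's implementation: it assumes the fluctuation is the percentage change of a current value (for example, today's row count) against a baseline, and that a threshold is triggered when the measured fluctuation reaches it. TemplateRule, evaluate, and task_fails are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TemplateRule:
    tendency: str            # "absolute", "increasing", or "decreasing"
    orange_threshold: float  # fluctuation (%) that raises a warning
    red_threshold: float     # fluctuation (%) that raises an error
    strong: bool = False     # Strong intensity: a red result fails the task


def fluctuation_percent(current: float, baseline: float) -> float:
    """Percentage change of current relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) * 100 / baseline


def evaluate(rule: TemplateRule, current: float, baseline: float) -> str:
    """Return 'red', 'orange', or 'normal' for one check."""
    change = fluctuation_percent(current, baseline)
    if rule.tendency == "absolute":
        measured = abs(change)   # growth and drop both count
    elif rule.tendency == "increasing":
        measured = change        # only growth counts
    elif rule.tendency == "decreasing":
        measured = -change       # only a drop counts
    else:
        raise ValueError(f"unknown tendency: {rule.tendency!r}")
    if measured >= rule.red_threshold:
        return "red"
    if measured >= rule.orange_threshold:
        return "orange"
    return "normal"


def task_fails(rule: TemplateRule, status: str) -> bool:
    """With Strong intensity, a red result sets the running task to failed."""
    return rule.strong and status == "red"
```

For example, a rule with an absolute tendency, an orange threshold of 10%, a red threshold of 20%, and Strong intensity evaluates a drop from 100 rows to 75 rows as "red", so the task fails. The same drop is "normal" under an increasing tendency, because only growth is measured.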

If template rules do not meet your quality monitoring requirements for a partition expression, use customized rules to create your own monitoring rules.

Customized rules

On the Customized rules page, you can create table-level rules or custom SQL rules.

  • Add monitoring rules

    • Field type: Table-level rules, field-level rules, or custom SQL. In this example, table-level rules are selected, so the remaining settings on the page apply to the table-level customized rule.

    • Intensity: If Strong is selected and the red threshold is triggered while the task is running, the task is set to failed.

    • Statistical function: Either count or count/table_count.

    • Filter conditions: A filter condition written in SQL.

    • Verification method: Select a built-in verification method. The default verification method is fixed value.

    • Tendency: Absolute value, increasing, or decreasing. If the statistical function is count/table_count, the tendency defaults to a fixed value.

    • Comparison method: Choose greater than, greater than or equal to, equal to, not equal to, less than, or less than or equal to, according to your needs.

    • Expected value: The expected target value.

    • Description: A detailed description of the customized rule.

  • Quick add

    • Rule type: Either Number of rows in the table is greater than null or Multiple fields repetition value.

    • Field name: Displayed when the rule type is Multiple fields repetition value. Add the fields to be checked for repeated values; you can add multiple field names.
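A customized rule combines a statistical function, a filter condition, a comparison method, and an expected value. The Python sketch below mimics that pipeline on in-memory rows, using a Python predicate as a stand-in for the SQL filter condition. It also shows one possible way to count Multiple fields repetition values: the extra rows that share the same combination of field values. The row data, field names, and helper names are hypothetical, and the repetition count is an assumed definition rather than Data Quality's documented one.

```python
from __future__ import annotations

import operator
from collections import Counter
from typing import Callable, Iterable, Sequence

# Comparison methods listed above, mapped to Python operators.
COMPARISONS = {
    ">": operator.gt, ">=": operator.ge, "=": operator.eq,
    "!=": operator.ne, "<": operator.lt, "<=": operator.le,
}


def statistic(rows: Sequence[dict], condition: Callable[[dict], bool], function: str) -> float:
    """Apply 'count' or 'count/table_count' to the rows matching condition."""
    matched = sum(1 for row in rows if condition(row))
    if function == "count":
        return matched
    if function == "count/table_count":
        if not rows:
            raise ValueError("count/table_count is undefined for an empty table")
        return matched / len(rows)
    raise ValueError(f"unknown statistical function: {function!r}")


def passes(value: float, comparison: str, expected: float) -> bool:
    """True if value satisfies the comparison method against expected."""
    return COMPARISONS[comparison](value, expected)


def repeated_rows(rows: Iterable[dict], fields: Sequence[str]) -> int:
    """Count extra rows whose combination of the given fields is not unique."""
    combos = Counter(tuple(row[field] for field in fields) for row in rows)
    return sum(count - 1 for count in combos.values() if count > 1)


if __name__ == "__main__":
    rows = [
        {"id": 1, "city": "hz", "amount": None},
        {"id": 2, "city": "hz", "amount": 5},
        {"id": 3, "city": "sh", "amount": None},
        {"id": 3, "city": "sh", "amount": 7},
    ]
    null_ratio = statistic(rows, lambda r: r["amount"] is None, "count/table_count")
    print(null_ratio)                           # 0.5
    print(passes(null_ratio, "<=", 0.1))        # False: the rule would be triggered
    print(repeated_rows(rows, ["id", "city"]))  # 1
```

Keeping the statistic, the comparison, and the expected value separate mirrors the separate settings on the Customized rules page, so each can be changed without touching the others.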

Test run

After configuring the rules, you can test-run all the rules under a partition expression and view the results.

  1. Select the required scheduling date, and click Test run.

    • Test run partition: The actual partition changes with the business date. If the expression is NOTAPARTITIONTABLE, the actual partition is filled in automatically.

    • Scheduling time: The default is the current time.

  2. When the message "Trial Run Success! Click to view the test run results" appears, click it to go to the task query page and check the results.

Change the responsible person

When the person responsible for a partition expression leaves or changes roles, you can reassign the partition expression to another project member.

Hover over the responsible person to display the hidden edit button. Click it, enter the name of the new responsible person, and click Confirm.


More

The More menu includes the following options: Partition operations logs, Last verification results, and Copy rules.

  • Partition operations logs: Displays a record of all the rule settings for the current partition expression.

  • Last verification results: Redirects to the task query page, where you can view the run results for the current partition expression, including historical results.

  • Copy rules: Copies the rules currently set for this partition expression to a target expression, where the copied rules can be synchronized.

For more information about the template rules supported for ODPS data sources, see Template rules.
