Dataphin: Create a real-time dataset through event preprocessing

Last Updated: Jan 21, 2025

Dataphin can create a real-time dataset by preprocessing events and using the processed results as the dataset's metrics. This guide walks you through creating and configuring a real-time dataset with event preprocessing.

Prerequisites

  • Create the event on which the real-time dataset will be developed. For more information, see Add event.

  • Create the tag project to which the dataset will belong. For more information, see Create tag project.

    Important

    To create a new real-time dataset through event preprocessing, you must add a real-time computing source to the project.

Procedure

  1. On the Dataphin home page, click Tag in the top menu bar. The Asset Market section opens by default.

  2. To enter the Add Real-time Dataset dialog box, follow these steps:

    Click Workbench -> select Tag Project -> click Real-time Dataset -> click Add Dataset.

  3. In the Add Real-time Dataset dialog box, select Event Preprocessing.

  4. On the Add Event Preprocessing configuration page, fill in the basic information for the dataset.

    Parameter

    Description

    Dataset Name

    Enter a name for the dataset. The name can contain Chinese characters, English letters, digits, and underscores (_), and can be up to 64 characters long.

    Dataset Code

    Enter a unique identifier for the real-time dataset. The code can contain Chinese characters, English letters, digits, and underscores (_), and can be up to 64 characters long.

    Owner

    Select the owner of the real-time dataset.

    Description

    Provide a concise description of the real-time dataset, limited to 1000 characters.

  5. Set up the Processing Logic for the real-time dataset.

    Parameter

    Description

    Event List

    Select the event on which the dataset is based. For event creation details, see Add event.

    Primary Key

    After you select the event, select the event field that serves as the dataset's primary key.

    Note

    By default, only fields of Character Type or Long Integer Type can be set as the primary key.

    Aggregation Attribute

    Select the field to aggregate, then choose a query function and a time window. The system determines the return type automatically.

    • Query functions vary based on field type:

      • Long Integer Type: Count, Sum, Max, Min.

      • String: Count, Max, Min.

    • Time window options include: Last 10 minutes, Last 30 minutes, Last 1 hour, Last 6 hours, Last 12 hours, Custom.

    To add multiple aggregation attributes, click the Add button.

    Filter Condition

    Apply filter conditions to the data as needed. Supported conditions include: Greater than or equal to, Greater than, Less than or equal to, Less than, Not empty, Empty, In range, Not in range, Or, And, Later than, Later than or equal to, Earlier than, Earlier than or equal to.

    To apply more than one filter condition, click Add Filter Condition. Multiple filter conditions are combined with one of two logical operators, Or or And:

    • Or: The filter matches if any one of the conditions is met.

    • And: The filter matches only if all of the conditions are met.

  6. To finalize the creation, click Publish.
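
The processing logic configured in step 5 can be pictured with a minimal Python sketch. This is not Dataphin's implementation or API. The function names (matches, aggregate), the dict-based event records, the event_time field, and the sample data are all hypothetical. The sketch also assumes that events meeting the filter conditions are the ones kept, and that the time window is measured back from the current time. It shows filter conditions combined with And or Or, events grouped by primary key, and one field aggregated with a query function over a "Last 10 minutes" window.

```python
from datetime import datetime, timedelta

# Query functions available per field type, as listed in step 5.
ALLOWED_FUNCTIONS = {
    int: {"Count", "Sum", "Max", "Min"},  # Long Integer Type
    str: {"Count", "Max", "Min"},         # String
}

# Hypothetical mapping from query function names to Python reducers.
REDUCERS = {"Count": len, "Sum": sum, "Max": max, "Min": min}


def matches(event, conditions, logic="And"):
    """Return True if the event satisfies the conditions under And / Or."""
    if not conditions:
        return True  # no filter configured: every event is kept
    results = [condition(event) for condition in conditions]
    return all(results) if logic == "And" else any(results)


def aggregate(events, primary_key, field, function, window, now,
              conditions=(), logic="And"):
    """Aggregate one field per primary key over events inside the window."""
    cutoff = now - window
    grouped = {}
    for event in events:
        # Assumption: the window covers [now - window, now].
        if not (cutoff <= event["event_time"] <= now):
            continue
        # Assumption: events that meet the filter are kept, others dropped.
        if not matches(event, conditions, logic):
            continue
        value = event[field]
        if function not in ALLOWED_FUNCTIONS.get(type(value), set()):
            raise ValueError(f"{function} is not available for this field type")
        grouped.setdefault(event[primary_key], []).append(value)
    return {key: REDUCERS[function](values) for key, values in grouped.items()}


# Hypothetical sample events; field names are illustrative only.
now = datetime(2025, 1, 21, 12, 0)
events = [
    {"user_id": "u1", "amount": 30, "event_time": now - timedelta(minutes=5)},
    {"user_id": "u1", "amount": 70, "event_time": now - timedelta(minutes=8)},
    {"user_id": "u2", "amount": 5, "event_time": now - timedelta(minutes=2)},
    {"user_id": "u1", "amount": 100, "event_time": now - timedelta(minutes=20)},
]

# Sum of amount per user over the last 10 minutes, keeping amount >= 10.
total = aggregate(events, "user_id", "amount", "Sum",
                  timedelta(minutes=10), now,
                  conditions=[lambda e: e["amount"] >= 10])
print(total)  # {'u1': 100}
```

In this sample, the 20-minute-old event falls outside the window and the u2 event fails the "Greater than or equal to" condition, so only the two recent u1 events contribute to the sum.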

What to do next

Once you have created and published the real-time dataset, proceed to create corresponding real-time tags. For detailed instructions, see Real-time Tag Overview.