Dataphin supports event preprocessing for creating real-time datasets: events are preprocessed, and the results serve as the dataset's metrics. This guide walks you through creating and configuring a real-time dataset with event preprocessing.
Prerequisites
Before creating a real-time dataset, ensure an event for real-time dataset development is created. For more information, see Add event.
Create the tag project to which the dataset will belong before creating a real-time dataset. For more information, see Create tag project.
Important: To create a new real-time dataset through event preprocessing, you must add a real-time computing source to the project.
Procedure
On the Dataphin home page, click Tag in the top menu bar. The Asset Market section opens by default.
To enter the Add Real-time Dataset dialog box, follow these steps:
Click Workbench -> select Tag Project -> click Real-time Dataset -> click Add Dataset.
In the Add Real-time Dataset dialog box, select Event Preprocessing.
On the Add Event Preprocessing configuration page, fill in the basic information for the dataset.
Parameter | Description
Dataset Name | Enter the dataset's name, which can include Chinese and English characters, numbers, and underscores (_), and must be within 64 characters.
Dataset Code | Provide a unique identifier for the real-time dataset, which may include Chinese and English characters, numbers, and underscores (_), and must be within 64 characters.
Owner | Select the owner of the real-time dataset.
Description | Provide a concise description of the real-time dataset, limited to 1000 characters.
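The naming constraints above can be checked before you fill in the form. The following is a minimal Python sketch and not part of Dataphin: the function names are hypothetical, and it assumes that "Chinese characters" means the CJK Unified Ideographs range (U+4E00 to U+9FFF) and that "within 64 characters" counts Unicode code points and allows exactly 64. Dataphin's own validation may differ.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs (U+4E00..U+9FFF).
# Assumption: the 64-character limit counts code points, not bytes.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_\u4e00-\u9fff]{1,64}")

# The description limit stated in the table above.
MAX_DESCRIPTION_LENGTH = 1000


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and is 1-64 long."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(text: str) -> bool:
    """Return True if the description fits within the 1000-character limit."""
    return len(text) <= MAX_DESCRIPTION_LENGTH
```

Under these assumptions, a name such as order_events_01 passes, while order-events fails because the hyphen is not an allowed character. The same name check applies to the Dataset Code, which has the same stated constraints.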
Set up the Processing Logic for the real-time dataset.
Parameter | Description
Event List | Choose the event on which the dataset is based. For event creation details, see Add event.
Primary Key | After you select the event, specify the dataset's primary key. Note: By default, the primary key can only be set for Character Type or Long Integer Type fields.
Aggregation Attribute | Select the field to process, then choose a query function and a time window; the system determines the return type automatically. Available query functions depend on the field type: Long Integer Type supports Count, Sum, Max, and Min; String supports Count, Max, and Min. Time window options: Last 10 minutes, Last 30 minutes, Last 1 hour, Last 6 hours, Last 12 hours, and Custom. To add more aggregation attributes, click the Add button.
Filter Condition | Apply filter conditions to the data as needed. Supported conditions: Greater than or equal to, Greater than, Less than or equal to, Less than, Not empty, Empty, In range, Not in range, Or, And, Later than, Later than or equal to, Earlier than, Earlier than or equal to. To add more conditions, click Add Filter Condition. Multiple conditions are combined with one of two logical operators. Or: a record matches if any condition is met. And: a record matches only if all conditions are met.
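To make the processing logic concrete, the sketch below mimics in plain Python what a single aggregation attribute with filter conditions computes. It is illustrative only and does not use any Dataphin API: the Event record, its field names, and the aggregate and matches functions are all hypothetical. It also assumes that records matching the filter are the ones kept for aggregation, and that the time window is measured backward from a supplied "now".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional, Sequence


@dataclass
class Event:
    user_id: str   # hypothetical primary key field
    amount: int    # hypothetical Long Integer Type field
    ts: datetime   # event time


Condition = Callable[[Event], bool]

# Query functions listed for Long Integer Type fields.
QUERY_FUNCTIONS = {"Count": len, "Sum": sum, "Max": max, "Min": min}


def matches(event: Event, conditions: Sequence[Condition], logic: str) -> bool:
    """Combine filter conditions with And (all must hold) or Or (any holds)."""
    if logic not in ("And", "Or"):
        raise ValueError(f"unknown logical operator: {logic}")
    if not conditions:
        return True  # no filter configured: every record passes
    results = [cond(event) for cond in conditions]
    return all(results) if logic == "And" else any(results)


def aggregate(events: Sequence[Event], key: str, function: str,
              window: timedelta, now: datetime,
              conditions: Sequence[Condition] = (),
              logic: str = "And") -> Optional[int]:
    """Aggregate `amount` for one primary key over the last `window`."""
    start = now - window
    values = [
        e.amount for e in events
        if e.user_id == key
        and start < e.ts <= now
        and matches(e, conditions, logic)
    ]
    if not values and function in ("Max", "Min"):
        return None  # nothing to compare in an empty window
    return QUERY_FUNCTIONS[function](values)


# Example data (made up for illustration).
now = datetime(2024, 1, 1, 12, 0)
events = [
    Event("u1", 30, now - timedelta(minutes=5)),
    Event("u1", 80, now - timedelta(minutes=20)),
    Event("u1", 50, now - timedelta(hours=2)),  # outside a 30-minute window
    Event("u2", 10, now - timedelta(minutes=1)),
]
at_least_50: Condition = lambda e: e.amount >= 50

# Sum over "Last 30 minutes" for u1: 30 + 80
total_30m = aggregate(events, "u1", "Sum", timedelta(minutes=30), now)
# Count over "Last 30 minutes" for u1 with the filter applied: only the 80
big_30m = aggregate(events, "u1", "Count", timedelta(minutes=30), now,
                    [at_least_50])
```

In this sketch each aggregation attribute corresponds to one call to aggregate, so a dataset with several attributes would call it once per combination of field, query function, and time window.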
To finalize the creation, click Publish.
What to do next
Once you have created and published the real-time dataset, proceed to create corresponding real-time tags. For detailed instructions, see Real-time Tag Overview.