
Dataphin: Data exploration and analysis

Last Updated: Jun 23, 2026

Data exploration lets you profile data before you synchronize it to Dataphin, so that you understand its distribution, null values, and other characteristics and can use the data in a more standardized way. You can configure data exploration for both compute source tables and data source tables.

Prerequisites

You must purchase Data Quality to use the data exploration feature.

Limits

Only tables of specific data source types support data exploration. For supported data sources, see Supported partition exploration and exploration scope for different data sources.

Compute source tables are not supported when the compute engine is AnalyticDB for PostgreSQL, ArgoDB, or StarRocks.

Permission description

Super administrators, operation administrators, and custom global roles that have the Exploration And Analysis - Data Exploration Configuration permission can configure data exploration.

Data exploration configuration

  1. In the top menu bar of the Dataphin home page, select Administration > Metadata.

  2. In the navigation pane on the left, select General Configuration > Exploration And Analysis. On the Data Exploration And Analysis page, you can configure data exploration separately for compute source tables and data source tables.

    Basic configuration

    Set the record retention policy for all data source types.

    1. Click the Edit button at the bottom and configure the parameters.

      Profiling Record: Two options are available:

      • Only Retain The Latest Exploration Record And Report:

        • If the latest run is successful and generates a report, all previous records, both successful and failed, will be deleted.

        • If the latest run fails, only that failed record and the most recent successful report are kept; all other failed records are deleted. If no successful record exists, only the current failed record is kept.

      • Retain The Latest N Days Of Exploration Records: Keeps all records and reports, both successful and failed, from the past N days. N defaults to 15 and can be any integer from 1 to 90.

    2. Click Confirm to complete the basic configuration.
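    The two Profiling Record options can be modeled as retention rules over a list of runs. The following is a minimal Python sketch, not Dataphin's implementation: the ExplorationRecord type and both function names are hypothetical, and the sketch assumes that every successful run produces a report and that a run exactly N days old still falls inside the "past N days" window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ExplorationRecord:
    # Hypothetical model of one exploration run. A successful run is
    # assumed to always produce a report.
    run_at: datetime
    succeeded: bool


def retain_latest_only(records):
    """Model of 'Only Retain The Latest Exploration Record And Report'."""
    if not records:
        return []
    ordered = sorted(records, key=lambda r: r.run_at)
    latest = ordered[-1]
    if latest.succeeded:
        # Latest run succeeded: all earlier records, successful or failed, are dropped.
        return [latest]
    # Latest run failed: keep it plus the most recent successful run, if any.
    successes = [r for r in ordered if r.succeeded]
    return [successes[-1], latest] if successes else [latest]


def retain_last_n_days(records, now, n=15):
    """Model of 'Retain The Latest N Days Of Exploration Records' (default N = 15)."""
    if type(n) is not int or not 1 <= n <= 90:
        raise ValueError("N must be an integer from 1 to 90")
    # Assumption: a run exactly N days old is still inside the window.
    cutoff = now - timedelta(days=n)
    return [r for r in records if r.run_at >= cutoff]
```

    For example, given a successful run three days ago followed by failed runs yesterday and today, retain_latest_only keeps the successful run from three days ago and today's failed run, matching the second bullet of the first option.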

    Compute source

    Specify the scope of data tables eligible for automatic data exploration.

    Important

    Data exploration consumes compute resources from the project where the data table resides. Configure this setting based on your actual business needs.

    1. Click the Edit button at the bottom and configure the parameters.

      Parameter

      Description

      Concurrent Rate Limiting

      Controls the maximum number of tasks, including both data exploration and metric analysis tasks, that can run at the same time. Enter an integer from 1 to 5. Default: 5.

      Advanced Parameter Configuration

      When enabled, lets you set parameters that apply globally to exploration and metric analysis tasks, for example to optimize performance or to accommodate a specific compute engine.

      • Click the Reference Example box to view and copy the example statement.

      • Click Typical Scenario Description to view common exploration task errors and their solutions through parameter configuration. For more information, see Typical scenario description.

      Exploration Timeout

      Sets the maximum duration for exploration tasks to prevent prolonged resource consumption. Tasks that exceed the specified time are marked as failed. Valid values: 1 to 24 hours, with precision up to one decimal place.

      Physical Table Range

      Specifies the range of physical tables and views for automatic exploration by project.

      • All Projects: Includes all physical tables and views under every project, both existing and newly created, for automatic exploration.

      • All Production Projects (Basic And Prod): Includes all physical tables and views under production projects, both existing and newly created, for automatic exploration.

      • Specified Projects: Allows you to select specific projects for automatic exploration. Multiple selections are supported.

      Logical Table Range

      Specifies the range of logical tables and views for automatic exploration by data section.

      • All Sections: Includes all logical tables and views under every section, both existing and newly created, for automatic exploration.

      • All Production Sections (Basic And Prod): Includes all logical tables and views under production sections, both existing and newly created, for automatic exploration.

      • Specified Sections: Allows you to select specific sections for automatic exploration. Multiple selections are supported.

    2. Click Confirm to complete the compute source table data exploration configuration.

      Note

      If the scope of tables eligible for automatic exploration changes, automatic exploration is turned off for tables that fall outside the new scope. Exploration tasks that are already running are not affected.
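    The Physical Table Range options and the effect of a scope change can be illustrated with a minimal Python sketch; the Logical Table Range behaves the same way with sections in place of projects. All class, field, and function names here (including the project_env field) are hypothetical and do not describe Dataphin's internal model. The sketch only turns automatic exploration off for tables that leave the scope; it makes no assumption about tables that enter it.

```python
from dataclasses import dataclass
from enum import Enum


class ProjectScope(Enum):
    ALL = "all"                # All Projects
    PRODUCTION = "production"  # All Production Projects (Basic And Prod)
    SPECIFIED = "specified"    # Specified Projects


@dataclass(frozen=True)
class PhysicalTable:
    name: str
    project: str
    # Hypothetical field describing the project's environment: "basic", "dev", or "prod".
    project_env: str


@dataclass(frozen=True)
class PhysicalTableRange:
    scope: ProjectScope
    projects: frozenset = frozenset()  # only used with SPECIFIED

    def includes(self, table):
        if self.scope is ProjectScope.ALL:
            return True
        if self.scope is ProjectScope.PRODUCTION:
            return table.project_env in ("basic", "prod")
        return table.project in self.projects


def apply_scope_change(auto_explore, tables, new_range):
    """Return a copy of auto_explore (table name -> switch on/off) in which
    the switch is off for every known table outside new_range. Running
    tasks are not modeled."""
    updated = dict(auto_explore)
    for table in tables:
        if table.name in updated and not new_range.includes(table):
            updated[table.name] = False
    return updated
```

    Returning a new mapping instead of mutating the input keeps the before and after states easy to compare.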

    Data source

    This page lists data source types that have been collected in metadata and support data exploration and metric analysis. You can configure the scope of data source tables eligible for automatic data exploration.

    1. You can view the name, type, maximum number of concurrent tasks, data exploration status, exploration timeout, and last modification time of each data source.

    2. You can search by data source name or filter by data source type.

    3. To configure data exploration for a target data source, click the Edit icon in the Operation column. In the Control Settings dialog box, configure the parameters.

      Parameter

      Description

      Concurrency settings

      Concurrent Rate Limiting

      Controls the maximum number of data source table exploration tasks that can run at the same time. Enter an integer from 1 to 5. Default: 5.

      Advanced Parameter Configuration

      When enabled, lets you set parameters that apply globally to data source table exploration and metric analysis tasks, for example to optimize performance or to accommodate a specific compute engine.

      • Click Reference Example in the parameter configuration box to view and copy reference statements.

      • Click Typical Scenario Description to view common exploration task errors and their solutions through parameter configuration. For more information, see Typical scenario description.

      Data exploration

      Data Profile

      Disabled by default. When enabled, supported data source tables can be explored.

      Exploration Timeout: Available when data exploration is enabled. Sets the maximum duration for exploration tasks to prevent prolonged resource consumption. Tasks that exceed the specified time are marked as failed. Valid values: 1 to 24 hours, with precision up to one decimal place.

    4. Click Confirm to complete the data source table data exploration configuration.
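    Concurrent Rate Limiting and Exploration Timeout together behave like a bounded worker pool with a per-task deadline. The following minimal Python sketch illustrates that behavior with asyncio; it is not how Dataphin schedules tasks. The function names are hypothetical, the timeout is assumed to start when a task begins running (not while it waits for a slot), and the "one decimal place" check is one possible reading of the documented precision.

```python
import asyncio


def validate_controls(concurrency=5, timeout_hours=None):
    """Check the documented ranges for Concurrent Rate Limiting and Exploration Timeout."""
    if type(concurrency) is not int or not 1 <= concurrency <= 5:
        raise ValueError("Concurrent Rate Limiting must be an integer from 1 to 5")
    if timeout_hours is not None:
        # Assumption: "one decimal place" means the value is unchanged by rounding to 0.1.
        if not 1 <= timeout_hours <= 24 or round(timeout_hours, 1) != timeout_hours:
            raise ValueError("Exploration Timeout must be 1 to 24 hours, at most one decimal place")


async def run_explorations(jobs, concurrency=5, timeout_s=3600.0):
    """Run jobs (name -> zero-argument coroutine function) with at most
    `concurrency` running at once; a job that exceeds timeout_s is marked failed.
    Errors other than timeouts are not handled here."""
    limiter = asyncio.Semaphore(concurrency)
    results = {}

    async def run_one(name, job):
        async with limiter:  # waits here while `concurrency` jobs are running
            try:
                await asyncio.wait_for(job(), timeout=timeout_s)
                results[name] = "succeeded"
            except asyncio.TimeoutError:
                # wait_for cancels the job once the deadline passes.
                results[name] = "failed"

    await asyncio.gather(*(run_one(n, j) for n, j in jobs.items()))
    return results
```

    An Exploration Timeout entered in hours would be passed as timeout_s=timeout_hours * 3600.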

What to do next

After you complete the data exploration configuration, you can configure automatic exploration for data tables. For more information, see Create a data exploration task.