The R&D platform settings help you control edit locks and the concurrency of Analyze commands during development. This topic describes how to configure edit locks, object submission, query acceleration, and storage volume update settings.
Limits
The storage volume update setting feature is available when the compute engine is E-MapReduce 3.x, E-MapReduce 5.x, CDH 5.x, CDH 6.x, FusionInsight 8.x, Cloudera Data Platform 7.x, AsiaInfo DP 5.3, ArgoDB, TDH 6.x, StarRocks, SelectDB, or Doris.
SelectDB and Doris compute engines do not support table management settings or the default compute engine module for standard modeling.
Permission description
Only super administrators, system administrators, and custom user roles that have the Manage R&D Platform Settings permission can configure the R&D platform.
R&D platform access
In the top navigation bar of the Dataphin homepage, choose Management Hub > System Settings.
In the navigation pane on the left, choose Platform Settings > R&D Platform.
Edit lock
In the Edit Lock section, click the edit icon, enable the exclusive edit lock switch, and configure the lock.
Exclusive Edit Lock: When disabled, users can overwrite each other's lock status. When enabled, after a user locks an object, other users cannot edit it until the lock is manually released or expires; only then can another user lock and edit the object.
Lock Duration: If the lock holder performs no editing actions within the lock duration, the exclusive lock expires and can be acquired by other users. The default is 30 minutes; the valid range is 5 to 120 minutes.
Auto-release When Closing Object: Automatically releases the lock when the object's editing tab is closed.
Auto-release When Submission Succeeds: Automatically releases the lock when the object is submitted successfully. The lock is not released if the submission fails.
Click OK to complete the edit lock settings.
To restore the initial system configuration, you can click Restore Defaults.
Query acceleration
When query acceleration is enabled, MCQA query acceleration speeds up all MAX_COMPUTE_SQL ad hoc queries and all SQL unit queries on the analysis platform. When the switch is turned off, the query acceleration switch is hidden for all ad hoc queries and analysis platform SQL units, and the current tenant cannot use MCQA query acceleration.
Query acceleration is supported only for the MaxCompute compute engine.
Storage volume update settings
For data tables written directly to HDFS by integration tasks, real-time development tasks, and other jobs, Hive does not update storage volume information by default, including table storage volume and partition storage volume, so storage volume information for the target tables cannot be displayed in the asset directory. Dataphin can automatically execute the Analyze command after a data table is updated to obtain the latest storage volume information. You can configure this under Management Hub > System Settings > Platform Settings > R&D Platform.
In the Storage Volume Update Settings section, click the edit icon, enable the automatic storage volume update switch, and configure the concurrent connections.
Automatic Storage Volume Update: Disabled by default. When enabled, Dataphin automatically executes the Analyze command for Hive target tables after tasks run successfully to update storage volume information. If you have many integration and real-time development tasks, and your Hive Server performs well, you can adjust the number of concurrent connections to shorten the overall execution time of update commands, ensuring that the latest storage volume information can be queried in the asset directory the next day. Note that high concurrency may consume more computing resources and affect the normal operation of other tasks. Please configure the number of concurrent connections reasonably based on your business scenario.
Maximum Connections: Allows you to set the maximum number of concurrent connections for executing Analyze commands. The default is 5, and you can set a positive integer between 1 and 200.
Important: When automatic storage volume update is enabled and an Analyze command has run for more than 24 hours, the system automatically terminates executing or waiting commands to save computing resources.
Click OK to complete the storage volume update settings.
Note: When automatic storage volume update is switched from disabled to enabled and confirmed, the configured number of concurrent connections takes effect immediately. High concurrency may consume more computing resources and affect the normal operation of other tasks, so configure the number of concurrent connections based on your business scenario.
When automatic storage volume update changes from enabled to disabled, executing or waiting Analyze commands are not affected. The storage volume of target tables for subsequently successful integration, real-time development, and other tasks will not be automatically updated. You can manually update the information by executing the Analyze command in Hive.
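If you need to refresh storage volume information manually, the Analyze command is standard HiveQL; a minimal sketch with illustrative table and partition names:

```sql
-- Refresh table-level storage statistics for a Hive table:
ANALYZE TABLE ods_log COMPUTE STATISTICS;

-- Refresh statistics for a single partition only:
ANALYZE TABLE ods_log PARTITION (ds = '20240101') COMPUTE STATISTICS;
```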
Node task related settings
In the Node Task Related Settings section, click the Edit icon to configure the default scheduling time for new tasks and object submission rules.
New
Default Priority: The default priority when creating integration tasks, computing tasks, and logical table tasks. You can select Lowest, Low, or Medium. The default is Medium.
Default Scheduling Time: Choose Random Within Interval or Fixed Time.
Random Within Interval: The default time interval is 00:00-03:00 and the default random step is 5 minutes. The end time of the interval must be later than the start time; valid times are 00:00-23:59 in hh:mm format. The random step must be an integer from 1 to 30 (minutes).
Fixed Time: The default fixed time is 00:00. Valid times are 00:00-23:59 in hh:mm format.
Python Default Version: The default Python version when creating Python computing tasks, creating Python offline computing templates, and installing Python third-party packages. You can select Python 2.7, Python 3.7, or Python 3.11. The default is Python 3.7.
Note: The Default Scheduling Time is set to Random Within Interval by default. You can change it to Fixed Time as needed.
When creating offline tasks (integration tasks, computing tasks, logical tables), the scheduling time will automatically use the default scheduling time configured here.
If the default scheduling time is set to Random Within Interval, a random time will be generated according to the configured rules.
If the default scheduling time is set to Fixed Time, the configured time will be used.
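For example, with the default Random Within Interval settings (interval 00:00-03:00, 5-minute step), one newly created daily task might be assigned 01:35 and another 02:10; with Fixed Time set to 00:00, every new task is scheduled at exactly 00:00.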
Run
Hide Logview URL When SQL Contains Account And Password Global Variables: Account and password global variables referenced in SQL appear in plaintext in the MaxCompute Logview SQL, which can leak credentials. This option is disabled by default.
If you enable it, when a MAXCOMPUTE_SQL task or logical table task references account and password global variables, the Logview URL is hidden in development environment run and data preview logs, as well as in production environment O&M logs. The Logview URL is replaced with the message: The logview URL is invisible because the current SQL is using global variable "{dp_glb_xxx}", which is of type account and password.
Note: This configuration is supported only when the compute engine is MaxCompute.
Submit
Automatic Dependency Parsing For Offline Development Object Submission: When enabled, dependency parsing is automatically triggered each time an offline development object (such as an SQL computing task or a logical table task) is submitted, updating the upstream dependency list to avoid missing upstream dependencies.
Field Type Validation For Logical Table Submission: When enabled, the system checks at submission time whether the return type of each field's calculation logic matches the declared field type. If they do not match, the submission is blocked to prevent implicit type conversion that could cause data errors.
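To make the type check concrete, here is a hedged illustration (the field, columns, and source table are hypothetical): a field declared as BIGINT whose calculation logic returns STRING would be blocked at submission.

```sql
-- Field order_amount is declared as BIGINT in the logical table.
-- This calculation logic returns STRING, so field type validation
-- would block the submission rather than allow an implicit cast:
SELECT concat(amount_yuan, amount_cents) AS order_amount
FROM dwd_order;

-- A compatible version that explicitly returns BIGINT:
SELECT cast(amount_yuan AS BIGINT) * 100 + amount_cents AS order_amount
FROM dwd_order;
```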
Offline Delete
Allow Deletion Of Published Objects In Development Environment: When enabled, objects published to the production environment (computing tasks, integration tasks, logical tables, atomic metrics, business filters, derived metrics, etc.) can be directly deleted in the development environment.
Important: Deleted objects cannot be recovered. In the development environment, if you delete a development object without publishing the deletion task to the production environment, the corresponding production object can no longer be modified, because its development counterpart no longer exists.
Default Dependency Cycle And Dependency Policy
You can modify the Default Dependency Cycle and Default Dependency Policy.
Default Dependency Cycle: You can select Current Cycle (Current Day), Previous Cycle (Previous 1 Day), Last 24 Hours, or Previous N Cycle. For Previous N Cycle, N defaults to 2 and cannot be empty.
Default Dependency Policy: You can select First Instance, Nearest Instance, All Instances, or Last Instance.
The initial default dependency cycles and policies are shown in the following table.
| Current Node Scheduling Cycle | Upstream Node Scheduling Cycle | Is Upstream Node Self-dependent | Default Dependency Cycle | Default Dependency Policy |
| --- | --- | --- | --- | --- |
| Daily/Weekly/Monthly | Daily | Yes/No | Current Cycle (Current Day) | Last Instance |
| Daily/Weekly/Monthly | Hourly/Minutely | No | Current Cycle (Current Day) | All Instances |
| Daily/Weekly/Monthly | Hourly/Minutely | Yes | Current Cycle (Current Day) | Last Instance |
| Monthly/Weekly/Daily/Hourly/Minutely | Monthly/Weekly | Yes | Current Cycle (Current Day) | Last Instance |
| Monthly/Weekly/Daily/Hourly/Minutely | Monthly/Weekly | No | Current Cycle (Current Day) | Last Instance |
| Hourly/Minutely | Daily | Yes/No | Current Cycle (Current Day) | Last Instance |
| Hourly/Minutely | Hourly/Minutely | Yes/No | Current Cycle (Current Day) | Last Instance |
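For example, under these defaults a daily task that depends on a non-self-dependent hourly upstream uses Current Cycle (Current Day) with All Instances, so it effectively waits for all of that day's upstream instances before it runs.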
After completing the configuration, click OK.
To restore the initial system configuration, you can click Restore Defaults.
Table management settings
The StarRocks, GaussDB data warehouse service (DWS), Doris, and SelectDB compute engines do not support table management settings.
In the Table Management Settings section, click the Edit icon to configure Automatically Generate Table Deletion Pending Release Item After Using SQL to Delete Table and Generate Pending Release Item When Deleting Table in Table Management.
Automatically Generate Table Deletion Pending Release Item After Using SQL To Delete Table: Enabled by default. When enabled, after a drop table statement is executed in an ad hoc query or SQL computing task in the development environment, the system automatically generates a pending release item for the table deletion. When disabled, executing a drop table table_name statement in the development environment does not generate one.
Generate Pending Release Item When Deleting Table In Table Management: Enabled by default. When enabled, deleting a table in table management generates a corresponding pending release item. When disabled, it does not.
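For instance, with the first switch enabled, running a statement like the following in a development environment ad hoc query would generate a table deletion pending release item (the table name is illustrative):

```sql
-- With the switch on, Dataphin records this deletion as a pending
-- release item so it can be published to production deliberately:
DROP TABLE IF EXISTS dev_tmp_orders;
```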
Configure Default Storage Format/Default Storage Format For External Tables. Different compute engines support different storage formats, as shown in the following table.
Note: You cannot configure the Default Storage Format when the compute engine is AnalyticDB for PostgreSQL. You can configure the Default Storage Format For External Tables only when the compute engine is MaxCompute.
A hyphen (-) in the following table indicates that the format is not supported.

| Engine | Default (Can be specified in create table statement) | hudi | delta (Delta Lake) | paimon | iceberg | kudu | parquet | avro | rcfile | orc | textfile | sequencefile | binaryfile | csv | text | json |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MaxCompute | - | - | - | - | - | - | Supported | Supported | Supported | Supported | Supported | Supported | - | - | - | - |
| Lindorm (Compute Engine) | Supported | - | - | - | Supported | - | Supported | Supported | Supported | Supported | Supported | Supported | - | - | - | - |
| Databricks | Supported | - | Supported | - | - | - | Supported | Supported | - | Supported | - | - | Supported | Supported | Supported | Supported |
| Amazon EMR | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | - | - | - | - |
| Transwarp TDH 6.x / Transwarp TDH 9.3.x | Supported | - | - | Supported | Supported | - | Supported | Supported | Supported | Supported | Supported | Supported | - | - | - | - |
| CDH 5.x / CDH 6.x / E-MapReduce 3.x / E-MapReduce 5.x / Cloudera Data Platform 7.x / Huawei FusionInsight 8.x / AsiaInfo DP 5.3 | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | - | - | - | - |
You can configure the default lifecycle for physical and logical tables that use the MaxCompute compute engine. By default, this value is empty, which means no lifecycle is set. You can enter an integer from 1 to 36,500 or quickly select 7, 14, 30, or 360 days.
Note: You can configure the default lifecycle only when the compute engine is MaxCompute.
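This default corresponds to the standard MaxCompute table lifecycle. A minimal sketch of setting a lifecycle explicitly in MaxCompute SQL (table and column names are illustrative):

```sql
-- Create a table whose data expires 30 days after it was last modified:
CREATE TABLE IF NOT EXISTS dws_trade_sum (id BIGINT, amt DOUBLE) LIFECYCLE 30;

-- Adjust the lifecycle of an existing table:
ALTER TABLE dws_trade_sum SET LIFECYCLE 14;
```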
After you complete the configuration, click OK.
To restore the initial system configuration, you can click Restore Defaults.
Default compute engine for standard modeling
Dataphin instances with Hadoop compute engines support setting the default compute engine for standard modeling, including Hive, Impala, and Spark. The compute engines have the following limitations:
If the corresponding task type is not enabled in the compute source associated with the project, the system automatically switches to the Hive compute engine. For more information, see Create a Hadoop compute source.
Hive: Cannot read source tables stored in Kudu format.
Impala: Can read source tables stored in Kudu format, but does not currently support storing logical tables as Kudu. Not recommended if you don't have source tables in Kudu format.
Note: Impala is not supported when the compute engine is Amazon EMR.
Spark: Cannot read source tables stored in Kudu format.
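To illustrate the Kudu constraint (a hedged sketch; the table and column names are hypothetical): a Kudu-backed source table is typically defined through Impala, and per the limits above only the Impala engine can read it during standard modeling.

```sql
-- Impala DDL for a Kudu-backed source table. Modeling tasks on the
-- Impala engine can read this table, while the Hive and Spark
-- engines cannot.
CREATE TABLE ods_user_events (
  event_id BIGINT,
  user_id BIGINT,
  event_time TIMESTAMP,
  PRIMARY KEY (event_id)
)
PARTITION BY HASH (event_id) PARTITIONS 4
STORED AS KUDU;
```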