DataWorks:CreateFile

Last Updated: Jan 12, 2026

Creates a file in DataStudio. You cannot call this operation to create Data Integration nodes.

Debugging

You can call this operation directly in OpenAPI Explorer, which spares you from calculating signatures. After a successful call, OpenAPI Explorer can automatically generate SDK code samples.

Authorization information

No authorization information is currently disclosed for this operation.

Request parameters

Parameter | Type | Required | Description | Example
FileFolderPath | string | Yes

The path of the folder in which the file is created.

Business_process/First_Business_Process/MaxCompute/Folder_1/Folder_2
ProjectId | long | Yes

The DataWorks workspace ID. To obtain the workspace ID, log on to the DataWorks console and navigate to the workspace configuration page. You must configure either this parameter or the ProjectIdentifier parameter to determine the DataWorks workspace to which the operation is applied.

10000
FileName | string | Yes

The file name.

File name
FileDescription | string | No

The description of the file.

test
FileType | integer | Yes

The code type of the file. Each file type has its own code. For more information, see DataWorks node types. You can call the ListFileType operation to query the code types of files.

10
Owner | string | No

The Alibaba Cloud account ID of the file owner. If this parameter is not specified, the Alibaba Cloud account ID of the caller is used by default.

1000000000001
Content | string | No

The code content of the file. The code format varies with the code type (FileType). To see the format for a given type, find a task of that type in Operation Center, right-click it, and select View Code.

SHOW TABLES;
AutoRerunTimes | integer | No

The number of automatic reruns after an error occurs. Maximum value: 10.

3
AutoRerunIntervalMillis | integer | No

The interval at which the node is automatically rerun after a failure. Unit: milliseconds. Maximum value: 1800000 milliseconds (30 minutes).

This parameter corresponds to the Rerun interval parameter in Properties > Schedule > Auto Rerun upon Failure for data development nodes in the DataWorks console. In the console, the unit of the rerun interval is minutes. Convert the time unit when you call this operation.

120000
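
Because the console shows the rerun interval in minutes while this operation expects milliseconds, a small conversion helper can prevent unit mistakes. The following is only a sketch: the function name `rerun_interval_millis` is hypothetical, the 1,800,000 ms ceiling comes from the description above, and rejecting negative values is an assumption, since no minimum is documented here.

```python
MAX_RERUN_INTERVAL_MS = 1_800_000  # 30 minutes, the documented maximum


def rerun_interval_millis(minutes: float) -> int:
    """Convert a console-style interval in minutes to milliseconds."""
    millis = int(round(minutes * 60_000))
    # Assumption: negative intervals are meaningless, so they are rejected too.
    if millis < 0 or millis > MAX_RERUN_INTERVAL_MS:
        raise ValueError(
            f"interval must be between 0 and {MAX_RERUN_INTERVAL_MS} ms, got {millis}"
        )
    return millis


print(rerun_interval_millis(2))  # 120000, the example value above
```
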
RerunMode | string | No

The rerun policy. Valid values:

  • ALL_ALLOWED: Reruns are allowed regardless of whether the task succeeds or fails.
  • FAILURE_ALLOWED: Reruns are allowed only when the task fails.
  • ALL_DENIED: Reruns are not allowed regardless of whether the task succeeds or fails.

This parameter corresponds to the Support for Rerun setting in Scheduling > Scheduling Policies for Data Studio tasks in the DataWorks console.

ALL_ALLOWED
Stop | boolean | No

Specifies whether to skip execution. Valid values:

  • true
  • false

This parameter corresponds to the Skip Execution option in Properties > Schedule > Recurrence for data development nodes in the DataWorks console.

false
ParaValue | string | No

The scheduling parameters of the node. Separate multiple parameters with spaces.

This parameter corresponds to the Scheduling Parameter setting in Properties for data development nodes in the DataWorks console. For more information, see Scheduling parameters.

a=x b=y
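
The space-separated key=value format can be produced from a mapping. This is a minimal sketch, assuming that neither names nor values contain spaces. The helper name `build_para_value` is hypothetical, and rejecting embedded spaces is a conservative choice rather than a documented rule.

```python
def build_para_value(params: dict) -> str:
    """Join scheduling parameters as space-separated key=value pairs."""
    parts = []
    for key, value in params.items():
        if " " in key or " " in str(value):
            # Spaces separate parameters, so a space inside one would be ambiguous.
            raise ValueError(f"spaces are not allowed in {key!r}={value!r}")
        parts.append(f"{key}={value}")
    return " ".join(parts)


print(build_para_value({"a": "x", "b": "y"}))  # a=x b=y
```
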
StartEffectDate | long | No

The timestamp (in milliseconds) when automatic scheduling starts.

This parameter corresponds to the start time of Effective Period in Scheduling > Scheduling Time for Data Studio tasks in the DataWorks console.

1671608450000
EndEffectDate | long | No

The timestamp (in milliseconds) when automatic scheduling stops.

This parameter corresponds to the end time of Effective Period in Scheduling > Scheduling Time for Data Studio tasks in the DataWorks console.

1671694850000
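
StartEffectDate and EndEffectDate are epoch timestamps in milliseconds, so they can be derived from ordinary datetime values. The sketch below reproduces the two example values. The helper name `to_epoch_millis` is hypothetical, and requiring timezone-aware datetimes is a design choice made here to avoid silent local-time conversions.

```python
from datetime import datetime, timezone


def to_epoch_millis(moment: datetime) -> int:
    """Return a timezone-aware datetime as milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    return int(round(moment.timestamp() * 1000))


start = to_epoch_millis(datetime(2022, 12, 21, 7, 40, 50, tzinfo=timezone.utc))
end = to_epoch_millis(datetime(2022, 12, 22, 7, 40, 50, tzinfo=timezone.utc))
print(start, end)  # 1671608450000 1671694850000
```
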
CronExpress | string | No

The cron expression for scheduled execution. This parameter corresponds to the Cron Expression setting in Scheduling > Scheduling Time for Data Studio tasks in the DataWorks console. After you configure Scheduling Cycle and Scheduled Time, DataWorks automatically generates a cron expression.

Examples:

  • Scheduled at 05:30 every day: 00 30 05 * * ?
  • Scheduled at the 15th minute of every hour: 00 15 00-23/1 * * ?
  • Scheduled every 10 minutes: 00 00/10 * * * ?
  • Scheduled every 10 minutes between 08:00 and 17:00 every day: 00 00-59/10 8-17 * * ?
  • Scheduled at 00:20 on the 1st day of every month: 00 20 00 1 * ?
  • Scheduled every 3 months starting from 00:10 on January 1: 00 10 00 1 1-12/3 ?
  • Scheduled at 00:05 on every Tuesday and Friday: 00 05 00 * * 2,5

Due to the rules of the DataWorks scheduling system, cron expressions have the following restrictions:

  • The minimum scheduling interval is 5 minutes.
  • The earliest scheduling time each day is 00:05.

00 05 00 * * ?
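
The examples above use six fields in the order second, minute, hour, day of month, month, day of week. As an illustration, a daily schedule can be generated from an hour and a minute while enforcing the 00:05 restriction. The helper `daily_cron` is hypothetical and covers only the once-per-day case.

```python
def daily_cron(hour: int, minute: int) -> str:
    """Build a six-field cron expression for one run per day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    if hour == 0 and minute < 5:
        # Documented restriction: the earliest scheduling time each day is 00:05.
        raise ValueError("the earliest scheduling time each day is 00:05")
    # Field order: second minute hour day-of-month month day-of-week
    return f"00 {minute:02d} {hour:02d} * * ?"


print(daily_cron(5, 30))  # 00 30 05 * * ?
```
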
CycleType | string | No

The type of scheduling cycle. Valid values: NOT_DAY (minute, hour) and DAY (day, week, month).

This parameter corresponds to the Scheduling Cycle setting in Scheduling > Scheduling Time for Data Studio tasks in the DataWorks console.

DAY
DependentType | string | No

The dependency mode on the previous cycle. Valid values:

  • SELF: Depends on the current node.
  • CHILD: Depends on the child nodes.
  • USER_DEFINE: Depends on other nodes.
  • NONE: No dependencies. Does not depend on the previous cycle.
  • USER_DEFINE_AND_SELF: Depends on both the current node and other nodes in the previous cycle.
  • CHILD_AND_SELF: Depends on both the current node and its child nodes in the previous cycle.

NONE
DependentNodeIdList | string | No

The IDs of the nodes on which the current node depends. This parameter takes effect only when the DependentType parameter is set to USER_DEFINE. Separate multiple node IDs with commas (,).

This parameter corresponds to the Other Nodes option in Properties > Dependencies > Cross-cycle Dependency (Original Previous-cycle Dependency) for data development nodes in the DataWorks console.

abc
InputList | string | Yes

The output names of the ancestor nodes on which the current node depends. Separate multiple output names with commas (,).

This parameter corresponds to the Output Name of Ancestor Node setting in Properties > Dependencies for data development nodes in the DataWorks console.

project_root,project.file1,project.001_out
ProjectIdentifier | string | No

The DataWorks workspace name. To obtain the workspace name, log on to the DataWorks console and navigate to the workspace configuration page.

You must specify either this parameter or ProjectId to identify the target DataWorks workspace for this API call.

dw_project
ResourceGroupIdentifier | string | No

The identifier of the resource group used by the task published from the file. To obtain the identifier, log on to the DataWorks console, navigate to the workspace configuration page, and click Resource Groups in the left-side navigation pane to view the identifiers of the resource groups bound to the current workspace.

S_res_group_559_1613715566828
ResourceGroupId | long | No

This parameter is deprecated.

375827434852437
ConnectionName | string | No

The data source used when the task published from the file is run.

You can call the ListDataSources operation to query the available data sources in the workspace.

odps_source
AutoParsing | boolean | No

Specifies whether to enable automatic parsing for the file. Valid values:

  • true
  • false

This parameter corresponds to the Analyze Code setting in Properties > Dependencies for data development nodes in the DataWorks console.

true
SchedulerType | string | No

The scheduling type. Valid values:

  • NORMAL: Normal scheduled task.
  • MANUAL: Manually triggered node. Not scheduled for daily execution. Corresponds to nodes in manually triggered workflows.
  • PAUSE: Paused task.
  • SKIP: Dry-run task. Scheduled for daily execution but is directly marked as successful when scheduling starts.

NORMAL
AdvancedSettings | string | No

The advanced settings of the node.

This parameter corresponds to the Advanced Settings section in the right-side navigation pane on the configuration tab of EMR Spark Streaming and EMR Streaming SQL nodes in the DataWorks console.

Only EMR Spark Streaming and EMR Streaming SQL nodes support this parameter. The value must be in the JSON format.

{"queue":"default","SPARK_CONF":"--conf spark.driver.memory=2g"}
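
Since AdvancedSettings is a string that must contain JSON, serializing a dictionary is usually safer than writing the string by hand. A brief sketch using the keys from the example value above (no other keys are assumed):

```python
import json

# Serialize the settings compactly; json.dumps handles quoting and escaping.
advanced_settings = json.dumps(
    {"queue": "default", "SPARK_CONF": "--conf spark.driver.memory=2g"},
    separators=(",", ":"),
)
print(advanced_settings)
# {"queue":"default","SPARK_CONF":"--conf spark.driver.memory=2g"}
```
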
StartImmediately | boolean | No

Specifies whether to immediately run the node after the node is deployed.

This parameter corresponds to the Start Method setting in Settings > Schedule in the right-side navigation pane on the configuration tab of EMR Spark Streaming and EMR Streaming SQL nodes in the DataWorks console.

true
InputParameters | string | No

The input context parameters of the node. The value must be in the JSON format. For more information about the parameter structure, see the InputContextParameterList parameter in the response parameters of the GetFile operation.

This parameter corresponds to the Input Parameters setting in Properties > Input and Output Parameters for data development nodes in the DataWorks console.

[{"ValueSource": "project_001.first_node:bizdate_param","ParameterName": "bizdate_input"}]
OutputParameters | string | No

The output context parameters of the node. The value must be in the JSON format. For more information about the parameter structure, see the OutputContextParameterList parameter in the response parameters of the GetFile operation.

This parameter corresponds to the Output Parameters setting in Properties > Input and Output Parameters for data development nodes in the DataWorks console.

[{"Type": 1,"Value": "${bizdate}","ParameterName": "bizdate_param"}]
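
InputParameters and OutputParameters are both JSON arrays passed as strings. The sketch below builds the two example values with `json.dumps`. Reading ValueSource as an ancestor output name followed by a colon and a parameter name is only an inference from the example value, not a documented rule.

```python
import json

# Output context parameter: publishes ${bizdate} under the name bizdate_param.
output_parameters = json.dumps(
    [{"Type": 1, "Value": "${bizdate}", "ParameterName": "bizdate_param"}]
)

# Input context parameter: reads a value into bizdate_input.
# Assumption: ValueSource has the shape "<output name>:<parameter name>".
input_parameters = json.dumps(
    [{"ValueSource": "project_001.first_node:bizdate_param",
      "ParameterName": "bizdate_input"}]
)

# Both values are plain strings that decode back to lists of objects.
print(json.loads(output_parameters)[0]["ParameterName"])  # bizdate_param
print(json.loads(input_parameters)[0]["ParameterName"])   # bizdate_input
```
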
IgnoreParentSkipRunningProperty | boolean | No

Specifies whether to inherit the dry-run status from the previous cycle. Valid values:

  • true: Inherit the dry-run status from the previous cycle.
  • false: Do not inherit the dry-run status from the previous cycle.

false
CreateFolderIfNotExists | boolean | No

Specifies whether to automatically create the directory specified by FileFolderPath if the directory does not exist. Valid values:

  • true: If the directory does not exist, automatically create it.
  • false: If the directory does not exist, the call fails.

false
ApplyScheduleImmediately | boolean | No

Specifies whether to apply the scheduling configuration immediately after the file is published.

true
Timeout | integer | No

The timeout settings for scheduling configuration.

1
ImageId | string | No

The custom image ID.

m-bp1h4b5a8ogkbll2f3tr
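
The request parameters above can be collected and checked before they are handed to an SDK or a signed HTTP call. The following is a sketch under stated assumptions: the function `build_create_file_params` and its argument names are hypothetical, the required fields and RerunMode values come from the table above, and how the resulting dictionary is actually sent is out of scope.

```python
RERUN_MODES = {"ALL_ALLOWED", "FAILURE_ALLOWED", "ALL_DENIED"}


def build_create_file_params(file_folder_path, file_name, file_type, input_list,
                             project_id=None, project_identifier=None, **optional):
    """Collect CreateFile parameters into a dict keyed by the documented names."""
    if project_id is None and project_identifier is None:
        raise ValueError("specify ProjectId or ProjectIdentifier")
    rerun_mode = optional.get("RerunMode")
    if rerun_mode is not None and rerun_mode not in RERUN_MODES:
        raise ValueError(f"unsupported RerunMode: {rerun_mode!r}")
    params = {
        "FileFolderPath": file_folder_path,
        "FileName": file_name,
        "FileType": file_type,
        # InputList is sent as one comma-separated string.
        "InputList": ",".join(input_list),
    }
    if project_id is not None:
        params["ProjectId"] = project_id
    if project_identifier is not None:
        params["ProjectIdentifier"] = project_identifier
    # Keep only the optional parameters that were actually set.
    params.update({name: value for name, value in optional.items() if value is not None})
    return params


params = build_create_file_params(
    "Business_process/First_Business_Process/MaxCompute/Folder_1/Folder_2",
    "File name",
    10,
    ["project_root", "project.file1", "project.001_out"],
    project_id=10000,
    Content="SHOW TABLES;",
    RerunMode="ALL_ALLOWED",
    CronExpress="00 05 00 * * ?",
)
print(params["InputList"])  # project_root,project.file1,project.001_out
```

Validating enumerated values and the workspace identifier locally turns some mistakes into immediate errors instead of failed API calls.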

Response parameters

Parameter | Type | Description | Example
object

The response.

HttpStatusCode | integer

The HTTP status code.

200
Data | long

The file ID.

1000001
RequestId | string

The request ID. Use this ID to troubleshoot issues.

0000-ABCD-EFG
ErrorMessage | string

The error message.

The connection does not exist.
Success | boolean

Indicates whether the call succeeded. Valid values:

  • true: The call succeeded.
  • false: The call failed.

true
ErrorCode | string

The error code.

Invalid.Tenant.ConnectionNotExists

Examples

Sample success responses

JSON format

{
  "HttpStatusCode": 200,
  "Data": 1000001,
  "RequestId": "0000-ABCD-EFG",
  "ErrorMessage": "The connection does not exist.",
  "Success": true,
  "ErrorCode": "Invalid.Tenant.ConnectionNotExists"
}
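
A caller typically needs the file ID from Data, or a useful error when the call fails. The sketch below parses a response body like the sample above. The names `CreateFileError` and `file_id_from_response` are hypothetical. It checks Success rather than the presence of ErrorCode, because the sample populates every field even for a successful call.

```python
import json


class CreateFileError(Exception):
    """Raised when the response reports Success = false (hypothetical helper)."""


def file_id_from_response(body: str) -> int:
    """Return the new file ID, or raise with the error details and RequestId."""
    data = json.loads(body)
    if not data.get("Success"):
        raise CreateFileError(
            f"{data.get('ErrorCode')}: {data.get('ErrorMessage')} "
            f"(RequestId {data.get('RequestId')})"
        )
    return data["Data"]


sample = '{"HttpStatusCode": 200, "Data": 1000001, "RequestId": "0000-ABCD-EFG", "Success": true}'
print(file_id_from_response(sample))  # 1000001
```

Including the RequestId in the exception text keeps the value needed for troubleshooting next to the error code.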

Error codes

HTTP status code | Error code | Error message | Description
403 | Forbidden.Access | Access is forbidden. Please first activate DataWorks Enterprise Edition or Flagship Edition. | No permission, please authorize
429 | Throttling.Api | The request for this resource has exceeded your available limit. | -
429 | Throttling.System | The DataWorks system is busy. Try again later. | -
429 | Throttling.User | Your request is too frequent. Try again later. | -
500 | InternalError.System | An internal system error occurred. Try again later. | -
500 | InternalError.UserId.Missing | An internal system error occurred. Try again later. | -
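
Several of the errors above ask the caller to try again later, which suggests a retry with a growing delay. The following is a sketch, not a documented client behavior. `send` is a hypothetical callable that performs one CreateFile call and returns the decoded response as a dictionary. Which codes to treat as retryable, the number of attempts, and the delays are all assumptions.

```python
import time

# Assumption: throttling and generic internal errors are worth retrying.
RETRYABLE = {"Throttling.Api", "Throttling.System", "Throttling.User",
             "InternalError.System"}


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, hits a non-retryable error, or runs out of attempts."""
    response = None
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.get("Success") or response.get("ErrorCode") not in RETRYABLE:
            return response
        if attempt < max_attempts:
            # Exponential backoff: base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** (attempt - 1))
    return response


# Simulated sender: throttled once, then successful. No real API call is made.
replies = iter([{"Success": False, "ErrorCode": "Throttling.User"},
                {"Success": True, "Data": 1000001}])
result = call_with_retry(lambda: next(replies), sleep=lambda seconds: None)
print(result["Data"])  # 1000001
```
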

For a complete list of error codes, see Service error codes.