
DataWorks:GetFile

Last Updated: Oct 11, 2023

Queries the information about a file.

Debugging

OpenAPI Explorer automatically calculates the signature value. We recommend that you call this operation in OpenAPI Explorer, which also dynamically generates sample code for the operation in different SDKs.

Request parameters

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Action | String | Yes | GetFile | The operation that you want to perform. |
| ProjectId | Long | No | 10000 | The DataWorks workspace ID. You can log on to the DataWorks console and go to the Workspace page to obtain the workspace ID. You must configure either this parameter or the ProjectIdentifier parameter to determine the DataWorks workspace to which the operation is applied. |
| ProjectIdentifier | String | No | dw_project | The name of the DataWorks workspace. You can log on to the DataWorks console and go to the Workspace page to obtain the workspace name. You must configure either this parameter or the ProjectId parameter to determine the DataWorks workspace to which the operation is applied. |
| FileId | Long | No | 100000001 | The file ID. You can call the ListFiles operation to obtain the file ID. |
| NodeId | Long | No | 200000001 | The ID of the node that is scheduled. You can call the ListFiles operation to obtain the node ID. |
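Because the operation accepts either ProjectId or ProjectIdentifier to select the workspace, a client can validate this before sending the request. The following sketch shows one way to do that; the helper name `resolve_workspace_params` is hypothetical and not part of any SDK:

```python
def resolve_workspace_params(project_id=None, project_identifier=None):
    """Build the workspace-selection part of a GetFile request.

    GetFile needs ProjectId or ProjectIdentifier to determine the
    workspace; a request with neither cannot be routed.
    NOTE: this helper is illustrative, not an SDK function.
    """
    if project_id is None and project_identifier is None:
        raise ValueError("Provide ProjectId or ProjectIdentifier")
    params = {}
    if project_id is not None:
        params["ProjectId"] = int(project_id)
    if project_identifier is not None:
        params["ProjectIdentifier"] = str(project_identifier)
    return params
```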

Response parameters

Parameter

Type

Example

Description

HttpStatusCode

Integer

200

The HTTP status code.

ErrorMessage

String

The connection does not exist.

The error message.

RequestId

String

0000-ABCD-EFG****

The request ID.

ErrorCode

String

Invalid.Tenant.ConnectionNotExists

The error code.

Success

Boolean

true

Indicates whether the request was successful. Valid values:

  • true

  • false

Data

Object

The details of the file.

File

Object

The basic information about the file.

CommitStatus

Integer

0

Indicates whether the latest code in the file is committed. Valid values: 0 and 1. The value 0 indicates that the latest code in the file is not committed. The value 1 indicates that the latest code in the file is committed.

AutoParsing

Boolean

true

Indicates whether the automatic parsing feature is enabled for the file. Valid values:

  • true

  • false

This parameter corresponds to the Analyze Code parameter that is displayed after Same Cycle is selected in the Dependencies section of the Properties tab in the DataWorks console.

Owner

String

7775674356****

The ID of the Alibaba Cloud account used by the file owner.

CreateTime

Long

1593879116000

The time when the file was created. This value is a UNIX timestamp representing the number of milliseconds that have elapsed since January 1, 1970, 00:00:00 UTC.
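All timestamp fields in this response (CreateTime, LastEditTime, StartEffectDate, EndEffectDate) use the same millisecond convention, so a single conversion helper covers them. A minimal sketch using only the standard library; the function name is illustrative:

```python
from datetime import datetime, timezone

def from_ms_timestamp(ms):
    """Convert a millisecond UNIX timestamp, as returned in fields such
    as CreateTime and LastEditTime, to an aware UTC datetime.
    Note the division by 1000: the API reports milliseconds, while
    datetime.fromtimestamp expects seconds."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```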

FileType

Integer

10

The type of the code for the file. Valid values: 6 (Shell), 10 (ODPS SQL), 11 (ODPS MR), 23 (Data Integration), 24 (ODPS Script), 99 (zero load), 221 (PyODPS 2), 225 (ODPS Spark), 227 (EMR Hive), 228 (EMR Spark), 229 (EMR Spark SQL), 230 (EMR MR), 239 (OSS object inspection), 257 (EMR Shell), 258 (EMR Spark Shell), 259 (EMR Presto), 260 (EMR Impala), 900 (real-time synchronization), 1089 (cross-tenant collaboration), 1091 (Hologres development), 1093 (Hologres SQL), 1100 (assignment), and 1221 (PyODPS 3).
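Since FileType is returned as a bare integer, a lookup table is a convenient way to render it for humans. The mapping below is transcribed from the valid values listed above; the helper itself is hypothetical, not part of an SDK:

```python
# Lookup table transcribed from the documented FileType values.
FILE_TYPES = {
    6: "Shell", 10: "ODPS SQL", 11: "ODPS MR", 23: "Data Integration",
    24: "ODPS Script", 99: "zero load", 221: "PyODPS 2", 225: "ODPS Spark",
    227: "EMR Hive", 228: "EMR Spark", 229: "EMR Spark SQL", 230: "EMR MR",
    239: "OSS object inspection", 257: "EMR Shell", 258: "EMR Spark Shell",
    259: "EMR Presto", 260: "EMR Impala", 900: "real-time synchronization",
    1089: "cross-tenant collaboration", 1091: "Hologres development",
    1093: "Hologres SQL", 1100: "assignment", 1221: "PyODPS 3",
}

def describe_file_type(code):
    """Return a human-readable name for a FileType code; codes not in
    the documented list fall back to an 'unknown' marker."""
    return FILE_TYPES.get(code, "unknown ({})".format(code))
```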

CurrentVersion

Integer

3

The latest version number of the file.

BizId

Long

1000001

The ID of the workflow to which the file belongs. This parameter is deprecated and replaced by the BusinessId parameter.

LastEditUser

String

62465892****

The ID of the Alibaba Cloud account used to last modify the file.

FileName

String

ods_user_info_d

The name of the file.

ConnectionName

String

odps_first

The ID of the compute engine instance that is used to run the node that corresponds to the file.

UseType

String

NORMAL

The module to which the file belongs. Valid values:

  • NORMAL: The file is used for DataStudio.

  • MANUAL: The file is used for a manually triggered node.

  • MANUAL_BIZ: The file is used for a manually triggered workflow.

  • SKIP: The file is used for a dry-run DataStudio node.

  • ADHOCQUERY: The file is used for an ad hoc query.

  • COMPONENT: The file is used for a snippet.

FileFolderId

String

2735c2****

The ID of the folder to which the file belongs.

ParentId

Long

-1

The ID of the node group file to which the current file belongs. This parameter is returned only if the current file is an inner file of the node group file.

CreateUser

String

424732****

The ID of the Alibaba Cloud account used to create the file.

IsMaxCompute

Boolean

true

Indicates whether the file needs to be uploaded to MaxCompute.

This parameter is returned only if the file is a MaxCompute resource file.

BusinessId

Long

1000001

The ID of the workflow to which the file belongs.

FileDescription

String

My first DataWorks file

The description of the file.

DeletedStatus

String

RECYCLE

The status of the file. Valid values:

  • NORMAL: The file is not deleted.

  • RECYCLE: The file is stored in the recycle bin.


  • DELETED: The file is deleted.

LastEditTime

Long

1593879116000

The time when the file was last modified. This value is a UNIX timestamp representing the number of milliseconds that have elapsed since January 1, 1970, 00:00:00 UTC.

Content

String

SHOW TABLES;

The code in the file.

NodeId

Long

300001

The ID of the auto triggered node that is generated in the scheduling system after the file is committed.

AdvancedSettings

String

{"queue":"default","SPARK_CONF":"--conf spark.driver.memory=2g"}

The advanced configurations of the node.

This parameter is valid only for an EMR Spark Streaming node or an EMR Streaming SQL node. This parameter corresponds to the Advanced Settings tab of the node in the DataWorks console.

The value of this parameter must be in the JSON format.
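Because AdvancedSettings is delivered as a JSON string embedded in the already-decoded response body, it needs a second decoding pass. A minimal sketch (the helper name is illustrative), using the sample value from this page:

```python
import json

def parse_advanced_settings(file_obj):
    """Decode the AdvancedSettings field of a File object.

    The field arrives as a JSON *string* inside the JSON response,
    so json.loads must be applied a second time. Returns an empty
    dict when the field is absent or empty."""
    raw = file_obj.get("AdvancedSettings")
    return json.loads(raw) if raw else {}
```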

FileId

Long

100000001

The file ID.

NodeConfiguration

Object

The scheduling configurations of the file.

RerunMode

String

ALL_ALLOWED

Indicates whether the node that corresponds to the file can be rerun. Valid values:

  • ALL_ALLOWED: The node can be rerun regardless of whether it is successfully run or fails to run.

  • FAILURE_ALLOWED: The node can be rerun only after it fails to run.

  • ALL_DENIED: The node cannot be rerun regardless of whether it is successfully run or fails to run.

This parameter corresponds to the Rerun parameter in the Schedule section of the Properties tab in the DataWorks console.

SchedulerType

String

NORMAL

The scheduling type of the node. Valid values:

  • NORMAL: The node is an auto triggered node.

  • MANUAL: The node is a manually triggered node. Manually triggered nodes cannot be automatically triggered. They correspond to the nodes in the Manually Triggered Workflows pane.

  • PAUSE: The node is a paused node.

  • SKIP: The node is a dry-run node. Dry-run nodes are started as scheduled, but the system sets the status of the nodes to successful when it starts to run them.

Stop

Boolean

false

Indicates whether the scheduling for the node is suspended. Valid values:

  • true

  • false

This parameter corresponds to the Recurrence parameter in the Schedule section of the Properties tab in the DataWorks console.

ParaValue

String

a=x b=y

The scheduling parameters of the node.

This parameter corresponds to the Scheduling Parameter section of the Properties tab in the DataWorks console. For more information about the configurations of scheduling parameters, see Configure scheduling parameters.

StartEffectDate

Long

936923400000

The beginning of the time range for automatic scheduling. This value is a UNIX timestamp representing the number of milliseconds that have elapsed since January 1, 1970, 00:00:00 UTC.

Configuring this parameter is equivalent to specifying a start time for the Validity Period parameter in the Schedule section of the Properties tab in the DataWorks console.

EndEffectDate

Long

4155787800000

The end of the time range for automatic scheduling. This value is a UNIX timestamp representing the number of milliseconds that have elapsed since January 1, 1970, 00:00:00 UTC.

Configuring this parameter is equivalent to specifying an end time for the Validity Period parameter in the Schedule section of the Properties tab in the DataWorks console.

CycleType

String

DAY

The type of the scheduling cycle. Valid values: NOT_DAY and DAY. The value NOT_DAY indicates that the node is scheduled to run by minute or hour. The value DAY indicates that the node is scheduled to run by day, week, or month.

This parameter corresponds to the Scheduling Cycle parameter in the Schedule section of the Properties tab in the DataWorks console.

DependentNodeIdList

String

5,10,15,20

The IDs of the nodes on which the node that corresponds to the file depends when the DependentType parameter is set to USER_DEFINE. Multiple IDs are separated by commas (,).

The value of this parameter is equivalent to the ID of the node that you specified after you select Previous Cycle and set Depend On to Other Nodes in the Dependencies section of the Properties tab in the DataWorks console.
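Since DependentNodeIdList is a comma-separated string rather than an array, callers usually split it back into numeric IDs. A small sketch; the helper name is illustrative:

```python
def parse_dependent_node_ids(value):
    """Split the comma-separated DependentNodeIdList string into ints.

    Returns [] when the field is absent or empty, e.g. when
    DependentType is not USER_DEFINE."""
    if not value:
        return []
    return [int(part) for part in value.split(",") if part.strip()]
```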

ResourceGroupId

Long

375827434852437

The ID of the resource group that is used to run the node that corresponds to the file. You can call the ListResourceGroups operation to query the available resource groups in the workspace.

DependentType

String

USER_DEFINE

The type of the cross-cycle scheduling dependency of the node. Valid values:

  • SELF: The instance generated for the node in the current cycle depends on the instance generated for the node in the previous cycle.

  • CHILD: The instance generated for the node in the current cycle depends on the instances generated for the descendant nodes at the nearest level of the node in the previous cycle.

  • USER_DEFINE: The instance generated for the node in the current cycle depends on the instances generated for one or more specified nodes in the previous cycle.

  • NONE: No cross-cycle scheduling dependency type is selected for the node.

AutoRerunTimes

Integer

3

The number of automatic reruns that are allowed after an error occurs.

AutoRerunIntervalMillis

Integer

120000

The interval between automatic reruns after an error occurs. Unit: millisecond.

This parameter corresponds to the Rerun interval parameter that is displayed after the Auto Rerun upon Error check box is selected in the Schedule section of the Properties tab in the DataWorks console.

The interval that you specify in the DataWorks console is measured in minutes. Pay attention to the conversion between the units of time when you call the operation.
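The minute-to-millisecond conversion mentioned above is a common source of off-by-a-factor errors when setting this value programmatically. A one-line sketch of the arithmetic (the function name is illustrative):

```python
def rerun_interval_minutes_to_millis(minutes):
    """Convert the rerun interval from the console's unit (minutes) to
    the API's AutoRerunIntervalMillis unit (milliseconds).
    1 minute = 60 * 1000 = 60000 ms."""
    return int(minutes) * 60 * 1000
```

For example, the sample value 120000 above corresponds to a 2-minute interval in the console.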

CronExpress

String

00 05 00 * * ?

The CRON expression that represents the periodic scheduling policy of the node.

InputList

Array of NodeInputOutput

The output names of the parent files on which the current file depends.

Input

String

project.001_out

The output name of the parent file on which the current file depends.

This parameter corresponds to the Output Name of Ancestor Node parameter under Parent Nodes after Same Cycle is selected in the Dependencies section of the Properties tab in the DataWorks console.

ParseType

String

MANUAL

The mode in which the scheduling dependencies of the file are configured. Valid values:

  • MANUAL: Scheduling dependencies are manually configured.

  • AUTO: Scheduling dependencies are automatically parsed.

OutputList

Array of NodeInputOutput

The output names of the current file.

This parameter corresponds to the Output Name parameter under Output after Same Cycle is selected in the Dependencies section of the Properties tab in the DataWorks console.

RefTableName

String

ods_user_info_d

The output table name of the current file.

This parameter corresponds to the Output Table Name parameter under Output after Same Cycle is selected in the Dependencies section of the Properties tab in the DataWorks console.

Output

String

dw_project.002_out

The output name of the current file.

This parameter corresponds to the Output Name parameter under Output after Same Cycle is selected in the Dependencies section of the Properties tab in the DataWorks console.

StartImmediately

Boolean

true

Indicates whether the node is run immediately after it is deployed to the production environment.

This parameter is valid only for an EMR Spark Streaming node or an EMR Streaming SQL node. This parameter corresponds to the Start Method parameter in the Schedule section of the Configure tab in the DataWorks console.

InputParameters

Array of InputContextParameter

Input parameters of the node.

This parameter corresponds to the Input Parameters table in the Input and Output Parameters section of the Properties tab in the DataWorks console.

ParameterName

String

input

The name of the input parameter of the node. In the code, you can use the ${...} method to reference the input parameter of the node.

This parameter corresponds to the Parameter Name parameter in the Input Parameters table in the Input and Output Parameters section of the Properties tab in the DataWorks console.

ValueSource

String

project_001.parent_node:outputs

The value source of the input parameter of the node.

This parameter corresponds to the Value Source parameter in the Input Parameters table in the Input and Output Parameters section of the Properties tab in the DataWorks console.

OutputParameters

Array of OutputContextParameter

Output parameters of the node.

This parameter corresponds to the Output Parameters table in the Input and Output Parameters section of the Properties tab in the DataWorks console.

ParameterName

String

output

The name of the output parameter of the node.

This parameter corresponds to the Parameter Name parameter in the Output Parameters table in the Input and Output Parameters section of the Properties tab in the DataWorks console.

Value

String

${bizdate}

The value of the output parameter of the node.

This parameter corresponds to the Value parameter in the Output Parameters table in the Input and Output Parameters section of the Properties tab in the DataWorks console.

Type

String

1

The type of the output parameter of the node. Valid values:

  • 1: indicates a constant.

  • 2: indicates a variable.

  • 3: indicates a pass-through variable.

This parameter corresponds to the Type parameter in the Output Parameters table in the Input and Output Parameters section of the Properties tab in the DataWorks console.

Description

String

It's a context output parameter.

The description of the output parameter of the node.

Examples

Sample requests

http(s)://[Endpoint]/?Action=GetFile
&ProjectId=10000
&ProjectIdentifier=dw_project
&FileId=100000001
&NodeId=200000001
&Common request parameters
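The query string in the sample request can be assembled as follows. This sketch deliberately leaves out the "Common request parameters" placeholder (API version, credentials, signature, and so on), which an SDK or your own signing code must still add:

```python
from urllib.parse import urlencode

# Assemble the GetFile-specific query parameters from the sample request.
# Common request parameters (version, signature, credentials) are omitted
# here and must be supplied by an SDK or custom signing code.
params = {
    "Action": "GetFile",
    "ProjectId": 10000,
    "ProjectIdentifier": "dw_project",
    "FileId": 100000001,
    "NodeId": 200000001,
}
query = urlencode(params)
```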

Sample success responses

XML format

HTTP/1.1 200 OK
Content-Type:application/xml

<GetFileResponse>
    <HttpStatusCode>200</HttpStatusCode>
    <ErrorMessage>The connection does not exist.</ErrorMessage>
    <RequestId>0000-ABCD-EFG****</RequestId>
    <ErrorCode>Invalid.Tenant.ConnectionNotExists</ErrorCode>
    <Success>true</Success>
    <Data>
        <File>
            <CommitStatus>0</CommitStatus>
            <AutoParsing>true</AutoParsing>
            <Owner>7775674356****</Owner>
            <CreateTime>1593879116000</CreateTime>
            <FileType>10</FileType>
            <CurrentVersion>3</CurrentVersion>
            <BizId>1000001</BizId>
            <LastEditUser>62465892****</LastEditUser>
            <FileName>ods_user_info_d</FileName>
            <ConnectionName>odps_first</ConnectionName>
            <UseType>NORMAL</UseType>
            <FileFolderId>2735c2****</FileFolderId>
            <ParentId>-1</ParentId>
            <CreateUser>424732****</CreateUser>
            <IsMaxCompute>true</IsMaxCompute>
            <BusinessId>1000001</BusinessId>
            <FileDescription>My first DataWorks file</FileDescription>
            <DeletedStatus>RECYCLE</DeletedStatus>
            <LastEditTime>1593879116000</LastEditTime>
            <Content>SHOW TABLES;</Content>
            <NodeId>300001</NodeId>
            <AdvancedSettings>{"queue":"default","SPARK_CONF":"--conf spark.driver.memory=2g"}</AdvancedSettings>
        </File>
        <NodeConfiguration>
            <RerunMode>ALL_ALLOWED</RerunMode>
            <SchedulerType>NORMAL</SchedulerType>
            <Stop>false</Stop>
            <ParaValue>a=x b=y</ParaValue>
            <StartEffectDate>936923400000</StartEffectDate>
            <EndEffectDate>4155787800000</EndEffectDate>
            <CycleType>DAY</CycleType>
            <DependentNodeIdList>5,10,15,20</DependentNodeIdList>
            <ResourceGroupId>375827434852437</ResourceGroupId>
            <DependentType>USER_DEFINE</DependentType>
            <AutoRerunTimes>3</AutoRerunTimes>
            <AutoRerunIntervalMillis>120000</AutoRerunIntervalMillis>
            <CronExpress>00 05 00 * * ?</CronExpress>
            <InputList>
                <Input>project.001_out</Input>
                <ParseType>MANUAL</ParseType>
            </InputList>
            <OutputList>
                <RefTableName>ods_user_info_d</RefTableName>
                <Output>dw_project.002_out</Output>
            </OutputList>
            <StartImmediately>true</StartImmediately>
            <InputParameters>
                <ParameterName>input</ParameterName>
                <ValueSource>project_001.parent_node:outputs</ValueSource>
            </InputParameters>
            <OutputParameters>
                <ParameterName>output</ParameterName>
                <Value>${bizdate}</Value>
                <Type>1</Type>
                <Description>It's a context output parameter.</Description>
            </OutputParameters>
        </NodeConfiguration>
    </Data>
</GetFileResponse>

JSON format

HTTP/1.1 200 OK
Content-Type:application/json

{
  "HttpStatusCode" : 200,
  "ErrorMessage" : "The connection does not exist.",
  "RequestId" : "0000-ABCD-EFG****",
  "ErrorCode" : "Invalid.Tenant.ConnectionNotExists",
  "Success" : true,
  "Data" : {
    "File" : {
      "CommitStatus" : 0,
      "AutoParsing" : true,
      "Owner" : "7775674356****",
      "CreateTime" : 1593879116000,
      "FileType" : 10,
      "CurrentVersion" : 3,
      "BizId" : 1000001,
      "LastEditUser" : "62465892****",
      "FileName" : "ods_user_info_d",
      "ConnectionName" : "odps_first",
      "UseType" : "NORMAL",
      "FileFolderId" : "2735c2****",
      "ParentId" : -1,
      "CreateUser" : "424732****",
      "IsMaxCompute" : true,
      "BusinessId" : 1000001,
      "FileDescription" : "My first DataWorks file",
      "DeletedStatus" : "RECYCLE",
      "LastEditTime" : 1593879116000,
      "Content" : "SHOW TABLES;",
      "NodeId" : 300001,
      "AdvancedSettings" : "{\"queue\":\"default\",\"SPARK_CONF\":\"--conf spark.driver.memory=2g\"}"
    },
    "NodeConfiguration" : {
      "RerunMode" : "ALL_ALLOWED",
      "SchedulerType" : "NORMAL",
      "Stop" : false,
      "ParaValue" : "a=x b=y",
      "StartEffectDate" : 936923400000,
      "EndEffectDate" : 4155787800000,
      "CycleType" : "DAY",
      "DependentNodeIdList" : "5,10,15,20",
      "ResourceGroupId" : 375827434852437,
      "DependentType" : "USER_DEFINE",
      "AutoRerunTimes" : 3,
      "AutoRerunIntervalMillis" : 120000,
      "CronExpress" : "00 05 00 * * ?",
      "InputList" : {
        "Input" : "project.001_out",
        "ParseType" : "MANUAL"
      },
      "OutputList" : {
        "RefTableName" : "ods_user_info_d",
        "Output" : "dw_project.002_out"
      },
      "StartImmediately" : true,
      "InputParameters" : {
        "ParameterName" : "input",
        "ValueSource" : "project_001.parent_node:outputs"
      },
      "OutputParameters" : {
        "ParameterName" : "output",
        "Value" : "${bizdate}",
        "Type" : 1,
        "Description" : "It's a context output parameter."
      }
    }
  }
}
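A caller typically checks the Success flag before reading Data. The following sketch handles a response body shaped like the JSON sample above; the function name is illustrative and the error handling is deliberately minimal:

```python
import json

def unwrap_get_file_response(body):
    """Minimal handling of a GetFile response body: raise on
    Success == false, otherwise return the File and NodeConfiguration
    objects. Assumes the JSON layout shown in the sample response."""
    resp = json.loads(body)
    if not resp.get("Success"):
        raise RuntimeError(
            "{}: {}".format(resp.get("ErrorCode"), resp.get("ErrorMessage"))
        )
    data = resp.get("Data", {})
    return data.get("File", {}), data.get("NodeConfiguration", {})
```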

Error codes

| HTTP status code | Error code | Error message | Description |
| --- | --- | --- | --- |
| 429 | Throttling.Api | The request for this resource has exceeded your available limit. | The number of requests for the resource has exceeded the upper limit. |
| 429 | Throttling.System | The DataWorks system is busy. Try again later. | The DataWorks system is busy. Try again later. |
| 429 | Throttling.User | Your request is too frequent. Try again later. | Excessive requests have been submitted within a short period of time. Try again later. |
| 500 | InternalError.System | An internal system error occurred. Try again later. | An internal error occurred. |
| 500 | InternalError.UserId.Missing | An internal system error occurred. Try again later. | An internal error occurred. |

For a list of error codes, see Service error codes.