This topic provides answers to some frequently asked questions about node development in DataStudio.
FAQ about node development
FAQ about operations based on compute engine instances in DataWorks
- Table-related operations
- How do I create a table in a visualized manner?
- When I create a table in a workspace with which an E-MapReduce (EMR) compute engine instance is associated, the following error message appears: call emr exception. What do I do?
- How do I add fields to a table that is in the production environment?
- How do I upload data from my on-premises machine to a MaxCompute table?
- How do I query data that is in the production environment from the development environment on the DataStudio page?
- How do I control whether the queried table data can be downloaded?
- How do I delete a table?
- Resource-related operations
- How do I upload a resource whose size is greater than 30 MB?
- How do I use a resource that is uploaded to DataWorks by using odpscmd?
- How do I reference a resource in a node?
- How do I use a MaxCompute table in DataWorks?
- How do I download a resource that is uploaded to DataWorks?
- How do I delete a MaxCompute resource?
- How do I upload a JAR package on my on-premises machine to DataWorks as a JAR resource and reference the uploaded resource in a node?
- Which kind of resource group can I use when I reference a third-party package in a PyODPS node?
- Can a Python resource call another Python resource?
- Node-related operations
- After a node is modified and committed and deployed to the production environment, is the existing faulty node in the production environment overwritten?
- How do I check whether a node is committed?
- How do I view the versions of a node?
- What is the impact on the instances of a node after the node is deleted?
- How do I recover a node that is deleted?
- How do I export the code of a node?
- Can I configure properties for all nodes in a workflow at a time?
- How do I clone a workflow?
- How do I perform operations on multiple nodes, resources, or functions at a time?
- How do I change resource groups for scheduling for multiple nodes in a workflow at a time on the DataStudio page?
- API calls
- Operations related to node running and run logs
- What are the differences in the value assignment logic of scheduling parameters among the Run, Run with Parameters, and Perform Smoke Testing in Development Environment modes?
- How do I query historical operational logs on the DataStudio page?
- How long are operational logs on the DataStudio page retained?
FAQ about operations on compute engines
- EMR compute engine
- Hive: FAQ about Hive
- Spark: FAQ about Spark
- MaxCompute compute engine
- Development:
- SQL: FAQ about SQL
- Built-in functions: FAQ about built-in functions
- User-defined functions (UDFs): FAQ about MaxCompute Java UDFs and FAQ about MaxCompute Python UDFs
- MapReduce: FAQ about MaxCompute MapReduce
- PyODPS: Can PyODPS call custom functions to use third-party packages?, When I call a pickle file in a PyODPS 3 node, the error message _pickle.UnpicklingError: invalid load key, '\xef' appears. What do I do?, and FAQ about PyODPS
- Security management:
- O&M management:
- Data download: How do I download more than 10,000 data records? and How do I disable the MaxCompute Query Acceleration (MCQA) feature if I want to obtain the instance ID that is used to download more than 10,000 data records?
- BI connections: What do I do if an error is reported when I connect Power BI to MaxCompute?
FAQ about scheduling settings
FAQ about scheduling parameters
- Typical use scenarios of scheduling parameters
- How do I specify a table partition in a format that contains a space, such as pt=yyyy-mm-dd hh24:mi:ss?
- How do I configure the time properties of an ODPS Spark node?
- How do I reprocess the return values of the scheduling parameters for a node if the node cannot process the return values?
- How are instances generated on the day when daylight saving time begins and ends?
- I run an instance of a node at 00:00 on the current day to analyze the data in the partition that corresponds to 23:00 on the previous day. However, the data in the partition that corresponds to 23:00 on the current day is analyzed. What do I do?
- A node uses the $cyctime or $[yyyymmddhh24miss] variable and is scheduled to run at 20:00 every day. The ancestor node of the node fails to run as scheduled, so the node is delayed and runs at 00:00 on the next day. In this case, is the value of the $cyctime or $[yyyymmddhh24miss] variable 20:00 or 00:00?
- What are the differences between the return values of a MaxCompute date function and a scheduling parameter?
- Testing of scheduling parameters
- O&M of scheduling parameters and checks on scheduling parameter configurations
- How do I check the validity of the values of scheduling parameters in the production environment?
- How do I check whether the values of the scheduling parameters of an instance are valid by viewing logs?
- I configure a scheduling parameter for a node and commit and deploy the node, but the return value of the scheduling parameter remains unchanged. What do I do?
FAQ about scheduling dependencies
- Information that you must understand before you configure scheduling dependencies
- What are scheduling dependencies?
- Why are scheduling dependencies required?
- How do I configure scheduling dependencies for a node?
- What rules are used when a node needs to depend on its ancestor nodes to run?
- What is the output name of a node used for?
- Can a node have multiple output names?
- Which scenarios do not support scheduling dependencies?
- Use scenarios of scheduling dependencies
- When I commit a node, the system reports an error that the output name of the ancestor node of the node does not exist. What do I do?
- When I commit a node, the system reports an error that the input and output of the node are not consistent with the data lineage in the code developed for the node. What do I do?
- The system automatically adds an output name to Parent Nodes for my node based on the automatic parsing feature, but an error message indicating that the output represented by the output name does not exist appears. What do I do?
- The name and ID of the descendant node of my node are empty and cannot be specified in the output of my node. Why does this happen?
- How do I delete a table on which a node does not depend?
- Can multiple nodes have the same output name?
- How do I prevent DataWorks from parsing temporary tables when DataWorks parses the scheduling dependencies of a node?
- How do I configure scheduling dependencies across workflows or configure scheduling dependencies across workspaces that reside in the same region?
- How do I configure an ancestor node for the start node of a workflow?
- How do I configure Node A, Node B, and Node C to run in sequence once per hour?
- Node deletion or changes
- Why do I find a non-existent output name of Node B when I enter an output name to search for the ancestor nodes of Node A?
- When I undeploy a node, the system displays an error message indicating that the node has descendant nodes and cannot be undeployed. However, no descendant nodes can be found for the node on the Properties tab. Why does this happen?
- Configuration of cross-cycle scheduling dependencies in different scenarios
- In which scenarios do I need to configure the instance of a node in the current cycle to depend on the instance of the node in the previous cycle?
- Why do some scheduling dependencies of nodes appear as dashed lines in Operation Center?
- How do I configure dependencies for a node that needs to depend on multiple nodes?
- I configure the instance generated for a node scheduled by hour in the current cycle to depend on the instance generated for the node in the previous cycle. What are the impacts on this node and its descendant node?
- How do I configure a scheduling dependency in which a node scheduled by day depends on a node scheduled by hour?
- When does a node scheduled by day start to run if I configure a node scheduled by hour as the ancestor node of the node scheduled by day?
- How do I configure a node scheduled by day to depend on a specific instance that is generated on the current day for a node scheduled by hour?
- How do I configure a node scheduled by day to depend on all the instances that are generated on the previous day instead of the current day for a node scheduled by hour?
- Node B scheduled by day depends on Node A scheduled by hour, and Node B starts to run only after all the instances that are generated on the current day for Node A are successful. Will the execution of Node B be affected if Node A still runs on the next day?
- Node A runs every hour on the hour, and Node B runs once every day. How do I configure Node B to automatically run after the first instance of Node A is run every day?
- Other frequently asked questions