This topic provides answers to some frequently asked questions about DataStudio.
- Resources
- Which kind of resource group can I use when I reference a third-party package in a PyODPS node?
- How do I reference a resource in a node?
- How do I download a resource that is uploaded to DataWorks?
- How do I upload a resource whose size is greater than 30 MB?
- How do I use a resource that is uploaded to DataWorks by using odpscmd?
- How do I upload a JAR package on my on-premises machine to DataWorks as a JAR resource and reference the uploaded resource in a node?
- How do I use a MaxCompute table in DataWorks?
- PyODPS
- Can a Python resource call another Python resource?
- Can PyODPS call custom functions to use third-party packages?
- When I call a pickle file in a PyODPS 3 node, the following error message appears: _pickle.UnpicklingError: invalid load key, '\xef'. What do I do?
- How do I delete a MaxCompute resource?
- Nodes and workflows
- How do I recover a node that is deleted?
- What is the impact on the instances of a node after the node is deleted?
- How do I view the versions of a node?
- How do I check whether a node is committed?
- After a node is modified and committed and deployed to the production environment, is the existing faulty node in the production environment overwritten?
- How do I export the code of a node?
- Can I configure properties for all nodes in a workflow at a time?
- How do I clone a workflow?
- Tables
- How do I create a table in a visualized manner?
- How do I add fields to a table that is in the production environment?
- How do I delete a table?
- How do I upload data from my on-premises machine to a MaxCompute table?
- When I create a table in a workspace with which an E-MapReduce (EMR) compute engine instance is associated, the following error message appears: call emr exception. What do I do?
- How do I query data that is in the production environment from the development environment on the DataStudio page?
- How do I control whether the queried table data can be downloaded?
- How do I download more than 10,000 data records?
- Operational logs and retention period of operational logs
- Batch operations
- Power BI connection to MaxCompute
- What do I do if an error is reported when I connect Power BI to MaxCompute?
- API calls
- Other items
Which kind of resource group can I use when I reference a third-party package in a PyODPS node?
Use an exclusive resource group for scheduling. For more information, see Use a PyODPS node to reference a third-party package.
How do I control whether the queried table data can be downloaded?


You can download only a maximum of 10,000 data records from DataStudio due to the limits of the compute engine.
How do I download more than 10,000 data records?
Use a Tunnel command of MaxCompute. For more information, see Use SQLTask and Tunnel to export a large amount of data.
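As a rough sketch of the Tunnel approach, the following Python helper only assembles the statement text that you would then run in odpscmd. The project name, instance ID, and local file names are hypothetical placeholders. The instance:// form assumes that you already have the ID of the instance that ran your SELECT statement.

```python
def tunnel_download_statement(source: str, local_path: str) -> str:
    """Return an odpscmd Tunnel statement that downloads `source` to `local_path`."""
    return f"tunnel download {source} {local_path};"


def instance_source(project: str, instance_id: str) -> str:
    """Return the instance:// locator for the result of a finished SQL instance."""
    return f"instance://{project}/{instance_id}"


# Hypothetical names, used only for illustration.
table_statement = tunnel_download_statement("userlog3", "userlog3.txt")
instance_statement = tunnel_download_statement(
    instance_source("my_project", "my_instance_id"), "result.txt"
)

print(table_statement)
print(instance_statement)
```

The first statement downloads a whole table. The second downloads the result set of a single SQL instance, which is the form that requires an instance ID.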
How do I reference a resource in a node?

How do I download a resource that is uploaded to DataWorks?

How do I upload a resource whose size is greater than 30 MB?
Use a Tunnel command to upload the resource. Then, add the resource to DataStudio in the MaxCompute Resources pane for future use. For more information, see How do I use a resource that is uploaded to DataWorks by using odpscmd?
How do I use a resource that is uploaded to DataWorks by using odpscmd?

How do I upload a JAR package on my on-premises machine to DataWorks as a JAR resource and reference the uploaded resource in a node?

For example, you want to reference the resource test.jar in a Shell node. After you select Insert Resource Path, the comment ##@resource_reference{"test.jar"} is automatically added at the beginning of the code for the Shell node.
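To make the format of that comment concrete, here is a minimal Python sketch that reproduces the line for a given resource name. DataWorks adds the comment for you when you select Insert Resource Path, so the helper name and the placeholder node body are hypothetical and serve only as an illustration.

```python
def resource_reference_comment(resource_name: str) -> str:
    """Return the reference comment in the format shown for test.jar."""
    return '##@resource_reference{"' + resource_name + '"}'


comment = resource_reference_comment("test.jar")

# The comment sits on the first line, followed by the node's own code.
# The echo line is a placeholder body, not part of the reference.
shell_node_code = comment + "\n" + "echo 'node body goes here'"

print(shell_node_code)
```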
How do I use a MaxCompute table in DataWorks?
- On the DataStudio page of DataWorks, create a file resource that has the same name as the MaxCompute table and upload the file. In this example, the userlog3.txt file is uploaded.
Note: Do not select Upload to MaxCompute.
- After you upload the file, execute a statement on odpscmd to add the MaxCompute table resource to DataWorks. In this example, the statement add table userlog3 -f; is executed.
- Select the uploaded file resource to use the resource.
Can a Python resource call another Python resource?
A Python resource can call another Python resource in the same workspace.
Can PyODPS call custom functions to use third-party packages?
If you do not want to use the map method of DataFrame to call the test function, you can use PyODPS to call custom functions to use third-party packages. For more information, see Reference a third-party package in a PyODPS node.
When I call a pickle file in a PyODPS 3 node, the following error message appears: _pickle.UnpicklingError: invalid load key, '\xef'. What do I do?
Check whether the code of your PyODPS 3 node contains special characters. If the code contains special characters, compress the code into a ZIP package, upload the package to DataWorks, and then decompress the package to call the pickle file.
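The following sketch illustrates one way such an error can arise and why the ZIP route helps. It assumes that the pickle bytes were altered before loading, here simulated by prepending a UTF-8 byte order mark whose first byte is \xef. That cause is an assumption for illustration, not a diagnosis of every case. The file name model.pkl and the sample payload are hypothetical.

```python
import io
import pickle
import zipfile

payload = {"model": "demo", "weights": [1, 2, 3]}
raw = pickle.dumps(payload)

# Simulate a pickle whose bytes were altered: a UTF-8 byte order mark
# (EF BB BF) now precedes the real pickle data.
corrupted = b"\xef\xbb\xbf" + raw
try:
    pickle.loads(corrupted)
    error_message = None
except pickle.UnpicklingError as exc:
    error_message = str(exc)

# Pack the untouched bytes into a ZIP archive (in memory for this sketch).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("model.pkl", raw)

# Read the member back as bytes and unpickle it.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    restored = pickle.loads(archive.read("model.pkl"))

print(error_message)
print(restored == payload)
```

Reading the archive member as bytes keeps the pickle data exactly as it was written, so the load succeeds.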
How do I delete a MaxCompute resource?
In a DataWorks workspace in standard mode, the development environment is isolated from the production environment. If you delete a resource on the DataStudio page, the resource is deleted only from the development environment. The same resource is deleted from the production environment only after you deploy the delete operation to the production environment.
- Delete a resource from the development environment. In the desired workflow, choose Delete. In the Delete dialog box, click OK.
- Delete a resource from the production environment. A resource can be deleted from the production environment only after the delete operation of the resource is deployed to the production environment. On the DataStudio page, click Deploy in the upper-right corner. On the Create Deploy Task page, set Change Type to Offline, find the package of the resource that is deleted in the previous step, and click Deploy in the Actions column. In the Create Deploy Task dialog box, click Deploy. After you click Deploy, the resource is deleted from the production environment.
How do I recover a node that is deleted?

How do I view the versions of a node?
A version is generated only after you commit the code.

How do I clone a workflow?
Use a node group. For more information, see Create and reference a node group.
How do I export the code of a node?
Use Migration Assistant. For more information, see Overview.
How do I check whether a node is committed?
If you want to check whether a node is committed, find the desired workflow in the Scheduled Workflow pane and expand the workflow to view the status of each node in this workflow. If the icon is displayed on the left side of a node, the node is committed. Otherwise, the node is not committed.
Can I configure properties for all nodes in a workflow at a time?
No. DataWorks does not allow you to configure properties at the workflow level. If a workflow contains multiple nodes, you must configure properties for the nodes one by one. For example, if a workflow contains 20 nodes, you must configure each of the 20 nodes separately.
What is the impact on the instances of a node after the node is deleted?
The scheduling system generates one or more instances for a node every day based on the time properties of the node. If the node is deleted after it is run for a period of time, its instances are retained. However, the instances will fail to run after the node is deleted. This is because the required code is unavailable.
After a node is modified and committed and deployed to the production environment, is the existing faulty node in the production environment overwritten?
No, the existing faulty node is not overwritten. The updated code is used to run new node instances that have not been run yet, and the existing node instances are retained. If scheduling properties are modified, the modified configurations apply only to the new node instances.
How do I create a table in a visualized manner?

How do I add fields to a table that is in the production environment?
If you use an Alibaba Cloud account, add fields to the table in the Workspace Tables pane of the DataStudio page and commit the table to the production environment.
If you use a RAM user, you must request the permissions of the O&M engineer or workspace administrator role for the RAM user, use the RAM user to add fields to the table in the Workspace Tables pane of the DataStudio page, and then commit the table to the production environment.
How do I delete a table?
You can delete a table from the development environment on the DataStudio page.
- Go to Data Map and delete the table on the My Data tab.
- Create an ODPS SQL node, and enter and execute the DROP statement on the configuration tab of the node. For more information about how to create an ODPS SQL node, see Create an ODPS SQL node. For more information about the syntax of the DROP statement, see Table operations.

How do I upload data from my on-premises machine to a MaxCompute table?

When I create a table in a workspace with which an E-MapReduce (EMR) compute engine instance is associated, the following error message appears: call emr exception. What do I do?
- Possible cause: Security settings are not configured for the security group to which your EMR cluster belongs. Before you associate an EMR compute engine instance with your workspace, add the following rules to the security group of the ECS instance that hosts your EMR cluster. Otherwise, the preceding error message may appear.
- Action: Allow
- Protocol type: Custom TCP
- Port range: 8898/8898
- Authorization object: 100.104.0.0/16
- Solution: Check the security settings of the security group of the ECS instance that hosts your EMR cluster. If the security settings do not include the preceding rules, add the rules to the security group.
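The check described in the solution can be sketched in Python as follows. The SecurityGroupRule class and the function names are hypothetical and do not correspond to any Alibaba Cloud SDK type. The sketch assumes that you have already copied your rules from the console into this form. It also assumes that a broader rule, such as a wider port range or a larger IPv4 CIDR block, satisfies the requirement as well.

```python
import ipaddress
from dataclasses import dataclass

REQUIRED_PORT = 8898
REQUIRED_SOURCE = ipaddress.ip_network("100.104.0.0/16")


@dataclass(frozen=True)
class SecurityGroupRule:
    """Hypothetical, simplified view of one inbound security group rule."""
    action: str        # for example "Allow"
    protocol: str      # for example "Custom TCP"
    port_range: str    # "start/end", for example "8898/8898"
    source_cidr: str   # for example "100.104.0.0/16"


def covers_required_rule(rule: SecurityGroupRule) -> bool:
    """Return True if the rule allows TCP port 8898 from 100.104.0.0/16."""
    if rule.action.lower() != "allow":
        return False
    if "tcp" not in rule.protocol.lower():
        return False
    start, end = (int(part) for part in rule.port_range.split("/"))
    if not start <= REQUIRED_PORT <= end:
        return False
    network = ipaddress.ip_network(rule.source_cidr)
    return network.version == 4 and REQUIRED_SOURCE.subnet_of(network)


def has_required_rule(rules) -> bool:
    """Return True if at least one rule in the group covers the requirement."""
    return any(covers_required_rule(rule) for rule in rules)


existing = [SecurityGroupRule("Allow", "Custom TCP", "22/22", "0.0.0.0/0")]
required = SecurityGroupRule("Allow", "Custom TCP", "8898/8898", "100.104.0.0/16")

print(has_required_rule(existing))
print(has_required_rule(existing + [required]))
```

If the check returns False for your rule list, the rule listed above is missing and needs to be added to the security group.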
How do I query data that is in the production environment from the development environment on the DataStudio page?
In a workspace in standard mode, if you want to query data that is in the production environment from the development environment on the DataStudio page, specify the table whose data you want to query in the Project name.Table name format.
In a workspace that is upgraded from the basic mode to the standard mode, if you want to query data that is in the production environment from the development environment on the DataStudio page, you must request the permissions of the producer role first and specify the table whose data you want to query in the Project name.Table name format. For more information about how to request the permissions, see Request permissions on tables.
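As an illustration of the Project name.Table name format, the following Python sketch builds a qualified table name and a small preview query. The project name my_project and the helper names are hypothetical. The identifier check is a simplifying assumption, not the full MaxCompute naming rule.

```python
import re

# Simplified identifier pattern, assumed for this sketch only.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def qualified_table(project: str, table: str) -> str:
    """Return the table reference in Project name.Table name format."""
    for part in (project, table):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    return f"{project}.{table}"


def preview_query(project: str, table: str, limit: int = 10) -> str:
    """Return a SELECT statement that reads a few rows from the qualified table."""
    return f"SELECT * FROM {qualified_table(project, table)} LIMIT {int(limit)};"


# Hypothetical production project and table names.
print(preview_query("my_project", "userlog3"))
```

You would paste the resulting statement into a node in the development environment. Whether it succeeds still depends on the permissions described above.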
How do I query historical operational logs on the DataStudio page?
Click the Operational history icon in the left-side navigation pane of the DataStudio page. In the Operational history pane, you can view the historical operational logs.
How long are operational logs on the DataStudio page retained?
How do I perform operations on multiple nodes, resources, or functions at a time?
Go to the DataStudio page and click the Batch Operation icon in the Scheduled Workflow pane. On the Batch Operation-Data Development tab, you can perform the desired operation on multiple nodes, resources, or functions at a time. Then, you can commit the objects on which you perform the operation at a time and deploy the objects on the Create Deploy Task page to make the modifications take effect.

How do I change resource groups for scheduling for multiple nodes in a workflow at a time on the DataStudio page?

What do I do if an error is reported when I connect Power BI to MaxCompute?
Power BI cannot be connected to MaxCompute. We recommend that you connect Power BI to Hologres instead. For more information, see Endpoints for connecting to Hologres.
When I call a DataWorks API operation, the following error message appears: access is forbidden. Please first activate DataWorks Enterprise Edition or Flagship Edition. What do I do?
Activate DataWorks Enterprise Edition or Flagship Edition, as the error message indicates. For more information, see Overview.
How do I disable the MaxCompute Query Acceleration (MCQA) feature if I want to obtain the instance ID that is used to download more than 10,000 data records?
Add set odps.mcqa.disable=true; to the code of the ODPS SQL node and execute this statement together with other SELECT statements.
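A minimal sketch of that step, assuming you assemble the node code as a string before you paste it into the ODPS SQL node. The helper name and the sample query are hypothetical. Only the set statement itself comes from the answer above.

```python
MCQA_DISABLE_FLAG = "set odps.mcqa.disable=true;"


def with_mcqa_disabled(sql: str) -> str:
    """Return the SQL text with the MCQA flag placed before the statement."""
    statement = sql.strip()
    if not statement.endswith(";"):
        statement += ";"
    # Do not add the flag twice if the text already starts with it.
    if statement.lower().startswith(MCQA_DISABLE_FLAG):
        return statement
    return f"{MCQA_DISABLE_FLAG}\n{statement}"


node_code = with_mcqa_disabled("SELECT * FROM userlog3")
print(node_code)
```

Keeping the flag and the SELECT statement in the same node code means that they are executed together, as the answer requires.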