DataStudio of DataWorks allows you to perform operations on a GUI to develop and test data in big data scenarios. This way, you can manage data in an intelligent and efficient manner. This topic describes the following aspects of DataStudio: nodes, supported node types, management and usage of resources during development, and management of permissions on resources and service modules during development.

Note
  • This topic describes how to use DataStudio in a workspace in standard mode. In standard mode, the development and production environments are isolated.
  • If you change the code for a node in the production environment, you must modify node parameters on the DataStudio page. Then, commit and deploy the node.
  • If no engine is available in your workspace or the engine that you want to use is not displayed in the directory tree, check whether the engine is activated and associated with your workspace on the Workspace Management page. Only the engines that are associated with a workspace are displayed in a workflow. For more information about how to associate an engine with a workspace, see Configure a workspace.
  • If you cannot perform operations on specific service modules or cannot find the entry for creating an item, go to the User Management page to check whether you have development permissions. You have development permissions if you use an Alibaba Cloud account or a RAM user that is assigned the developer or workspace administrator role. You can also check whether your DataWorks edition meets your requirements.

Organizational structure

You can organize your business based on workspaces, solutions, and workflows. You can plan and group workspaces based on enterprise departments, business projects, and data warehouse layers.
  • Workspace
    • Description: You can specify administrators and members for each workspace based on your business requirements. The role settings of members and the parameters of compute engine instances vary across workspaces. For more information about workspace planning, see Plan workspaces.
    • Purpose: Workspaces are the basic units for managing permissions in DataWorks. You can create workspaces based on the organizational structure of your company and use a workspace to manage development and O&M permissions. Workspace members can collaborate to develop and manage the code for all nodes in the workspace.
  • Solution
    • Description: A solution is a group of workflows that are dedicated to a specific business goal. A workflow can be added to multiple solutions. After you develop a solution and add a workflow to the solution, other users can reference and modify the workflow in their own solutions or workflows for collaborative development.
    • Purpose: You can use a solution for business integration.
  • Workflow
    • Description: A workflow is an abstract business entity that allows you to develop code based on your business requirements. Workflows and nodes in different workspaces are developed separately.
      Workflows can be displayed in a directory tree or in a panel. These display modes enable you to organize code from a business perspective and show resource classification and business logic more efficiently.
      • The directory tree organizes your code by node type.
      • The panel shows the business logic of a workflow.
    • Purpose: A workflow is a basic unit for code development and resource management.
DataStudio works based on nodes in a workflow. You can create one or more nodes in a specific workflow in the panel. In each workflow, nodes are grouped by engine type. In the section of a specific engine, components are classified into data synchronization nodes, tables, resources, and functions. These components can be used to meet a specific business goal. Only the components that are used in a workflow are displayed in the workflow.
  • To use DataStudio, you must create a workflow.

Development logic

Workflows and nodes are required for data development. You can select a manually triggered node or an auto triggered node for data development, and use engine nodes, control nodes, or custom nodes to cleanse data. If you select an auto triggered node, you must configure scheduling parameters and commit the node. Then, deploy the node in the Nodes to Deploy panel. After the node is deployed, the node runs in the production environment and is scheduled based on the scheduling parameters that you configured.

Main features of DataStudio

  • Node type
    Engine capabilities are encapsulated in DataWorks. This way, you do not need to use complex engine command-line interfaces (CLIs). DataWorks also provides custom wrappers for custom nodes, which allow you to add computing task types and use custom nodes to access custom computing services. You can use custom nodes together with system nodes of DataWorks to process complex data.
    • Use data synchronization nodes in Data Integration to synchronize data.
    • Use engine nodes to develop data.
    • Use engine nodes and general nodes to manage complex processes.
    • Use custom nodes to develop data.
    For more information about how to select a node type, see the Select a data development node section in this topic.
  • Node development
    DataWorks allows you to select a manually triggered node or an auto triggered node and perform GUI-based operations to develop data in an intelligent and efficient manner.
    • Hybrid orchestration mode: allows you to drag different types of engine nodes in a workflow to the canvas and view the result. For more information, see the Create an auto triggered node part of the Select a node type section in this topic.
    • AI-powered SQL editor: supports code hinting and displays the structure of SQL operators. For more information, see the View node code and manage node versions part of the Select a node type section in this topic.
    For more information about how to create an auto triggered node and a manually triggered node, see Select a node type.
  • Visualized management and use of tables, resources, and functions

    For more information, see the Manage and use tables, resources, and functions section in this topic.

  • Permission and development behavior management of members
    The permissions on the following items are managed in DataStudio:
    • Resources
    • GUI-based operations
    • Operation procedure
    For more information, see the Manage the permissions and development behavior of members section in this topic.
  • Code version management and operational audit
    The following scenarios are supported:
    • Obtain the audit logs of operations that developers perform on the GUI.
    • Configure parameters in advance to collect important data for cause analysis.
    • Audit the permissions on MaxCompute tables.
    • Restore table data and nodes.
    • Compare and roll back node versions.
    For more information, see the ActionTrail section in this topic.

Select a data development node

Engine capabilities are encapsulated as engine nodes in DataWorks. This way, you do not need to use complex engine CLIs. DataWorks provides general nodes that you can use together with engine nodes to manage complex processes. DataWorks also provides custom wrappers for custom nodes. This allows you to add computing task types and use custom nodes to access custom computing services. You can use custom nodes to customize code processing methods.
Note More features will be available in the future.
  • Data synchronization node in Data Integration: You can use a data synchronization node in Data Integration to synchronize data.
    • Batch synchronization node: A batch synchronization node is used for offline data synchronization.
      • Batch synchronization nodes support different types of heterogeneous data sources in complex scenarios. These nodes synchronize data based on a data transmission framework by using readers and writers, which are abstract data extraction and writing plug-ins.
      • A batch synchronization node supports more than 40 data sources of the following categories: relational databases, unstructured storage, big data storage, and message queues.
      For more information about the data sources that are supported by batch synchronization nodes, see Supported data sources, readers, and writers.
    • Real-time synchronization node: A real-time synchronization node is used for real-time data synchronization.
      A real-time synchronization node uses three basic plug-ins to read, convert, and write data. These plug-ins interact with each other based on an intermediate data format that is defined by the plug-ins.
      For more information about the data sources that are supported by real-time synchronization nodes, see Plug-ins for data sources that support real-time synchronization.
    • Data synchronization solution: DataWorks provides solutions for various data synchronization scenarios, such as real-time synchronization, offline full synchronization, and offline incremental synchronization. These solutions help enterprises migrate data to the cloud in a more efficient and convenient manner.
      Synchronization solutions provide the following benefits:
      • Initialize full data.
      • Write incremental data in real time.
      • Automatically merge full and incremental data at a scheduled time and write the merged data to a new partition of the full table.
      For more information about synchronization solutions, see Overview.
  • Engine node: You can use an engine node to develop data.
  • General node: You can create a general node and use it together with engine nodes to manage complex processes in a workflow.
    The following list describes general nodes by scenario:
    • Business management
      • Create a zero-load node: A zero-load node is a control node that supports dry-run scheduling and does not generate data. In most cases, a zero-load node serves as the root node of a workflow and helps you manage nodes and workflows.
    • Event triggering
      • Create an HTTP Trigger node: Use this type of node if you want to schedule nodes in DataWorks after nodes in other systems finish running.
      • Create an OSS Object Inspection node: Use this type of node if you want to trigger a descendant node to run by monitoring whether Object Storage Service (OSS) objects are generated.
      • Create an FTP Check node: Use this type of node if you want to trigger a descendant node to run by monitoring whether File Transfer Protocol (FTP) files are generated.
    • Parameter value assignment
      • Configure an assignment node: Use this type of node if you want to use the outputs parameter of the node to pass the data in the output of the last row of the node code to its descendant nodes. A minimal code sketch is provided at the end of this section.
    • Control
      • Configure a for-each node: Use this type of node to traverse the result set of an assignment node.
      • Configure a do-while node: Use this type of node to execute the logic of specific nodes in loops. You can also use this type of node together with an assignment node to generate, in loops, the data that is passed to a descendant node of the assignment node.
      • Configure a branch node: Use this type of node to route results based on logical conditions. You can also use this type of node together with an assignment node.
      • Configure a merge node: Use this type of node to merge the status of its ancestor nodes and prevent dry-runs of its descendant nodes.
    • Parameter passing
      • Create a parameter node: Use this type of node to aggregate parameters from its ancestor nodes and distribute parameters to its descendant nodes.
      • Create a Shell node: Shell nodes support the standard shell syntax. The interactive syntax is not supported.
    • Code reuse
      • Create an SQL component node: An SQL component node is an SQL script template that contains multiple input and output parameters. You can create and run an SQL component node to filter source table data, join source tables, and aggregate source tables to generate a result table.
        Note Only the MaxCompute SQL syntax is supported.
  • Custom node: You can use custom nodes to develop data.

    DataWorks provides custom wrappers for custom nodes. This allows you to add computing task types and use custom nodes to access custom computing services. You can write code for a custom wrapper on your on-premises machine and add the code to DataWorks on the node configuration page. When you use DataStudio, you can select the custom wrapper from the custom code group of a workflow to develop data.

    To use a custom node, perform the following steps:
    1. Develop a custom wrapper package: To run a task on a custom node, a custom wrapper is required. Before you can use a custom node, create a custom wrapper package. Then, upload and deploy the package to DataWorks.
    2. Create a wrapper: Deploy the custom wrapper in DataWorks.
    3. Create a custom node type: Add a custom node and associate the custom node with the custom wrapper. Then, configure the basic information, code editor, and interaction parameters of the custom node.
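As noted for the assignment node above, the output of the last row of the node code is passed to descendant nodes through the outputs parameter. The following MaxCompute SQL snippet is a minimal sketch of the code for an assignment node; the table ods_user_info, its columns, and the ${bizdate} scheduling parameter are hypothetical examples.

    -- Minimal sketch of the code for an assignment node (MaxCompute SQL).
    -- The result set of the final SELECT statement is captured in the
    -- outputs parameter of the assignment node and passed to descendant
    -- nodes, such as a for-each node that traverses the rows.
    -- The table ods_user_info and its columns are hypothetical.
    SELECT  uid
            ,region
    FROM    ods_user_info
    WHERE   ds = '${bizdate}';

A for-each node that depends on this assignment node can traverse the returned result set and run its inner logic once for each row.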

Select a node type

In DataWorks, you can create auto triggered nodes or manually triggered nodes to develop data. To create a node, you can right-click a workflow in the directory tree on the left, or double-click a workflow and drag a specific node type to the canvas.
  • Create an auto triggered node
    1. Create an auto triggered node
      Find the type of node that you want to create and create the node. You can configure dependencies between nodes in a workflow by using a directed acyclic graph (DAG) and drag components to the canvas to orchestrate the workflow. You can also configure node dependencies across workflows and workspaces by using the automatic parsing feature. The following additional features can improve the efficiency of data development:
      • Create and reference a node group: Groups nodes that are frequently reused in a workflow into a node group, which can be referenced in other workflows.
      • Workflow parameters: Assigns a value to a variable or replaces the value of a parameter for all nodes in a workflow.
      • Change History: Displays the operation records of a workflow.
      • Versions: An updated version is generated each time you commit the same workflow. On the Versions tab, you can view and compare the committed versions.
      • Code search: Allows you to search for nodes by code snippet keywords and displays all the nodes that contain the code snippet, together with details about the snippet. You can use this feature to find the node that causes data changes in a destination table.
    2. Configure scheduling parameters for the auto triggered node
      In DataWorks, you can schedule an auto triggered node at different time granularities, such as minute, hour, day, week, month, or year. DataWorks allows you to run tens of millions of auto triggered nodes on a daily basis and supports parameter passing between nodes. For more information, see Basic properties.
      The following describes the major scheduling parameters.
      Note On the Batch Operation-Data development tab, you can perform an operation on multiple nodes, resources, or functions at the same time. For more information, see the "Batch operations" section.
      • Configure basic properties
        • Parameters: Assign values to variables in code. You can add scheduling parameters in the Parameters section to dynamically assign values to variables. The values of the added parameters are replaced based on the time at which the node is scheduled. For more information, see Overview of scheduling parameters. A minimal SQL sketch that references a scheduling parameter is provided at the end of this section.
      • Configure time properties
        • Instance Generation Mode: Specifies the time at which auto triggered instances are generated for a node. You can view the auto triggered instances of the node on the Cycle Instance page.
          • Next Day: On the day after an auto triggered node is deployed to the production environment, the auto triggered instances of the node are generated and run as scheduled.
          • Immediately After Deployment: Auto triggered instances are generated immediately after an auto triggered node is deployed to the production environment. You must set the scheduled time of the node based on the specified requirements. For more information, see Configure immediate instance generation for a node.
        • Recurrence: Specifies whether an auto triggered node is actually run in a specific scheduling scenario and how each scheduling type affects descendant nodes.
          • Normal
            • Description: The auto triggered node is scheduled normally. The descendant nodes of this node are also scheduled normally.
            • Scenario: By default, the Recurrence parameter is set to Normal.
          • Skip Execution
            • Description: The auto triggered node is frozen, but instances are still generated for the node. The node cannot be run, and its descendant nodes are blocked from running.
            • Scenario: If you do not need to run a workflow within a specific period of time, you can set the Recurrence parameter to Skip Execution to freeze the root node of the workflow.
          • Dry Run
            • Description: The scheduling system does not run the node or generate run logs. Instead, the system directly returns a success message for the node, and the node does not consume resources. The descendant nodes of the node can run as scheduled.
            • Scenario: If you do not need to run a node within a specific period of time but you need to run its descendant nodes as scheduled, you can set the Recurrence parameter to Dry Run for this node.
        • Rerun: Specifies whether a node can be rerun, based on data idempotence. Valid values:
          • Allow Regardless of Running Status
          • Allow upon Failure Only
          • Disallow Regardless of Running Status
        • Auto Rerun upon Error: Specifies the number of automatic reruns upon errors and the rerun interval for a node in a specific scheduling scenario.
        • Validity Period: Specifies the time period within which a node is automatically scheduled. Outside the specified time period, the node is not automatically scheduled, and no auto triggered instances are generated.
          Note The system generates dry-run instances for an auto triggered node outside the specified time period.
        • Scheduling Cycle and Run At: Specify the scheduling cycle and the scheduled time of a node. Valid values of the scheduling cycle: Minute, Hour, Day, Week, Month, and Year.
        • Timeout Definition: Specifies the timeout duration. If a node runs for a period of time that exceeds the specified duration, the node automatically terminates and exits.
      • Configure a resource group: Specifies the resource group that is used for node scheduling.
      • Configure same-cycle and previous-cycle scheduling dependencies
        • Same Cycle: Specifies the nodes that trigger the current node to run. The current node depends on the same-cycle auto triggered instances of its ancestor nodes that are scheduled to run on the current day. From a business perspective, the node runs based on the table data that its ancestor nodes generate on the current day.
        • Previous Cycle: Specifies the nodes that trigger the current node to run. The current node depends on the previous-cycle auto triggered instances of its ancestor nodes that are scheduled to run on the previous day. From a business perspective, the node depends on the table data that its ancestor nodes generate on the previous day.
      • Configure context-based parameters: This feature is used together with an assignment node. You can configure Input Parameters and Output Parameters in the Parameters section to pass the result set of the assignment node to its descendant nodes.
    3. View node code and manage node versions
      • Code editing: The AI-powered SQL editor supports code hinting.
      • Lineage: Displays node dependencies and the code lineage of a node.
      • Versions: A new version is generated each time you commit the same node. On the Versions tab, you can compare and roll back the committed versions.
      • Code Structure: Uses SQL operators to display the code structure.
  • Create a manually triggered node
    In the left-side navigation pane, click Manually Triggered Workflows. On the page that appears, create a manually triggered workflow. Then, find the type of node that you want to create and create the node in the workflow. You can configure dependencies between nodes in the workflow by using a DAG: drag components to the canvas and draw lines to connect the nodes.
  • Commit a node
    After you commit a node, the operation record is displayed on the Create Package page. You can determine whether to deploy the operation in the Nodes to Deploy panel. The node can be scheduled in the production environment only after you deploy the operation.
  • Deploy a node
    The Create Package page displays all the operations to be deployed in a workspace, including the records of add, update, and undeploy operations. A node can be scheduled as expected in the production environment only after you deploy the operation in the Nodes to Deploy panel.
    Note In a workspace in simple mode, you can use the cross-project cloning feature to deploy code in the workspace to another workspace.
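In node code, scheduling parameters are referenced in the ${...} format and are replaced at each scheduled run. The following MaxCompute SQL snippet is a minimal sketch; it assumes that a parameter named bizdate is defined in the Parameters section (for example, as bizdate=$bizdate), and the tables dws_user_stats and ods_user_log are hypothetical.

    -- Minimal sketch of the code for a daily auto triggered node.
    -- Assume that bizdate=$bizdate is defined in the Parameters section, so
    -- ${bizdate} is replaced with the data timestamp of each scheduled run.
    -- The tables dws_user_stats and ods_user_log are hypothetical.
    INSERT OVERWRITE TABLE dws_user_stats PARTITION (ds = '${bizdate}')
    SELECT  uid
            ,COUNT(*) AS pv
    FROM    ods_user_log
    WHERE   ds = '${bizdate}'
    GROUP BY uid;

Because the parameter is resolved at scheduling time, the same code processes a different partition in each scheduling cycle, and a rerun for a historical date reads and writes the partition of that date.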

Manage and use tables, resources, and functions

DataWorks encapsulates tables, resources, and functions of an engine. You can create tables and resources, and register functions in a visualized manner.
  • Manage tables

    You can manage tables in a visualized manner, upload data of a table from your on-premises machine, and export table data.

    DataWorks provides multiple entries to help you manage and use tables. You can select an entry based on your business requirements.

    The following entries are available:
    • Scheduled Workflow: Use workflows on the Scheduled Workflow page to manage tables based on business workflows.
      • Import tables to a workflow: Use this feature to import tables that are required for a workflow to the workflow. This way, you can manage the tables based on your business requirements.
      • Delete tables from a workflow: Use this feature to delete tables that are no longer required for a workflow. Right-click the table that you want to delete and click Delete Table.
        Note The table is deleted only from the workflow. It is still retained in the engine.
    • Workspace Tables and Tenant Tables: These entries do not provide workflow-based operations.
    Note You can view the basic metadata, lineage, and impact of a table in Data Map. For more information, see Overview.
    The following list describes the items that you must take note of when you manage tables by using different entries:
    • Manage tables
      • Basic table operations, such as creating a table, deleting a table in the development environment, renaming a table, modifying the comment of a table, and adding a field: On both the Scheduled Workflow and Workspace Tables pages, these operations are the same as the operations on an engine.
      • Operations on tables in the production environment, such as deleting a table in the production environment, changing the owners of multiple tables at a time, changing the lifecycles of multiple tables at a time, and changing the display names of multiple tables at a time: You cannot perform these operations on the Scheduled Workflow or Workspace Tables page. You must perform the operations in Data Map. For more information, see Overview.
    • Import table data
      • Upload data from your on-premises machine to a table: This operation is supported on both the Scheduled Workflow and Workspace Tables pages.
      • Synchronize data from other data sources to a table: On the Scheduled Workflow page, you can use batch synchronization nodes and real-time synchronization nodes. This operation is not supported on the Workspace Tables page.
    • Export table data
      • Export data to your on-premises machine: This operation is supported on both the Scheduled Workflow and Workspace Tables pages.
        Note An administrator can specify the Download SELECT Query Result parameter.
      • Export data to other data sources: On the Scheduled Workflow page, you can use batch synchronization nodes and real-time synchronization nodes. This operation is not supported on the Workspace Tables page.
  • Manage and use resources
    • Manage resources
      • Visualized upload: Upload resources in a visualized manner. For more information, see Upload resources.
      • Resource version management: Manage the versions of a resource. For more information, see Manage resource versions.
    • Use resources
      • Use resources in a node: For more information, see Use resources in a node.
      • Use resources to register functions: For more information, see Register functions.
  • Manage and use functions
    You can register a function on the Scheduled Workflow page. For more information, see Register functions.

Manage the permissions and development behavior of members

The permissions in DataStudio are classified into the permissions on engines and the permissions on service modules.
  • Manage the permissions on engines:
    After you associate an engine with a workspace, you can manage the permissions on the engine when you develop data in DataWorks.
    Notice Whether you are automatically granted the permissions on the engine with which the workspace is associated varies based on the engine. You may encounter the following scenarios when you manage the permissions on engines:
    • If you associate an engine with a workspace, a built-in role is granted the permissions on tables, functions, and resources of some engines.
    • If no built-in role is assigned or a built-in role is not granted the required permissions on a specific engine, you must go to the permission granting page of that engine to grant the required permissions. You cannot directly grant the permissions in DataWorks.
  • Manage permissions on service modules:

    You can manage the permissions on service modules of DataWorks when you perform data development operations that are not related to engines.

Development behavior management: DataWorks provides a permission management capability that can help you identify and block sensitive behavior at the earliest opportunity. If sensitive behavior is detected, you can manually block the behavior, or DataWorks can automatically notify you of the behavior based on the custom event check logic without blocking the operation procedure.

  • Manage permissions on MaxCompute
    If the MaxCompute engine is associated with a workspace in standard mode:
    • The DataWorks built-in roles and the roles in a MaxCompute project in the development environment have a permission mapping. By default, a DataWorks built-in role has all the permissions that the mapped MaxCompute project role has on the MaxCompute engine in the development environment.
    • The DataWorks built-in roles and the roles in a MaxCompute project in the production environment do not have a permission mapping. A DataWorks built-in role cannot directly manage resources of a MaxCompute project in the production environment.
    In summary, after a RAM user of DataWorks is assigned the administrator or developer role, the RAM user has all the permissions on a MaxCompute project in the development environment. However, the RAM user does not have the permissions on the same MaxCompute project in the production environment. If you want to use the RAM user to access a table in the production environment from the development environment, you must apply for the operation permissions on the table for the RAM user in Data Map. For more information, see Data Map.
    You can compile and debug code on the DataStudio page, commit and deploy the code to the production environment, and then run nodes in the production environment on the Operation Center page.
    The following list describes table access on each operation page:
    • DataStudio
      • Access a table in the development environment:
        • Sample code:
          select col1 from tablename;
        • Description: You can use an Alibaba Cloud account or a RAM user to access a table in the development environment. If you specify only the table name, the table in the MaxCompute project in the development environment is accessed by default. You can also specify the table name in the projectname_dev.tablename format.
      • Access a table in the production environment:
        • Sample code:
          select col1 from projectname.tablename;
        • Description: You can use an Alibaba Cloud account or a RAM user to access a table in the production environment. Specify the table name in the projectname.tablename format.
    • Operation Center
      • Access a table in the development environment: Not supported.
      • Access a table in the production environment:
        • Sample code:
          select col1 from tablename;
        • Description: The account that you selected when you associated the engine instance with the workspace is used to access the table in the production environment. If you specify only the table name, the table in the MaxCompute project in the production environment is accessed by default.
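    The following MaxCompute SQL sketch summarizes the two access patterns on the DataStudio page in a standard-mode workspace. The names projectname, tablename, and col1 are placeholders from the examples above.

        -- On the DataStudio page, a bare table name resolves to the project
        -- in the development environment (projectname_dev).
        select col1 from tablename;
        -- To read production data from the development environment, qualify
        -- the table name with the production project name. This requires
        -- table permissions that you apply for in Data Map.
        select col1 from projectname.tablename;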
  • Manage permissions on E-MapReduce (EMR)
    DataWorks built-in roles and EMR roles do not have a permission mapping. When you associate an EMR cluster with a workspace, you can set Access Mode to Shortcut Mode or Security Mode. The association settings that you must configure and the operation permissions that you can have vary based on the access mode that you select. For more information, see Associate an EMR cluster with a DataWorks workspace.
    • Shortcut Mode
      On both the DataStudio and Operation Center pages, operations on tables in the development and production environments are performed as the hadoop user.
    • Security Mode
      • DataStudio: You can use the account that you selected in the Development Environment section when you associated the EMR cluster with the workspace to access all engine resources. Access to tables in the production environment is not supported.
      • Operation Center: You can use the account that you selected in the Production Environment section when you associated the EMR cluster with the workspace to access all engine resources. Access to tables in the development environment is not supported.
      How it works: You can configure the Lightweight Directory Access Protocol (LDAP) permission mapping for members in a DataWorks workspace to manage the permissions of a RAM user on EMR clusters when the RAM user uses DataWorks. If you use an Alibaba Cloud account or a RAM user to commit code in DataWorks, the account that has the same username in EMR is used to run the node. You can use EMR Ranger to manage the permissions of users in an EMR cluster. This way, Alibaba Cloud accounts, node owners, and RAM users have different data permissions when they run EMR nodes in DataWorks.
  • Manage permissions on other engines

    If engines other than MaxCompute and EMR are associated with a workspace, whether you have the permissions to run nodes on the DataStudio page depends on the account that you selected when you configured the engine.

  • Manage permissions on service modules
    • Manage permissions on service modules

      DataWorks allows you to create a custom role, and then grant the read/write permissions on specific service modules to this role. For more information, see Manage workspace-level roles and members.

    • Manage the permissions on features of a service module

      In DataWorks, if some features are dimmed or the entry of a feature is not found, check whether you have the required permissions. DataWorks built-in roles have different permissions. For more information, see Permissions of built-in workspace-level roles.

    • Manage permissions on operation procedures

      DataWorks provides a permission management capability that can detect and block sensitive behavior at the earliest opportunity. If sensitive behavior is detected, you can manually block the behavior, or DataWorks can automatically notify you of the behavior based on the custom event check logic without blocking the operation procedure.

      DataWorks provides the code review feature. This feature includes the forcible code review (blocks operation procedures) and message sending (sends notifications without blocking operation procedures) sub-features. You can enable forcible code review for tasks of a specified priority based on node importance, that is, the baselines to which the nodes belong. You can obtain the related data by using the provided interfaces or features. This way, you can identify core changes and take relevant measures at the earliest opportunity. You can also use the message sending sub-feature to subscribe to the status changes of tables and nodes in DataWorks and implement personalized, automated responses.

ActionTrail

  • Restore nodes and table data
    • Restore a node: You can restore nodes that were recently deleted. For more information, see Recycle bin. New node IDs are generated after you restore deleted nodes.
    • Restore data of MaxCompute tables: DataWorks provides the data backup and restoration feature. The system automatically backs up data of earlier versions, including deleted data or data that exists before an update, and retains the data for a period of time. For more information, see Backup and restoration.
  • Compare and roll back node versions
    On the DataStudio page, find the node whose versions you want to view and click Versions on the right side of the node configuration page. On the Versions tab, you can select versions of the node and compare them. You can also roll back node versions. For more information, see Versions.
  • Obtain audit logs of operations that are performed in the DataWorks console, such as data downloads

    DataWorks is integrated with ActionTrail. You can view and retrieve DataWorks behavioral events of your Alibaba Cloud account over the previous 90 days in the ActionTrail console. You can use ActionTrail to deliver the event logs to a Logstore in Log Service or an Object Storage Service (OSS) bucket for monitoring and alerting. This way, you can audit the event logs and track and analyze issues at the earliest opportunity. For more information, see Use ActionTrail to query behavior events.

  • Mask data and trace leaked data

    If your files are important, you can configure data masking rules for sensitive data to prevent file data leaks, and trace leaked data by using the data watermark feature in Data Security Guard. For more information, see Create de-identification rules.

  • Audit the permissions on MaxCompute tables

    Go to the Security Center page and click Data access control. On the Data access control page, click Permission audit. On the tab that appears, you can view the owner IDs that have permissions on tables, the details about the permissions, and the validity period of the permissions. You can also revoke the permissions on tables.