MaxCompute: Project data protection

Last Updated: Oct 08, 2023

During business processing, a user may have access permissions on multiple projects at the same time, which creates a security risk if the user transfers data between those projects. To keep project data secure, MaxCompute provides the project data protection feature to control data outflow. This topic describes the project data protection feature and the data outflow policies that you can use with it.

Background information

Some enterprises have strict data security requirements and take various measures to prevent leaks of sensitive data. For example, employees can work only in the workplace and are not allowed to take work materials out of the office, and all USB ports on office computers are disabled. A similar risk exists in MaxCompute: a user may transfer data from one project to another project.

For example, user Alice has access permissions on both Project1 and Project2: the Select permission on the table1 table in Project1 and the CreateTable permission on Project2. If the project data protection feature is disabled, Alice can execute the following statement to copy the data of table1 from Project1 to Project2. This operation creates a risk of leaking sensitive data.

create table project2.table2 as select * from project1.table1;

The preceding example shows that a user who has access permissions on multiple MaxCompute projects at the same time can transfer data between the projects. To prevent this, MaxCompute provides the project data protection feature. If a project contains highly sensitive data, the project owner must enable the feature for the project.

By default, the project data protection feature is disabled for MaxCompute projects. The owner of a project can run the set projectProtection=true; command in the project to enable the feature. After the feature is enabled, data can flow into the project but cannot flow out of the project.

Precautions

After you enable the project data protection feature, take note of the following points:

  • Cross-project data access operations cannot be performed because the operations violate project data protection rules. If you want to allow specific data to flow out to other projects, you can use one of the following methods that are provided by MaxCompute: Data outflow policy 1: Configure an exception policy and Data outflow policy 2: Configure a trusted project.

  • The project data protection feature controls data flows, but not access permissions on data. Data flow control is meaningful only if users have the access permissions on data.

  • In MaxCompute, package-based resource sharing across projects is independent of the project data protection feature. The two features do not take effect on the same object at the same time, because package-based resource sharing takes precedence over project data protection. For example, if a user in Project B is granted access permissions on an object in Project A by using package-based resource sharing, the project data protection feature does not take effect on the object.

Data outflow policy 1: Configure an exception policy

If cross-project data access operations are required, the project owner must configure an exception policy in advance to allow the specified user to transfer data of a specified object from the current project to a specified project. After the exception policy is configured, data of the object can be transferred to the specified project even if the project data protection feature is enabled.

Configure an exception policy

You configure an exception policy in the same command that enables the project data protection feature. After you configure the exception policy, you can run the show SecurityConfiguration; command to view the exception policy. Command for configuring an exception policy:

set ProjectProtection=true with exception <policyfile>;

policyfile specifies the exception policy file. You must save the file as a TXT file in the bin directory of the installation path of the MaxCompute client. The following example shows the content of the policy file:

    {
    "Version": "1",
    "Statement":
    [{
        "Effect":"Allow",
        "Principal":"<Principal>",
        "Action":["odps:<Action1>[, <Action2>, ...]"],
        "Resource":"acs:odps:*:<Resource>",
        "Condition":{
            "StringEquals": {
                "odps:TaskType":["<Tasktype>"]
            }
        }
    }]
    }

Parameters

  • Effect: Specifies whether data is allowed to flow out. Set the value to Allow. This value indicates that data is allowed to flow out.

  • Principal: The Alibaba Cloud account or RAM user that is allowed to transfer data out of the project.

  • Action: The action that is allowed to transfer data out of the project. For more information about the actions that are performed on different types of objects, see MaxCompute permissions.

  • Resource: The object from which data is allowed to flow out and the project to which the object belongs. Format: projects/<project_name>/{tables|resources|functions|instances}/<name>. For more information about objects, see MaxCompute permissions.

  • Tasktype: The type of the job in which data is allowed to flow out. Valid values: DT, SQL, and MapReduce. DT indicates Tunnel.

Example

-- Enable the project data protection feature and configure an exception policy. The policy_file file is stored in the bin directory of the installation path of the MaxCompute client. 
set ProjectProtection=true with exception policy_file;  
-- The policy_file file contains the following content: 
{
    "Version": "1",
    "Statement":
    [{
        "Effect":"Allow",
        "Principal":"ALIYUN$Alice@aliyun.com",
        "Action":["odps:Select"],
        "Resource":"acs:odps:*:projects/project_test/tables/table_test",
        "Condition":{
            "StringEquals": {
                "odps:TaskType":["DT", "SQL"]
            }
        }
    }]
}

The preceding example shows that Alice is allowed to perform the Select operation on the project_test.table_test table in an SQL job or a Tunnel (DT) job to transfer data from the project_test project to other projects.

Note

If Alice does not have the Select permission on the project_test.table_test table, Alice cannot transfer data out of the project even if the exception policy is configured.
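The policy file from the preceding example can also be generated programmatically. The following Python sketch is illustrative only: build_exception_policy and VALID_TASK_TYPES are hypothetical names, not part of any MaxCompute SDK, and the sketch assumes that multiple actions are listed as separate array entries. It only produces the JSON text. You still need to save the text as a TXT file in the bin directory of the MaxCompute client and run the set ProjectProtection command yourself.

```python
import json

# Task types listed as valid values in this topic (DT indicates Tunnel).
VALID_TASK_TYPES = {"DT", "SQL", "MapReduce"}


def build_exception_policy(principal, actions, resource, task_types):
    """Return an exception policy as a dict (hypothetical helper, not an SDK call)."""
    unknown = set(task_types) - VALID_TASK_TYPES
    if unknown:
        raise ValueError(f"Unsupported task types: {sorted(unknown)}")
    return {
        "Version": "1",
        "Statement": [{
            "Effect": "Allow",  # the only value described in this topic
            "Principal": principal,
            # Assumption: one "odps:<Action>" array entry per action.
            "Action": [f"odps:{action}" for action in actions],
            "Resource": f"acs:odps:*:{resource}",
            "Condition": {
                "StringEquals": {"odps:TaskType": list(task_types)}
            },
        }],
    }


# Rebuild the policy from the example: Alice may select from table_test
# in Tunnel (DT) and SQL jobs.
policy = build_exception_policy(
    principal="ALIYUN$Alice@aliyun.com",
    actions=["Select"],
    resource="projects/project_test/tables/table_test",
    task_types=["DT", "SQL"],
)
policy_text = json.dumps(policy, indent=4)
print(policy_text)
```

Rejecting unknown task types before the file is written catches typos early, before the policy reaches the client.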

Precautions

This method may cause data leaks due to a time-of-check to time-of-use (TOCTOU) error, which is also known as a race condition:

  • Problem description

    The following example describes how a user transfers data out of a project.

    1. TOC stage: User A submits an application to the project owner to export data in the t1 table. After the project owner verifies that the t1 table does not contain sensitive data, the project owner configures an exception policy to allow User A to export data in the t1 table.

    2. Before the TOU stage starts, other users write sensitive data to the t1 table.

    3. TOU stage: User A exports data in the t1 table. However, the data in the t1 table that is exported contains the sensitive data that is written by other users.

  • Solution

    To prevent this issue, we recommend that the project owner ensure that no other users can perform the Update action on the table, or perform the Drop and CreateTable actions to drop the table and create a table with the same name. In the preceding example, we recommend that the project owner create a snapshot of the t1 table in the TOC stage and use this snapshot to configure the exception policy. Additionally, make sure that no other users are assigned the Admin role.
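The race condition and the snapshot-based fix can be illustrated with a toy in-memory model. The following Python sketch does not call MaxCompute: the tables dictionary, the sensitive flag, and the contains_sensitive function are assumptions made only for illustration. It shows why exporting a snapshot taken at check time is safer than exporting the live table.

```python
import copy

# Toy stand-in for project tables: table name -> list of rows.
tables = {"t1": [{"id": 1, "note": "public figure"}]}


def contains_sensitive(rows):
    """Return True if any row is flagged as sensitive (illustrative flag)."""
    return any(row.get("sensitive", False) for row in rows)


# TOC stage: the project owner checks t1 and takes a snapshot of it.
checked_clean = not contains_sensitive(tables["t1"])
tables["t1_snapshot"] = copy.deepcopy(tables["t1"])

# Between TOC and TOU: another user writes sensitive data to t1.
tables["t1"].append({"id": 2, "note": "salary data", "sensitive": True})

# TOU stage: compare what each export would contain.
export_from_live_table = tables["t1"]
export_from_snapshot = tables["t1_snapshot"]

print("checked clean at TOC:", checked_clean)
print("live table leaks:", contains_sensitive(export_from_live_table))
print("snapshot leaks:", contains_sensitive(export_from_snapshot))
```

In this model, the snapshot stays identical to the data that the owner verified, so an exception policy that refers to the snapshot cannot expose rows written later.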

Data outflow policy 2: Configure a trusted project

If the project data protection feature is enabled for the current project but data outflow is required, the project owner can configure the destination project as a trusted project of the current project. This way, data of the project can be transferred to the trusted project. If multiple projects are specified as trusted projects for each other, these projects form a trusted project group. Data of these projects can be transferred only within the group.

Manage trusted projects

The project owner can run the following commands to manage trusted projects:

  • To add a trusted project for the current project, run the following command:

     add trustedproject <project_name>;

  • To remove a trusted project from the current project, run the following command:

     remove trustedproject <project_name>;

  • To query all trusted projects of the current project, run the following command:

     list trustedprojects;
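The way project data protection, exception policies, and trusted projects combine can be summarized as a small decision function. The following Python sketch is a simplified model under stated assumptions: Project, can_flow_out, and mutually_trusted are hypothetical names, the user is assumed to already have the required access permissions on the data, and package-based resource sharing is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for the protection settings of a project."""
    name: str
    protected: bool = False                    # set projectProtection=true;
    trusted: set = field(default_factory=set)  # add trustedproject <name>;


def can_flow_out(source, destination, exception_applies=False):
    """Decide whether data may flow from source to the named destination."""
    if not source.protected:
        return True   # protection disabled: only access permissions matter
    if destination in source.trusted:
        return True   # the destination is a trusted project of the source
    return exception_applies  # otherwise a matching exception policy is needed


def mutually_trusted(a, b):
    """Two projects belong to the same trusted group if each trusts the other."""
    return b.name in a.trusted and a.name in b.trusted


project1 = Project("project1", protected=True, trusted={"project2"})
project2 = Project("project2", protected=True, trusted={"project1"})

print(can_flow_out(project1, "project2"))
print(can_flow_out(project1, "project3"))
print(can_flow_out(project1, "project3", exception_applies=True))
print(mutually_trusted(project1, project2))
```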

Best practices

To prevent data of a project from being transferred to other projects, the project owner must set projectProtection to true and perform the following operations to check configurations:

  • Run the list trustedprojects; command and make sure that no trusted projects are configured for the project. If a trusted project is configured for the project, the project owner must assess potential risks. The project owner can run the remove trustedproject <project_name>; command to remove the trusted project that is not needed.

  • Run the show packages; command and make sure that no data sharing packages are used in the project. If a data sharing package is used in the project, the project owner must ensure that the package does not contain sensitive data. The project owner can run the delete package <package_name>; command to delete the data sharing package that is not needed.

  • Enabling the project data protection feature for a project prevents data from being exported from the project. The feature restricts operations that migrate data across projects. The following data outflow scenarios are covered:

    • Execute the statement create table <Table in another project> as select * from <Table in a protected project>.

    • Execute the statement insert overwrite table <Tables in another project> select * from <Tables in a protected project>.

    • Write table data to tables in another project by running Spark jobs, MapReduce jobs, Graph jobs, Proxima CE, and Machine Learning Platform for AI (PAI) jobs.

    • Run the Tunnel download command to download MaxCompute table data to your on-premises machine. This includes calling Tunnel through the SDK or through a Java Database Connectivity (JDBC) driver.

    • Run the CLONE TABLE command to copy the table data to a table in another project.

    • Call a user-defined function (UDF) to write table data to tables in other projects.

    • Call a UDF to write table data to MaxCompute external tables.

  • Whether you can download data from a project depends on both the project data protection feature and Download access control. Before you download data from a project, check whether you have the required permissions:

    • If you enable the project data protection feature and the Download access control for a project, you can download data from the project when you have the Download and Describe permissions on the project. A permission check for Describe is triggered before the Tunnel download command runs.

    • If you enable the project data protection feature and disable the Download access control for a project, you can download data from the project when you have the Select permission on the project and configure the exception policy for the download behavior.

    • If you disable the project data protection feature and enable the Download access control for a project, you can download data from the project when you have the Download and Describe permissions on the project.

    • If you disable the project data protection feature and the Download access control for a project, you can download data from the project when you have the Select permission on the project.
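The four download cases above can be condensed into a lookup function. The following Python sketch only restates those cases: download_requirements is a hypothetical name, and the returned dictionary layout is an assumption chosen for readability, not a MaxCompute API.

```python
def download_requirements(protection_enabled, download_control_enabled):
    """Return what a user needs to download data, per the four cases above."""
    if download_control_enabled:
        # Download access control enabled: Download and Describe are required,
        # whether or not project data protection is enabled.
        return {"permissions": {"Download", "Describe"},
                "exception_policy": False}
    # Download access control disabled: Select is required, and an exception
    # policy is also required if project data protection is enabled.
    return {"permissions": {"Select"},
            "exception_policy": protection_enabled}


for protection in (True, False):
    for download_control in (True, False):
        needs = download_requirements(protection, download_control)
        print(protection, download_control,
              sorted(needs["permissions"]), needs["exception_policy"])
```

The function makes one pattern in the four cases explicit: the required permissions depend only on whether Download access control is enabled, while project data protection only adds the exception policy requirement when Download access control is disabled.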