MaxCompute security white paper

Last Updated: Sep 15, 2023

Legal disclaimer

Alibaba Cloud reminds you to carefully read and fully understand the terms and conditions of this legal disclaimer before you read or use this document. If you have read or used this document, it shall be deemed as your total acceptance of this legal disclaimer.

  1. You shall download and obtain this document from the Alibaba Cloud website or other channels that are authorized by Alibaba Cloud, and use this document for your own legal business activities only. The content of this document is considered confidential information of Alibaba Cloud. You shall strictly abide by the confidentiality obligations. No part of this document shall be disclosed or provided to any third party for use without the prior written consent of Alibaba Cloud.

  2. No part of this document shall be excerpted, translated, reproduced, transmitted, or disseminated by any organization, company, or individual in any form or by any means without the prior written consent of Alibaba Cloud.

  3. The content of this document may be changed due to product version upgrades, adjustments, or other reasons. Alibaba Cloud reserves the right to modify the content of this document without notice. The updated versions of this document will be occasionally released through channels that are authorized by Alibaba Cloud. You shall pay attention to the version changes of this document as they occur and download and obtain the most up-to-date version of this document from channels that are authorized by Alibaba Cloud.

  4. This document serves only as a reference guide for your use of Alibaba Cloud products and services. Alibaba Cloud provides the document in the context that Alibaba Cloud products and services are provided on an as is, with all faults, and as available basis. Alibaba Cloud makes every effort to provide relevant operational guidance based on existing technologies. However, Alibaba Cloud hereby makes a clear statement that it in no way guarantees the accuracy, integrity, applicability, and reliability of the content of this document, either explicitly or implicitly. Alibaba Cloud shall not bear any liability for any errors or financial losses incurred by any organizations, companies, or individuals arising from their download, use, or trust in this document. Alibaba Cloud shall not, under any circumstances, bear responsibility for any indirect, consequential, exemplary, incidental, special, or punitive damages, including lost profits arising from the use or trust in this document, even if Alibaba Cloud has been notified of the possibility of such a loss.

  5. By law, all the content of the Alibaba Cloud website, including but not limited to works, products, images, archives, information, materials, website architecture, website graphic layout, and webpage design, are intellectual property of Alibaba Cloud and/or its affiliates. This intellectual property includes, but is not limited to, trademark rights, patent rights, copyrights, and trade secrets. No part of the Alibaba Cloud website, product programs, or content shall be used, modified, reproduced, publicly transmitted, changed, disseminated, distributed, or published without the prior written consent of Alibaba Cloud and/or its affiliates. The names owned by Alibaba Cloud include, but are not limited to, Alibaba Cloud, Aliyun, HiChina, and other brands of Alibaba Cloud and/or its affiliates, which appear separately or in combination, as well as the auxiliary signs and patterns of the preceding brands, or anything similar to the company names, trade names, trademarks, product or service names, domain names, patterns, logos, marks, signs, or special descriptions that third parties identify as Alibaba Cloud and/or its affiliates.

  6. Please contact Alibaba Cloud directly if you discover any errors in this document.

Platform security

Account

You must use an Alibaba Cloud account to purchase and use MaxCompute. After you create an Alibaba Cloud account and complete real-name verification, you can use the Alibaba Cloud account to purchase MaxCompute resources and create AccessKey pairs for accessing MaxCompute projects. You can access a MaxCompute project by using an Alibaba Cloud account, a RAM user, or a RAM role.

MaxCompute supports RAM authorization. Resource Access Management (RAM) is an Alibaba Cloud service that allows you to manage access permissions on resources. You can use your Alibaba Cloud account to create RAM users and grant the RAM users permissions to access specific resources within your Alibaba Cloud account.

Authentication

You can create an AccessKey pair in the Alibaba Cloud console. An AccessKey pair consists of an AccessKey ID and an AccessKey secret. The AccessKey ID is public and uniquely identifies a user, whereas the AccessKey secret is private and used to authenticate a user. Before a client sends a request to MaxCompute, the client generates a string to be signed in the format that is specified by MaxCompute and then generates a signature for the request by using the AccessKey secret.

After MaxCompute receives the request, MaxCompute finds the AccessKey secret based on the AccessKey ID and then generates a signature. If the signature is the same as the signature that is sent by the client, the request is valid. Otherwise, MaxCompute rejects the request and returns an HTTP 403 error.
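
The signing and verification steps above can be sketched in Python. This is a minimal illustration, not the MaxCompute wire protocol: the string-to-sign layout, the choice of HMAC-SHA256, the in-memory secret store, and the names `build_string_to_sign`, `sign`, and `verify` are all assumptions made for the example. The real string-to-sign format is the one that MaxCompute specifies.

```python
import base64
import hashlib
import hmac

# Hypothetical server-side store: AccessKey ID -> AccessKey secret.
SECRETS = {"example-access-key-id": "example-access-key-secret"}


def build_string_to_sign(method: str, path: str, date: str) -> str:
    # Assumed layout for illustration only; MaxCompute defines its own format.
    return "\n".join([method, date, path])


def sign(secret: str, string_to_sign: str) -> str:
    # The client signs the request with the private AccessKey secret.
    digest = hmac.new(secret.encode(), string_to_sign.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def verify(access_key_id: str, method: str, path: str, date: str, signature: str) -> int:
    """Return an HTTP-style status: 200 if the signature matches, otherwise 403."""
    secret = SECRETS.get(access_key_id)
    if secret is None:
        return 403
    # The service looks up the secret by AccessKey ID and recomputes the signature.
    expected = sign(secret, build_string_to_sign(method, path, date))
    # compare_digest avoids leaking information through comparison timing.
    return 200 if hmac.compare_digest(expected, signature) else 403
```

Because the secret never travels with the request, a request whose path or date is altered after signing no longer matches the recomputed signature and is rejected.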

Authorization

You can use an Alibaba Cloud account or a RAM user to access MaxCompute resources. You can create different RAM users in one Alibaba Cloud account. MaxCompute checks the permissions of your Alibaba Cloud account or RAM user each time you access the resources of MaxCompute.

  • If you access a resource by using an Alibaba Cloud account, MaxCompute checks whether the account is the resource owner. Only the resource owner has permissions on the resource.

  • If you access a resource by using a RAM user, MaxCompute checks whether the Alibaba Cloud account of the RAM user is the resource owner and whether the RAM user is granted permissions on the resource.

Note

If an Alibaba Cloud account other than the resource owner and the RAM users of the Alibaba Cloud account are granted permissions on the resource, the Alibaba Cloud account and the RAM users can also access the resource.

MaxCompute supports the following access control mechanisms.

  • ACL-based access control: an object-based authorization mechanism. An ACL specifies permissions on an object and is considered a subresource of the object. An ACL takes effect only if the object exists. If the object is deleted, the ACL of the object is also deleted. ACL-based access control is similar to the authorization mechanism that is implemented by using the GRANT and REVOKE statements defined in SQL-92. You can execute these statements to grant or revoke permissions on an object. To manage permissions, specify the effect (grant or revoke), object (such as a table or resource), subject (user or role), and action (such as read, write, or delete). For example, you can grant the read permission on table1 to user zinan.tang.

  • RAM-based access control: MaxCompute allows you to attach RAM policies to RAM users and RAM roles. To reduce security risks for enterprises, you can grant the permissions to access and manage MaxCompute resources within your Alibaba Cloud account to the RAM users and RAM roles based on the principle of least privilege.
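
The ACL behavior described above (permissions attached to an object, granted or revoked per subject and action, and removed together with the object) can be modeled in a few lines of Python. This is a conceptual sketch only: the `Project` class and its methods are hypothetical and do not correspond to any MaxCompute SDK API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Project:
    # object name -> set of (subject, action) pairs granted on that object
    acls: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def create_object(self, name: str) -> None:
        self.acls.setdefault(name, set())

    def drop_object(self, name: str) -> None:
        # The ACL is a subresource of the object, so it is deleted with it.
        self.acls.pop(name, None)

    def grant(self, action: str, obj: str, subject: str) -> None:
        if obj not in self.acls:
            raise KeyError(f"object {obj!r} does not exist")
        self.acls[obj].add((subject, action))

    def revoke(self, action: str, obj: str, subject: str) -> None:
        if obj in self.acls:
            self.acls[obj].discard((subject, action))

    def is_allowed(self, subject: str, action: str, obj: str) -> bool:
        return (subject, action) in self.acls.get(obj, set())
```

Recreating an object with the same name starts from an empty ACL in this model, which mirrors the statement that an ACL exists only as long as its object does.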

MaxCompute also supports other access control mechanisms in the following scenarios:

  • Cross-project resource sharing

    You are the owner or administrator that has the admin role of a project, and another user wants to access resources in your project. If the user belongs to your project team, we recommend that you grant permissions to the user by using the authorization management feature. If the user does not belong to your project team, you can share resources with the user across projects by using packages.

    Packages are used for data and resource sharing across projects. You can use a package to implement cross-project user authorization.

    The administrator of Project A packages all objects that are required by the user in Project B and grants the administrator of Project B the permissions to install the package. Then, the administrator of Project B installs the package and determines whether to grant other users in Project B the permissions on the package.

    The following section describes how to create and use a package:

    • For a package creator

      -- Create a package. 
      create package <pkgname>;
      -- Note:
      -- Only a project owner is authorized to perform this operation. 
      -- The package name must be 1 to 128 characters in length. 
      
      -- Add objects to the package. 
      add project_object to package package_name [with privileges privileges];
      remove project_object from package package_name;
      project_object ::= table table_name |
      instance inst_name |
      function func_name |
      resource res_name
      privileges ::= action_item1, action_item2, ...
      -- Note:
      -- You cannot add the project as an object to a package.  
      -- Specify the permissions on the objects in the [with privileges privileges] part. If you do not specify the permissions by using [with privileges privileges], the object is read-only. In this case, only the Describe and Select permissions on the object can be granted. An object and the permissions on the object are inseparable and cannot be changed after you add the object and the permissions to a package. If you want to change the object and the permissions, delete the object and then add the object again. 
      -- Grant the permissions on the package to another project. 
      allow project <prjname> to install package <pkgname> [using label <number>];
      -- Revoke the permissions on the package from another project. 
      disallow project <prjname> to install package <pkgname>;
      
      -- Delete a package. 
      delete package <pkgname>;
      
      -- View the list of packages. 
      show packages;
      
      -- View the details of a package. 
      describe package <pkgname>;
    • For a user of a package

      -- Install the package. 
      install package <pkgname>; 
      -- Note:
      -- Only a project owner is authorized to perform this operation. 
      -- pkgName specifies the name of the package that you want to install and must be in the <projectName>.<packageName> format. 
      -- Uninstall a package. 
      uninstall package <pkgname>; 
      -- pkgName specifies the name of the package that you want to uninstall and must be in the <projectName>.<packageName> format. 
      -- View the list of created and installed packages. 
      show packages;
      -- View the details of a package. 
      describe package <pkgname>;

      An installed package is an independent object in MaxCompute. To access resources in a package that is shared by the user of another project, you must have the Read permission on the package. If you do not have the Read permission on the package, you can apply to the project owner or the administrator for the permission. The project owner or the administrator grants the permission by using ACLs.

      For example, the following ACL rules allow the odps_test@aliyun.com account to access a package. Sample code:

      use prj2;
      install package prj1.testpkg;
      grant read on package prj1.testpkg to user aliyun$odps_test@aliyun.com;
  • Column-level access control

    LabelSecurity enables fine-grained mandatory access control (MAC) for a project. It allows the project administrator to control user access to column data that has different levels of sensitivity.

    LabelSecurity classifies both data and users who need to access the data into different levels. The data is classified into the following levels based on sensitivity:

    • Level 0: unclassified.

    • Level 1: confidential.

    • Level 2: sensitive.

    • Level 3: highly sensitive.

    MaxCompute adopts the preceding data levels. Project owners must define their own standards to determine the sensitivity levels of data and the access permission levels of users. By default, the sensitivity level of all data is 0 and the access permission level of all users is 0.

    LabelSecurity allows project owners to label table columns and views with different sensitivity levels.

    By default, the sensitivity level of a new view is 0. The sensitivity levels of views and base tables are independent of each other.

    LabelSecurity applies the following default security policies based on the levels of data and users:

    • No-ReadUp: Users cannot read data that has a higher sensitivity level than their own, unless the users are explicitly authorized.

    • Trusted-User: Users are allowed to write data to columns regardless of the sensitivity levels. The default sensitivity level of a new column is 0.

    Note
    • Traditional MAC systems use sophisticated security policies to prevent unauthorized data operations in projects. For example, the No-WriteDown policy allows a user to write data only to columns that have a higher sensitivity level than the level of the user. Because this policy increases the workload of project administrators who manage data sensitivity levels, MaxCompute does not enforce the No-WriteDown policy by default. Project administrators can specify the Set ObjectCreatorHasGrantPermission=false configuration to implement a policy that is similar to the No-WriteDown policy.

    • If you want to prevent data transfer across projects, you can enable project protection. After the setting takes effect, users can access data only within their own projects and cannot share the data with other projects.

    By default, LabelSecurity is disabled. The project owner can enable LabelSecurity based on business requirements. After LabelSecurity is enabled, the preceding default security policies take effect. Users must have the Select permission and the required level to access sensitive data in the tables.
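
The No-ReadUp rule combined with the Select permission can be expressed as a small decision function. The sketch below is a simplified model under stated assumptions: the function name `can_read`, its parameters, and the treatment of an explicit authorization as a per-user granted level are illustrative choices, not MaxCompute internals.

```python
from typing import Optional

LEVELS = range(0, 4)  # 0 unclassified, 1 confidential, 2 sensitive, 3 highly sensitive


def can_read(
    user_level: int,
    column_level: int,
    has_select: bool,
    label_security_enabled: bool = True,
    granted_level: Optional[int] = None,
) -> bool:
    """Model of the read check: Select permission plus a sufficient level."""
    if user_level not in LEVELS or column_level not in LEVELS:
        raise ValueError("levels must be between 0 and 3")
    if not has_select:
        return False
    if not label_security_enabled:
        # With LabelSecurity disabled, only the Select permission matters.
        return True
    # An explicit authorization (assumed here to be a granted level) can
    # raise the level that applies to this user for this data.
    effective = user_level if granted_level is None else max(user_level, granted_level)
    # No-ReadUp: data above the effective level is not readable.
    return column_level <= effective
```

For example, a level-1 user with the Select permission can read level-0 and level-1 columns, but needs an explicit authorization to read a level-3 column.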

  • Project protection

    Users who are authorized to access data in multiple projects can transfer data across these projects. If a project contains highly sensitive data that cannot be shared with other projects, the project owner can set projectProtection to true.

    Sample code:

    set projectProtection=true;
    -- This command allows data to flow into the project but prevents data from flowing out to other projects. 
    -- By default, the projectProtection parameter is set to false. You need to manually enable project protection by setting this parameter to true.
  • Data transfer across projects after project protection is enabled

    If project protection is enabled for your project but you want to transfer data to another project, specify the project to which you want to transfer data as a trusted project. This way, data transfer to the project is allowed. If you specify multiple projects as trusted projects for each other, the projects form a trusted project group. You can export data of a project to other projects in the group but you cannot export data to projects that do not belong to the group.

    Commands to manage trusted projects:

    list trustedprojects;
    -- List all trusted projects that are added to the current project.  
    add trustedproject <projectname>;
    -- Add a trusted project to the current project.  
    remove trustedproject <projectname>;
    -- Remove a trusted project from the current project.
  • Resource sharing and data protection

    MaxCompute supports both package-based resource sharing and project protection. However, the two mechanisms conflict with each other.

    In MaxCompute, package-based resource sharing takes precedence over project data protection. If you share a data object with users in another project by using the package feature, the data object is not limited by project protection.

    To prevent data outflow from the project, specify ProjectProtection=true to enable project protection and check the following points:

    • No trusted projects are added. If a trusted project is added, evaluate the potential risks.

    • No exception policies are configured. If an exception policy is configured, evaluate the potential risks, especially data leaks caused by time-of-check to time-of-use (TOCTOU) issues.

    • No data is shared by using packages. If packages are used to share data, make sure that the packages do not contain sensitive data.
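
The interaction between project protection, trusted projects, and packages can be summarized as a single decision function. The following Python sketch is a conceptual model only: `ProjectPolicy` and `can_flow_out` are hypothetical names, the user is assumed to already hold the required permissions in both projects, and exception policies are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class ProjectPolicy:
    name: str
    project_protection: bool = False
    trusted_projects: Set[str] = field(default_factory=set)
    # (object name, target project) pairs shared through installed packages
    packaged: Set[Tuple[str, str]] = field(default_factory=set)


def can_flow_out(src: ProjectPolicy, obj: str, dst: str) -> bool:
    """Return True if data in `obj` may leave project `src` for project `dst`."""
    if dst == src.name:
        return True  # not an outflow
    if (obj, dst) in src.packaged:
        return True  # package sharing takes precedence over project protection
    if not src.project_protection:
        return True  # protection is off by default
    return dst in src.trusted_projects
```

The checklist above corresponds to keeping `trusted_projects` and `packaged` empty, or reviewing every entry in them, once `project_protection` is set to true.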

Auditing

MaxCompute allows you to audit logs that are generated for different users and stores the logs in metadata warehouses.

The metadata includes static data, operation logs, and security information. You can query the metadata and analyze the running status of MaxCompute.

  • Static data is permanently written to the data warehouse.

  • Operation logs record task running processes and are stored in only one partition.

  • Security information originates from Tablestore and includes whitelists and ACLs.

MaxCompute records all user behavior and pushes the user behavior logs to Alibaba Cloud ActionTrail in real time. In the ActionTrail console, you can view and retrieve user behavior logs and deliver the logs to your Simple Log Service project or a specified Object Storage Service (OSS) bucket. This way, you can perform real-time log auditing and problem backtracking.

System security

MaxCompute can ensure system security in multi-tenant scenarios. MaxCompute integrates the authentication system of Alibaba Cloud to verify the signature in each HTTP request by using AccessKey pairs. MaxCompute performs complete permission checks on user operations, and stores and isolates data of different users in Apsara Distributed File System. Computing resources are shared. This allows MaxCompute to meet the requirements for multi-tenant collaboration, data sharing, data confidentiality and security, and resource scalability, and helps achieve serverless multi-tenant resource isolation.

To ensure flexibility and scalability, MaxCompute supports SQL user-defined functions (UDFs) for computing behavior extension. MaxCompute also introduces third-party engines, such as Spark. Untrusted code of these features may trigger unexpected system damage, or the system may be attacked by malicious users. MaxCompute uses lightweight security containers (virtualized containers) and language-level sandboxes to implement process-level isolation. Untrusted code runs in security containers to achieve high-level security isolation. The agent that provides host security protection can also detect malicious behaviors on the system in real time and control the damage at the earliest opportunity.

Network security

Access control

MaxCompute supports access through endpoints over the classic network, the Internet, and a virtual private cloud (VPC). You can configure an IP address whitelist for a project for access control. Take note of the following items:

  • The classic network, a VPC, and the Internet are isolated from each other. Users can only access the endpoints and IP addresses of their own networks.

  • Projects for which VPC IDs or IP address whitelists are not configured can be accessed over the three types of networks by using valid endpoints. An endpoint is valid if its access request is verified.

  • Projects for which VPC IDs are configured can be accessed only over the specified VPCs.

  • Projects for which IP address whitelists are configured can be accessed by hosts whose IP addresses are added to the IP address whitelists.

  • If a request is sent by a proxy, the request is allowed or denied based on the VPC ID or the last-hop IP address.

You can determine the IP addresses that must be added to IP address whitelists by using one of the following methods:

  • If you access project data in the MaxCompute console, obtain the IP address of the MaxCompute console.

  • If you use an application system, such as DataWorks or Data Integration, to access project data, obtain the IP addresses of the servers on which DataWorks or Data Integration is deployed. The IP addresses of the default servers are automatically added to the whitelist.

  • If you use a proxy server to access MaxCompute instances, obtain the IP address of the last-hop proxy server.

  • If you access MaxCompute instances from ECS instances, obtain the NAT IP addresses.

Separate multiple IP addresses with commas (,). You can configure the IP addresses of the following types:

  • Individual IP addresses.

  • An IP address range. Separate the start IP address and the end IP address with a hyphen (-).

  • An IP address that has a subnet mask.

-- Individual IP addresses 
10.32.180.8,10.32.180.9,10.32.180.10
-- An IP address range   
10.32.180.8-10.32.180.12
-- An IP address that has a subnet mask 
10.32.180.0/23
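
To make the three entry formats concrete, the following Python sketch parses a whitelist string and checks whether a client address matches it. It uses only the standard `ipaddress` module. The function names `parse_whitelist` and `is_allowed` are hypothetical, the sketch assumes IPv4 addresses, and it treats an empty whitelist as "no restriction"; it is not how MaxCompute itself evaluates the setting.

```python
import ipaddress


def parse_whitelist(value: str):
    """Parse a comma-separated whitelist into (first, last) address pairs."""
    ranges = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "/" in entry:
            # An IP address that has a subnet mask, for example 10.32.180.0/23.
            net = ipaddress.ip_network(entry, strict=False)
            ranges.append((net[0], net[-1]))
        elif "-" in entry:
            # An IP address range, for example 10.32.180.8-10.32.180.12.
            start, end = (ipaddress.ip_address(p.strip()) for p in entry.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {entry}")
            ranges.append((start, end))
        else:
            # An individual IP address.
            addr = ipaddress.ip_address(entry)
            ranges.append((addr, addr))
    return ranges


def is_allowed(client_ip: str, whitelist: str) -> bool:
    ranges = parse_whitelist(whitelist)
    if not ranges:
        return True  # assumption: an empty whitelist means no restriction
    addr = ipaddress.ip_address(client_ip)
    return any(first <= addr <= last for first, last in ranges)
```

A check such as this is useful before you apply a whitelist, to confirm that the addresses of your own clients fall inside the entries you plan to configure.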

The following section describes how to configure an IP address whitelist for a project.

Run the following command on the client as the project owner to add an IP address whitelist:

setproject odps.security.ip.whitelist=101.132.236.134,100.116.0.0/16,101.132.236.134-101.132.236.144;
Note
  • Only IP addresses in the whitelist, such as the outbound IP addresses of the MaxCompute console or SDK, can access the project.

  • An IP address whitelist takes effect 5 minutes after it is configured.

Run the following command to disable the IP address whitelist:

setproject odps.security.ip.whitelist=;

Impact

  • Before you add an IP address whitelist, MaxCompute does not restrict access to the project.

  • After you add the IP address whitelist, only IP addresses and IP address ranges in the whitelist can access the project. You can achieve fine-grained access control by using the IP address whitelist together with the authentication mechanism based on the AccessKey ID and the AccessKey secret.

Network isolation

Jobs that run in security containers may need to communicate with each other. To ensure security, these communications cannot be built on the host network. In this case, MaxCompute constructs a virtual network for secure containers by using the overlay feature. All nodes that are involved in a job run in the same virtual network. The nodes communicate with each other by using private IP addresses but cannot access the host network. However, users may need to access external networks, such as accessing an API over the Internet or other data services in the VPC. In this case, MaxCompute allows job-level network connection. After a user declares the desired network when a job starts and passes the required permission checks, the user can access the desired network at the job level.

This helps MaxCompute implement security isolation for different tenants on a resource pool and allows for more possible business forms. Based on secure containers and virtual network isolation, MaxCompute implements powerful UDFs on a multi-tenant cluster. Compared with other platforms, MaxCompute imposes fewer restrictions on UDF capabilities. You can use UDFs in MaxCompute to access the on-premises I/O data, on-premises network, and data in the VPC. If you use the data lakehouse solution, you can create a network connection to the VPC. When you create an external data source, you can use SQL statements in MaxCompute to access external data. Job-level isolation enables MaxCompute to provide hybrid computing modes in a cluster. In addition to implementing SQL statements and UDFs, MaxCompute also supports Alibaba Cloud Machine Learning Platform for AI (PAI) and the open source Spark engine.

Data security

MaxCompute passed an independent third-party audit in compliance with the trust services criteria for security, availability, and confidentiality of the American Institute of Certified Public Accountants (AICPA). For more information about the audit report, see SOC 3 Report.

Apsara Stack Resilience for Backup and Recovery

Alibaba Cloud provides a flat storage system in which linear address space is split into chunks. Each chunk is replicated to create three replicas, which are stored on different data nodes of the cluster to ensure data reliability.

The data storage system has three key components: master, chunk server, and client. Write operations in MaxCompute are processed and executed by the client in the following process:

  1. The client determines the location of the chunk that is requested by the write operation.

  2. The client sends a request to the master to query the chunk servers on which the three chunk replicas are stored.

  3. The master returns the addresses of the chunk servers. Then, the client sends the write request to the chunk servers.

  4. If the write operation is successful in all three chunk replicas, the client returns a success message. Otherwise, the client returns a failure message.
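
Steps 2 to 4 of the write process can be sketched as follows. This is a toy model, not Apsara Distributed File System code: the `Master` class, the `write_chunk` function, and the `send` callback that stands in for the network call to a chunk server are all hypothetical.

```python
from typing import Callable, Dict, List


class Master:
    """Toy master that only knows which chunk servers hold each chunk."""

    def __init__(self, placement: Dict[str, List[str]]):
        self._placement = placement

    def locate(self, chunk_id: str) -> List[str]:
        return self._placement[chunk_id]


def write_chunk(
    master: Master,
    chunk_id: str,
    data: bytes,
    send: Callable[[str, str, bytes], bool],
) -> bool:
    """Write to every replica; succeed only if all three writes succeed."""
    # Step 2 and step 3: ask the master where the three replicas are stored.
    servers = master.locate(chunk_id)
    # Step 3: send the write request to each chunk server.
    results = [send(server, chunk_id, data) for server in servers]
    # Step 4: report success only if all three replicas were written.
    return len(servers) == 3 and all(results)
```

A single failed replica turns the whole write into a failure, which is what keeps the three replicas consistent with each other.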

The master considers the disk usage of all chunk servers in the cluster, distribution of chunk servers in different switch racks, power supply status, and machine load to ensure that the three replicas are distributed to different chunk servers in different racks. This effectively prevents a single point of failure due to the failure of a chunk server or rack.

If a data node or the hard disks of the node are faulty, the total number of valid replicas of some chunks may become less than three. In this case, the master replicates data between chunk servers to make sure that each chunk in the cluster has three valid replicas.
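
The repair step can be illustrated with a simplified, rack-aware target selection. The sketch below is an assumption-laden model: `pick_repair_targets` is a hypothetical name, and it considers only server health and rack placement, whereas the master described above also weighs disk usage, power supply status, and machine load.

```python
from typing import Dict, List, Set


def pick_repair_targets(
    replicas: List[str],
    healthy_servers: Set[str],
    rack_of: Dict[str, str],
    want: int = 3,
) -> List[str]:
    """Choose servers to copy a chunk to so that `want` valid replicas exist."""
    # Replicas on failed servers no longer count as valid.
    valid = [s for s in replicas if s in healthy_servers]
    used_racks = {rack_of[s] for s in valid}
    targets: List[str] = []
    for server in sorted(healthy_servers):
        if len(valid) + len(targets) >= want:
            break
        if server in valid or rack_of[server] in used_racks:
            continue  # keep every replica on a different rack
        targets.append(server)
        used_racks.add(rack_of[server])
    return targets
```

If every remaining healthy server shares a rack with an existing replica, this model returns fewer targets than needed; a real placement policy would need a fallback for that case.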

All data operations in MaxCompute, such as addition, modification, and deletion, are synchronized to the three replicas. This mechanism ensures data reliability and consistency.

After you delete data, the released storage space is reclaimed by Apsara Distributed File System and cannot be accessed by any user. Apsara Distributed File System clears data from the storage space. This provides maximum protection for your data.

Data encryption

MaxCompute allows you to use Key Management Service (KMS) to encrypt data for storage. MaxCompute protects data at rest to meet the requirements of enterprise governance and security compliance. MaxCompute allows you to use the MaxCompute default key or a Bring Your Own Key (BYOK) to encrypt or decrypt data. MaxCompute SQL provides encryption and decryption functions that you can use to encrypt or decrypt data in specified columns.

Encrypted data transmission

When encrypted data is used for computing, the encrypted data is transferred in ciphertext and decrypted at the computing terminal. If an external product or client reads data from or writes data to MaxCompute, HTTPS is used to ensure the security of data transfer.