As more businesses adopt big data technologies in their infrastructure, data security has become crucial to business success. As storage demands and the number of data sources grow, extra precautions are needed to ensure data privacy and security. Tools such as MaxCompute and DataWorks can significantly reduce the O&M workload of data management while keeping data secure.
Even so, data security should not be taken lightly. MaxCompute and DataWorks have different security models, and because they are typically used together, you need to fully understand how the two interact. This is especially important when MaxCompute is used through DataWorks and the DataWorks security model alone does not meet business security requirements: the two models must then be combined carefully.
In this guide, we share the basics of security management for MaxCompute and DataWorks. It is intended to help MaxCompute project owners and security administrators with the day-to-day security operations that keep project data secure.
The MaxCompute multi-tenant data security system includes:
User Authentication
Two account systems are supported: cloud accounts and RAM accounts. For RAM accounts, MaxCompute recognizes only the account system, not the RAM permission system: any RAM user under the primary account can be added to a MaxCompute project, but MaxCompute ignores the permissions defined in RAM when it checks what that RAM user is allowed to do.
User and Authorization Management
In a MaxCompute project, you can add users, remove users, or grant permissions to users.
You can also manage authorization through roles; every MaxCompute project has an admin role by default.
Two authorization methods are available: ACL and Policy. This article covers only the ACL method; the Policy method is described in a subsequent advanced article.
ACL authorization uses a syntax similar to the GRANT and REVOKE statements defined by SQL92: simple statements grant permissions on an existing project object to a subject, or revoke them from it. The syntax is as follows:
grant actions on object to subject
revoke actions on object from subject
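To make the grant/revoke semantics concrete, here is a minimal sketch, assuming a toy in-memory store rather than MaxCompute itself. The class `AclStore` and its methods are hypothetical; the sketch only parses statements of the two shapes shown above and records which actions a subject holds on an object.

```python
import re

# Toy in-memory model of ACL-style authorization (hypothetical, NOT the
# MaxCompute implementation). It only accepts statements shaped like
#   grant <actions> on <object> to <subject>
#   revoke <actions> on <object> from <subject>
# and treats <object> and <subject> as opaque strings.
_STMT = re.compile(
    r"^\s*(grant|revoke)\s+(.+?)\s+on\s+(.+?)\s+(to|from)\s+(.+?)\s*;?\s*$",
    re.IGNORECASE,
)


class AclStore:
    def __init__(self):
        # (subject, object) -> set of granted actions, stored lowercased
        self._acl = {}

    def execute(self, statement):
        m = _STMT.match(statement)
        if m is None:
            raise ValueError(f"unrecognized statement: {statement!r}")
        verb, actions, obj, prep, subject = m.groups()
        verb, prep = verb.lower(), prep.lower()
        # "grant" pairs with "to"; "revoke" pairs with "from"
        if (verb == "grant") != (prep == "to"):
            raise ValueError(f"'{verb}' cannot be used with '{prep}'")
        acts = {a.strip().lower() for a in actions.split(",") if a.strip()}
        granted = self._acl.setdefault((subject, obj), set())
        if verb == "grant":
            granted.update(acts)
        else:
            granted.difference_update(acts)

    def allowed(self, subject, obj, action):
        return action.lower() in self._acl.get((subject, obj), set())


if __name__ == "__main__":
    acl = AclStore()
    acl.execute("grant select, describe on table sale_detail to user alice;")
    acl.execute("revoke describe on table sale_detail from user alice;")
    print(acl.allowed("user alice", "table sale_detail", "select"))    # True
    print(acl.allowed("user alice", "table sale_detail", "describe"))  # False
```

Keying permissions by (subject, object) mirrors the statement shape: each statement touches exactly one subject and one object, and a revoke removes only the actions it names.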
Label Security Policy
Label-based security (LabelSecurity) is a project-level mandatory access control (MAC) policy. It lets project administrators control user access to sensitive data at the column level, with greater flexibility.
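A minimal sketch of how a label check can work, under stated assumptions: columns and users carry integer sensitivity levels (assumed here to range from 0 to 9), a user may read a column only if its level does not exceed the user's clearance ("no read up"), and an administrator can grant explicit per-column exceptions. The names `LabeledTable`, `User`, and `can_read` are hypothetical; check the MaxCompute LabelSecurity documentation for the exact rules.

```python
from dataclasses import dataclass, field

# Assumed label range; hypothetical model, not MaxCompute's implementation.
MIN_LEVEL, MAX_LEVEL = 0, 9


def _check_level(level):
    if not MIN_LEVEL <= level <= MAX_LEVEL:
        raise ValueError(f"label must be in {MIN_LEVEL}..{MAX_LEVEL}, got {level}")
    return level


@dataclass
class LabeledTable:
    name: str
    column_levels: dict  # column name -> sensitivity level (unlisted columns: 0)


@dataclass
class User:
    name: str
    clearance: int = 0
    # (table, column) pairs explicitly granted despite a higher label
    column_grants: set = field(default_factory=set)


def can_read(user, table, column):
    level = _check_level(table.column_levels.get(column, 0))
    # "No read up": readable when the column's level <= the user's clearance
    if level <= _check_level(user.clearance):
        return True
    # Otherwise only an explicit label grant on this column allows access
    return (table.name, column) in user.column_grants


def readable_columns(user, table):
    return [c for c in table.column_levels if can_read(user, table, c)]


if __name__ == "__main__":
    t = LabeledTable("customers", {"id": 0, "city": 1, "phone": 3, "id_card": 4})
    analyst = User("alice", clearance=2)
    print(readable_columns(analyst, t))  # ['id', 'city']
    analyst.column_grants.add(("customers", "phone"))
    print(readable_columns(analyst, t))  # ['id', 'city', 'phone']
```

The point of the model is that the decision is mandatory: even a user who holds SELECT on the table through ACL still cannot read a column labeled above their clearance without an explicit label grant.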
Sharing of Resources across Projects
A package is a mechanism for sharing data and resources across projects, and it solves the problem of cross-project user authorization: objects such as tables, resources, and functions can be shared with other projects without the sharing project having to manage the users of those projects.
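The division of responsibility in package-based sharing can be sketched as a toy model. The classes and methods below are hypothetical and only mirror the workflow described above: the providing project bundles objects and allows a consumer project to install the package, and the consumer project then decides on its own which of its users may use it. The provider never manages the consumer's users.

```python
# Toy model of cross-project sharing through a package (hypothetical names;
# not a MaxCompute API).

class Package:
    def __init__(self, owner_project, name):
        self.owner_project = owner_project
        self.name = name
        self.objects = set()           # e.g. "table sale_detail"
        self.allowed_projects = set()  # projects permitted to install it

    def add(self, obj):
        self.objects.add(obj)

    def allow(self, project_name):
        self.allowed_projects.add(project_name)


class Project:
    def __init__(self, name):
        self.name = name
        self.installed = {}      # "owner.pkg" -> Package
        self.package_users = {}  # "owner.pkg" -> set of this project's users

    def install(self, pkg):
        # Provider-side control: only allowed projects may install
        if self.name not in pkg.allowed_projects:
            raise PermissionError(f"{self.name} may not install {pkg.name}")
        key = f"{pkg.owner_project}.{pkg.name}"
        self.installed[key] = pkg
        self.package_users.setdefault(key, set())
        return key

    def grant_package(self, key, user):
        # Consumer-side control: the consumer grants its own users
        if key not in self.installed:
            raise KeyError(key)
        self.package_users[key].add(user)

    def can_access(self, user, obj):
        return any(
            user in self.package_users[key] and obj in pkg.objects
            for key, pkg in self.installed.items()
        )


if __name__ == "__main__":
    pkg = Package("prj_a", "sales_pkg")
    pkg.add("table sale_detail")
    pkg.allow("prj_b")
    prj_b = Project("prj_b")
    key = prj_b.install(pkg)
    prj_b.grant_package(key, "bob")
    print(prj_b.can_access("bob", "table sale_detail"))    # True
    print(prj_b.can_access("carol", "table sale_detail"))  # False
```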
Data Protection of Projects
This mainly addresses the requirement that users must not transfer data out of the project.
The policies described above are layered: each one adds a further authorization requirement on top of the previous one. To explain this relationship in detail, take obtaining permission on an L4 table (a table whose label is level 4) as an example. The main steps are as follows:
Step 1: If the user is not yet a member of this project (and therefore has no authorization record), add the user to the project first. Being added does not give the user any actual permissions.
Step 2: Grant the user permission to perform the required actions on the object, in the following ways.
Step 3: For a labeled resource, such as a data table or a package containing data tables, the user must also be granted permission on the label. There are four types of label authorization:
The diagram of the process and relationship of each authorization is as follows:
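The three steps can be read as a chain of checks, each adding a requirement to the previous one. The sketch below is a hypothetical model (plain dictionaries, an assumed default clearance of 0), not MaxCompute's internal logic; it only shows why membership alone is not enough and why a label grant is still needed after an ACL grant.

```python
# Toy model of the layered checks for reading a labeled table (hypothetical
# data layout; not MaxCompute's implementation):
#   1. the user must be a project member (membership alone grants nothing);
#   2. the user must hold the action permission on the object (e.g. via ACL);
#   3. if the object's label exceeds the user's clearance, a label grant is needed.

def check_access(project, user, action, obj):
    """Return (allowed, reason) for `user` performing `action` on `obj`."""
    if user not in project["members"]:
        return False, "step 1: not a project member"
    if action not in project["acl"].get((user, obj), set()):
        return False, "step 2: no action permission on object"
    label = project["labels"].get(obj, 0)
    clearance = project["clearance"].get(user, 0)  # assumed default: 0
    if label > clearance and (user, obj) not in project["label_grants"]:
        return False, f"step 3: label L{label} exceeds clearance L{clearance}"
    return True, "ok"


if __name__ == "__main__":
    prj = {"members": set(), "acl": {}, "labels": {"table t_l4": 4},
           "clearance": {}, "label_grants": set()}
    print(check_access(prj, "alice", "select", "table t_l4"))
    prj["members"].add("alice")                        # step 1
    prj["acl"][("alice", "table t_l4")] = {"select"}   # step 2
    print(check_access(prj, "alice", "select", "table t_l4"))
    prj["label_grants"].add(("alice", "table t_l4"))   # step 3
    print(check_access(prj, "alice", "select", "table t_l4"))
```

Running it prints `(False, 'step 1: not a project member')`, then `(False, 'step 3: label L4 exceeds clearance L0')`, then `(True, 'ok')`.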
ProjectProtection (the data flow protection mechanism) is a MaxCompute security feature that prevents data from flowing out of a project in bulk. After ProjectProtection is enabled, data can be shared with another project only through a package, unless the two projects belong to the same TrustedProject group. Once a package has been authorized, the receiving project can independently grant the resources in the package to its own users.
Even for ordinary resources, such as commonly used tables and UDFs, you can bundle them into a package and delegate them to other projects if you want to manage them that way.
ProjectProtection also supports exception policies. For special business scenarios, an exception policy can be applied to specific IP addresses and product cloud accounts to meet special data outflow requirements.
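The outflow decision can be sketched as follows, assuming a simplified rule set: writing within the same project is never outflow, a project in the trusted group may receive data, and an exception rule is modeled as a hypothetical (account, IP range) pair that must both match. A real MaxCompute exception policy is a richer policy document, and sharing through a package is a separate path not modeled here.

```python
import ipaddress

# Toy ProjectProtection-style outflow check (hypothetical; simplified rules).

def outflow_allowed(src, dst, *, protection_on, trusted=frozenset(),
                    exceptions=(), account=None, client_ip=None):
    # Data staying in the same project is not outflow; without protection,
    # outflow is not restricted by this mechanism.
    if dst == src or not protection_on:
        return True
    # Projects in the same trusted group may exchange data.
    if dst in trusted:
        return True
    # Hypothetical exception rule: both the account and the client IP must match.
    if account is None or client_ip is None:
        return False
    ip = ipaddress.ip_address(client_ip)
    return any(
        account == rule_account and ip in ipaddress.ip_network(rule_net)
        for rule_account, rule_net in exceptions
    )


if __name__ == "__main__":
    rules = [("etl_service", "10.0.0.0/8")]
    print(outflow_allowed("prj_a", "prj_x", protection_on=True))  # False
    print(outflow_allowed("prj_a", "prj_b", protection_on=True,
                          trusted={"prj_b"}))                     # True
    print(outflow_allowed("prj_a", "prj_x", protection_on=True, exceptions=rules,
                          account="etl_service", client_ip="10.1.2.3"))  # True
```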
DataWorks provides a platform for collaborative data development. Its security model involves several aspects:
For user authentication, DataWorks is integrated with RAM. A cloud account can activate DataWorks as the primary account and create DataWorks projects, but project members must be RAM users under that primary account, not other cloud accounts.
In addition, projects created under the same primary account form a group, and tasks in these projects can depend on one another. However, the data (including tasks) of projects created under different primary accounts is isolated.
Data development (that is, the ETL process) raises its own security questions: how to ensure that production tasks cannot be changed arbitrarily, which members can edit and debug code, and which members can publish tasks to production.
DataWorks separates development projects from production projects, isolating task development and debugging from stable production. Member roles control which members can develop and debug tasks and which members can operate and maintain production tasks.
Because the underlying MaxCompute has its own security model, project members also need permissions on MaxCompute resources (tables, resources, functions, and instances) during the ETL process.
When a MaxCompute project is successfully created, DataWorks creates roles in that MaxCompute project corresponding to the DataWorks member roles, and grants different permissions to each role.
The two preceding sections show that permissions controlled through the MaxCompute security model do not change what members can do in any DataWorks interface, whereas assigning a DataWorks member role does change a member's permissions on MaxCompute resources. The following describes in detail how the permissions of the two products are linked.
If you create a project from the console, accessed through the product page of the MaxCompute or DataWorks website, DataWorks provides two options:
In a DataWorks project, a cloud account can only be the primary account, that is, the project owner; in MaxCompute, a cloud account can be either the project owner or an ordinary user. When you add members through DataWorks project member management, only RAM users under the primary account of the current project can be added. In MaxCompute, you can add other cloud accounts by using the add user xxx; command.
As mentioned in the previous section (DataWorks security model), DataWorks binds a set of MaxCompute roles so that project members have the MaxCompute resource permissions they need during the ETL process. Specifically, a DataWorks project has a fixed set of member roles, and a corresponding role is created for each in the associated MaxCompute project. In addition to the project owner, the MaxCompute project also has an admin role.
The specific permissions are as follows:
From the table above, you can see that the MaxCompute permissions corresponding to each DataWorks role are fixed. If a user obtains MaxCompute role permissions through a DataWorks role and is also granted other MaxCompute permissions from the command line, the user's permissions in MaxCompute become inconsistent with their role in DataWorks.
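Because the DataWorks-to-MaxCompute mapping is fixed, any permission granted directly in MaxCompute shows up as drift between the two products. The sketch below is a hypothetical audit helper: the role names and permission strings are placeholders, not the real mapping from the table above, and fetching a user's actual permissions is left out.

```python
# Toy consistency check between DataWorks roles and actual MaxCompute
# permissions. All role and permission names are PLACEHOLDERS.

DATAWORKS_TO_MAXCOMPUTE = {
    "developer": {"role_dev"},
    "visitor": {"role_guest"},
}
ROLE_PERMISSIONS = {
    "role_dev": {"table:create", "table:select", "function:create"},
    "role_guest": {"table:describe"},
}


def expected_permissions(dataworks_roles):
    """Permissions implied purely by the user's DataWorks roles."""
    perms = set()
    for dw_role in dataworks_roles:
        for mc_role in DATAWORKS_TO_MAXCOMPUTE.get(dw_role, ()):
            perms |= ROLE_PERMISSIONS[mc_role]
    return perms


def drift(dataworks_roles, actual_permissions):
    """Permissions held in MaxCompute beyond those implied by DataWorks roles."""
    return set(actual_permissions) - expected_permissions(dataworks_roles)


if __name__ == "__main__":
    # 'table:drop' was granted separately from the command line
    actual = {"table:describe", "table:drop"}
    print(drift(["visitor"], actual))  # {'table:drop'}
```

A non-empty result flags exactly the situation described above: the user's MaxCompute permissions no longer match what their DataWorks role implies.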
In simple mode, a DataWorks project is bound to a single MaxCompute project. In this case, the MaxCompute Access Identity setting (under the MaxCompute settings in DataWorks Project Management) determines whether other DataWorks project members have permissions on the MaxCompute project.
In standard mode, a DataWorks project is bound to two MaxCompute projects with fixed purposes: one is the development project and the other is the production project. Other DataWorks project members receive the development-project permissions of the role that corresponds to their DataWorks role, but have no permissions on the production project. MaxCompute tasks must go through the publishing process to be deployed to the production project, where they are submitted to MaxCompute for execution under the owner account.
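The difference between the two modes comes down to which project a task runs in and under which account. A minimal sketch, assuming that in simple mode the access identity is either the owner's account or the member's personal account, and that in standard mode development runs use the member's own identity while production runs use the owner account only after publishing. The function name and its values are hypothetical.

```python
# Toy model of which (project, account) a MaxCompute task runs under.
# Hypothetical names; assumptions are described in the lead-in above.

def execution_context(mode, member, owner, *, env="dev",
                      access_identity="owner", published=False):
    """Return (project, account) that a task would run under, or raise."""
    if mode == "simple":
        if access_identity not in ("owner", "personal"):
            raise ValueError(f"unknown access identity {access_identity!r}")
        account = owner if access_identity == "owner" else member
        return ("bound_project", account)
    if mode == "standard":
        if env == "dev":
            # Members act in the dev project with their role's permissions
            return ("dev_project", member)
        if env == "prod":
            # Members have no direct prod permissions; publishing is required
            if not published:
                raise PermissionError("task must be published to production first")
            return ("prod_project", owner)
        raise ValueError(f"unknown env {env!r}")
    raise ValueError(f"unknown mode {mode!r}")


if __name__ == "__main__":
    print(execution_context("simple", "alice", "owner_acct"))
    print(execution_context("standard", "alice", "owner_acct", env="dev"))
    print(execution_context("standard", "alice", "owner_acct",
                            env="prod", published=True))
```

Standard mode trades convenience for isolation: no member identity ever writes directly to production, so every production change passes through the publishing process.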