As an enterprise-class cloud data warehousing solution built on a SaaS-based model, Alibaba Cloud MaxCompute ensures continuous business and data security for its customers. MaxCompute recently upgraded its comprehensive security capabilities. This article describes best practices, based on the native and integrated security capabilities of MaxCompute and DataWorks, for typical data risk scenarios throughout the data lifecycle, such as data misuse, abuse, breaches, and loss.
Alibaba Cloud MaxCompute is a cloud-native, high-performance enterprise-level data warehousing service based on the Software-as-a-Service (SaaS) model. It is widely used to build modern enterprise data platforms for business intelligence (BI) analysis, data-driven operations, profiling and recommendation, intelligent prediction, and other scenarios.
MaxCompute draws strength from Alibaba Cloud's large-scale computing and storage resources and provides a fully managed online data warehousing service through a serverless architecture. It removes the limits on resource scalability and elasticity that are common on traditional data platforms and minimizes investment in operations and maintenance (O&M).
MaxCompute supports a wide range of classic computing models, such as batch processing, machine learning, and interactive analytics, and offers comprehensive enterprise management functionality. MaxCompute allows you to easily integrate and manage enterprise data assets and streamlines the data platform architecture for faster mining of the value of data.
MaxCompute recently upgraded its comprehensive security capabilities. Newly released security capabilities include:
An enterprise-level big data platform is exposed to three levels of security risks, as shown in Figure 1.
Figure 1: Security system of a big data platform
1) Infrastructure security and platform trustworthiness ensure the physical safety and network security of data centers. Countermeasures against risks at this level primarily include enhancing data center security facilities, data center security management, and data center network security.
2) System security of big data platforms. Countermeasures against risks at this level primarily include building subsystems for access control, security isolation, risk control and auditing, and data protection, and providing underlying platform-based capabilities for upper-level security applications or tools.
3) Security of data applications. Countermeasures against risks at this level primarily include providing users with tool-based data security products, optimizing the user experience, and helping users better cope with data risks.
The recent upgrade of MaxCompute's security capabilities has introduced new features to access control, risk control and auditing, data protection, and other subsystems, as highlighted in yellow in the "Big data platform security" layer in Figure 1. In this article, we will introduce the best practices for major types of data risks, as shown in Figure 2. We will explain when, why, and how to use these new features in these best practices.
Figure 2: Major types of data risks
Data misuse is caused by unintended or negligent actions. Preventing data misuse usually refers to preventing data from being inadvertently and incorrectly used. A core requirement for responding to data misuse risks and preventing data misuse is to understand data. You can say you understand your data when you know what data you have, where it is stored, and how it is collected and used.
MaxCompute can help you answer these questions. The platform manages metadata in a unified way and keeps complete platform logs, so it can expose consistent metadata and the related logs to you. You can build your own data management applications based on the Information Schema of MaxCompute.
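As a rough illustration of what such a data management application might do with metadata, the following Python sketch answers two of the questions above: what data you have and where it is, and which data is no longer being used. The records and field names (`project`, `table`, `owner`, `last_access`) are hypothetical stand-ins for rows you would obtain from a metadata query. They are not the actual Information Schema columns.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical metadata rows. The field names are illustrative assumptions,
# not the real Information Schema column names.
TABLE_METADATA = [
    {"project": "sales", "table": "orders", "owner": "alice",
     "last_access": datetime(2021, 3, 20)},
    {"project": "sales", "table": "orders_tmp", "owner": "bob",
     "last_access": datetime(2020, 1, 5)},
    {"project": "hr", "table": "employees", "owner": "carol",
     "last_access": datetime(2021, 3, 1)},
]


def inventory_by_project(rows):
    """Group table names by project: what data do we have, and where?"""
    inventory = defaultdict(list)
    for row in rows:
        inventory[row["project"]].append(row["table"])
    return dict(inventory)


def stale_tables(rows, now, max_idle_days=180):
    """Return (project, table) pairs not accessed within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return [(r["project"], r["table"]) for r in rows if r["last_access"] < cutoff]


if __name__ == "__main__":
    print(inventory_by_project(TABLE_METADATA))
    print(stale_tables(TABLE_METADATA, now=datetime(2021, 3, 29)))
```

In a real application, the in-memory list would be replaced by the result of a query against your own metadata source, and the idle threshold would follow your data retention policy.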
Most users prefer to learn about their data through existing data management applications or services. Here, DataWorks comes in handy with its Data Map module. Its data overview and data details give you a picture of your data, while its output, usage, and lineage information show how the data is produced and consumed. Together, they help ensure that the right data is used correctly in the right scenario.
Figure 3: Understand data using a data map
Data abuse refers to using data in scenarios or for purposes beyond its intended scope. Data abuse is usually caused by intentional or purposeful actions. A major countermeasure against data abuse is the least privilege principle of data use, which strictly limits the scope of data access and use. The four major processes in Figure 5 are recommended as best practices in permission management.
MaxCompute's fine-grained permission system, when used with DataWorks or other GUI-based tools, lets you put the least privilege principle into practice and reduce data abuse risks.
MaxCompute supports different authorization mechanisms for users or roles. The following mechanisms are examples.
Regardless of the access control mechanism you choose, three elements remain the same during the authorization and authentication: action, object, and subject, as shown in the following figure.
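To make the three elements concrete, here is a minimal Python sketch of an authorization check built around subject, action, and object. It is a simplified model for illustration only, not MaxCompute's implementation. The class names, the action name `Select`, and the object naming scheme are assumptions. The sketch denies by default, which is how the least privilege principle is usually applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    subject: str  # who: a user or role (hypothetical naming)
    action: str   # what operation, for example "Select"
    obj: str      # on which object, for example "table:orders"


class AccessControlList:
    """Toy access control list: a request passes only if a matching grant exists."""

    def __init__(self):
        self._grants = set()

    def grant(self, subject, action, obj):
        self._grants.add(Grant(subject, action, obj))

    def revoke(self, subject, action, obj):
        self._grants.discard(Grant(subject, action, obj))

    def is_allowed(self, subject, action, obj):
        # Deny by default: anything not explicitly granted is rejected.
        return Grant(subject, action, obj) in self._grants


if __name__ == "__main__":
    acl = AccessControlList()
    acl.grant("analyst_role", "Select", "table:orders")
    print(acl.is_allowed("analyst_role", "Select", "table:orders"))
    print(acl.is_allowed("analyst_role", "Drop", "table:orders"))
```

Granting only the specific action on the specific object, rather than broad project-level access, is what keeps the scope of data use as narrow as possible.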
The new security capabilities that MaxCompute released in this upgrade also include an upgrade to the permission model to support finer-grained authorization and authentication for refined permission management. Main new features include:
Figure 4: MaxCompute's fine-grained permission system
Fine-grained permission management capabilities in this release are highlighted in orange.
MaxCompute's fine-grained permission system enables least privilege authorization on the platform. When MaxCompute is used in concert with GUI-based tools, such as DataWorks-Security Center, it can provide a better user experience and more convenient permission management.
Figure 5: GUI-based permission management with Security Center
The Security Center provides convenient permission management and visualized request and approval processes, in addition to permission auditing and management capabilities.
Figure 6: Data lifecycle
Data breaches may occur in multiple stages of the data lifecycle, such as data transmission, storage, processing, and exchange. Therefore, we introduce the best practices to defend against data breaches at different stages of the data lifecycle.
First, data is collected from different sources and transferred to the big data platform through various channels. On the big data platform, data may be processed and then written to disks for storage, transferred between different tenants and services through a data sharing mechanism, or, after a certain period of time, deleted and destroyed. Processed data is consumed by other data applications or users through different channels. (See Figure 7.)
Figure 7: Data lifecycle on a big data platform
First, let's look at how to cope with data breach risks during storage, such as direct access to data disks or to the data stored on them. One countermeasure is to encrypt the data stored on disks. This prevents the data from being read or used even when it is improperly accessed.
In this upgrade, a storage encryption feature was released for MaxCompute to support encrypting data disks.
The security isolation capability of a big data platform plays a critical role in coping with the risks of data breach during data processing.
MaxCompute creates an independent, isolated environment for executing data processing applications. It supports all user-defined function (UDF) types, including Java and Python UDFs, as well as open-source third-party computing engines such as Spark, Flink, and TensorFlow, enabling diversified data processing capabilities.
Figure 8: MaxCompute's security isolation capabilities
Sound data isolation and permission management systems are essential to data security because they can prevent data breaches during data exchanges or sharing. MaxCompute supports data isolation and permission management for a range of levels and dimensions, providing multi-level data protection and data sharing.
Figure 9: MaxCompute's data isolation capabilities
An important part of coping with data breach risks is the protection of sensitive data. The responses to risks in the data storage, processing, and exchange processes described in preceding sections are also applicable to sensitive data protection. In addition, the following best practices target sensitive data protection scenarios.
Figure 10: Protection of sensitive data
Data Security Guard is a sensitive data protection tool that is built to match the data classification and grading capabilities of the MaxCompute platform and integrates data masking capabilities. It allows users to label data as sensitive and select masking algorithms to mask the sensitive data in data outputs.
For more information about the service and its usage, see the Data Security Guard documentation.
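To illustrate what applying a masking algorithm to labeled sensitive columns means, here is a small Python sketch. The two algorithms shown (keeping the first and last characters, and salted hashing) are common masking techniques chosen for illustration. They are not taken from Data Security Guard, and the function names and column names are assumptions.

```python
import hashlib


def mask_keep_ends(value, keep_start=3, keep_end=4, mask_char="*"):
    """Keep the first and last few characters and mask everything between."""
    if len(value) <= keep_start + keep_end:
        # Too short to reveal anything safely, so mask the whole value.
        return mask_char * len(value)
    middle = mask_char * (len(value) - keep_start - keep_end)
    return value[:keep_start] + middle + value[len(value) - keep_end:]


def mask_hash(value, salt):
    """Replace the value with a salted SHA-256 digest (irreversible)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_row(row, rules):
    """Apply the masking function in `rules` to each labeled column."""
    return {col: rules[col](val) if col in rules else val
            for col, val in row.items()}


if __name__ == "__main__":
    # Hypothetical labeling: which columns are sensitive and how to mask them.
    rules = {
        "phone": mask_keep_ends,
        "id_number": lambda v: mask_hash(v, salt="example-salt"),
    }
    row = {"name": "Zhang", "phone": "13812345678", "id_number": "A1234567"}
    print(mask_row(row, rules))
```

Partial masking keeps values recognizable to people who already know them, while hashing keeps values joinable across tables without exposing the original. Which one fits depends on how the output is used.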
Figure 11: Sensitive data protection tool - Data Security Guard
Apart from malicious data breaches, data abuse, and other risks, improper operations during data development, occasional faults with equipment or data centers, and rare and unexpected disasters can all lead to data loss. The main best practices to prevent data loss risks include backup and recovery and disaster recovery.
During data development, you may need to recover data after improper operations, such as unintended deletions by DROP or TRUNCATE TABLE statements, or after an INSERT INTO or INSERT OVERWRITE statement writes problematic data.
MaxCompute recently released continuous backup and recovery capabilities. Whenever data is deleted or modified, the system automatically backs up the previous version of the data and retains it for a specific period of time. Within this period, you can quickly recover the data to prevent data loss due to incorrect operations.
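The idea of keeping the previous version for a retention period can be modeled with a short Python sketch. This is a conceptual toy, not how MaxCompute stores backups. The class name, the one-day default retention, and the method names are all assumptions made for the example.

```python
from datetime import datetime, timedelta


class BackedUpTable:
    """Toy table that keeps the pre-change version of its rows for a while."""

    def __init__(self, retention_days=1):
        # The retention length is an assumed value for this sketch.
        self.retention = timedelta(days=retention_days)
        self.current = []
        self._history = []  # list of (backup_time, rows_before_change)

    def overwrite(self, rows, now):
        """Replace all rows, backing up the previous version first."""
        self._history.append((now, list(self.current)))
        self.current = list(rows)

    def _purge(self, now):
        """Drop backups that are older than the retention period."""
        self._history = [(ts, rows) for ts, rows in self._history
                         if now - ts <= self.retention]

    def restore_previous(self, now):
        """Roll back to the most recent backup still within retention."""
        self._purge(now)
        if not self._history:
            raise LookupError("no backup within the retention period")
        _, rows = self._history.pop()
        self.current = rows


if __name__ == "__main__":
    start = datetime(2021, 3, 1, 9, 0)
    table = BackedUpTable(retention_days=1)
    table.overwrite(["good row"], now=start)
    table.overwrite(["bad row"], now=start + timedelta(hours=1))
    table.restore_previous(now=start + timedelta(hours=2))
    print(table.current)
```

The sketch also shows the main limitation of this kind of protection: once the retention period has passed, the earlier version is gone, so the mistake has to be noticed in time.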
Figure 12: MaxCompute's continuous backup and recovery capabilities
With its geo-disaster recovery capabilities, MaxCompute provides better data security in extreme scenarios such as data center failures or unexpected disasters.
After you specify a backup location for the backup cluster of a MaxCompute project, MaxCompute automatically replicates data between the primary and backup clusters to keep the data consistent and achieve geo-disaster recovery. If a fault occurs, the MaxCompute project switches from the primary cluster to the backup cluster and uses the computing resources of the backup cluster to access the data stored there, so that the service resumes on the backup cluster.
Figure 13: MaxCompute's geo-disaster recovery
So far, we have introduced practices to defend against various data risks during data development and use. In this last section, we look at a very important practice that applies to all kinds of data risks: risk control and auditing.
MaxCompute provides comprehensive historical data and real-time logs.
You can build your own data risk control and audit systems based on Information Schema and real-time audit logs. Information Schema was released last year. Below, we will introduce the real-time audit log which is a new feature.
Not all users plan to build their own risk control and audit tools. Instead, they can use the risk control and audit services in DataWorks for this purpose. These out-of-the-box services require no secondary development effort, at the cost of a lower degree of customization.
Is sensitive data overused? Are too many data access permissions granted? Is there an abnormality such as unplanned frequent data access? Administrators are often asked these questions about data security. MaxCompute's audit log feature can help you answer these questions.
MaxCompute keeps a full record of users' actions and pushes user behavior logs to Alibaba Cloud's ActionTrail service. You can view and retrieve user behavior logs in ActionTrail and deliver the logs to a Log Service project or a specified Object Storage Service (OSS) bucket for the purposes of real-time auditing and event traceability and analysis.
ActionTrail supports auditing user behavior for instances, tables, functions, resources, users, roles, and privileges. For more information about this feature and its usage, see the Audit Log documentation.
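Once behavior logs have been delivered to your own storage, a question such as "is there unplanned frequent data access?" comes down to counting events. The following Python sketch shows the idea on a handful of in-memory records. The record fields (`user`, `action`, `object`) and the threshold are hypothetical. They do not reflect the actual ActionTrail event format, which the Audit Log documentation describes.

```python
from collections import Counter

# Hypothetical, simplified audit events. Real events carry more fields
# and use a different schema.
AUDIT_EVENTS = [
    {"user": "alice", "action": "Select", "object": "orders"},
    {"user": "bob", "action": "Select", "object": "customers"},
    {"user": "bob", "action": "Select", "object": "customers"},
    {"user": "bob", "action": "Select", "object": "customers"},
    {"user": "alice", "action": "Select", "object": "customers"},
]


def access_counts(events):
    """Count how often each user accessed each object."""
    return Counter((e["user"], e["object"]) for e in events)


def flag_frequent_access(events, threshold):
    """Return (user, object) pairs accessed more than `threshold` times."""
    counts = access_counts(events)
    return sorted(pair for pair, n in counts.items() if n > threshold)


if __name__ == "__main__":
    print(flag_frequent_access(AUDIT_EVENTS, threshold=2))
```

A fixed threshold is the simplest possible rule. In practice, you would likely compare each user's activity with their own history or restrict the check to tables labeled as sensitive.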
Figure 14: MaxCompute's audit log
You can use existing services provided by DataWorks for data security risk control and auditing.
Figure 15: Risk control and auditing with Data Security Guard
Echoing the introduction, this summary offers an overview of the three levels of data security systems of an enterprise-level big data platform. Here, we reorganize the security capabilities of MaxCompute according to the six stages in a data lifecycle, as shown in Figure 16. This helps us better understand the applicable data security practices at each stage of the data lifecycle. New features released in this upgrade are highlighted in yellow in Figure 16.
Figure 16: Lifecycle-stage-specific data security practices on a big data platform
As a cloud data warehouse based on the SaaS model, MaxCompute boasts leading security capabilities and has passed multiple international, European, and Chinese security compliance certifications, including the internationally recognized ISO certification, SOC 1, 2, and 3 (SOC is short for System and Organization Controls), Payment Card Industry Data Security Standard (PCI DSS), the C5 certification used in Europe, and Cybersecurity Multi-Level Protection Scheme 2.0, which is dominant in China. For more information about Alibaba Cloud's security compliance certification system, see the Alibaba Cloud Trust Center - Certification of Compliance page. We welcome you to use MaxCompute to ensure enterprise-level big data security.
To learn more about Alibaba Cloud MaxCompute, visit https://www.alibabacloud.com/product/maxcompute