Real-World Implementation of Data Analytics with Alibaba Cloud: MaxCompute and Data Warehousing (Part 4)

Part 4 of this article series discusses MaxCompute and big data warehousing solutions.

By Shantanu Kaushik

Data warehousing as a concept and architecture goes back to the 1980s. It was initially developed to transform data so that it could support better decision-making and add business value. Data warehouses store data from different sources for processing, analytics, and other types of research or consolidation.

Alibaba Cloud introduced MaxCompute as its computing platform for large-scale data warehousing. It is a fully managed service, so users do not have to handle the operations and maintenance (O&M) themselves.

What Is Data Warehousing?

Warehouses are structures that store objects from different sources. Storage can be short-term or long-term, depending on supply and demand. Data warehousing applies the same idea to data: records from different sources are processed and stored in a single format that the data warehouse supports. Data warehousing is among the most essential components of business intelligence and underpins the whole data analytics operation. It takes care of extracting and transforming data before the data is sent for analysis.

Historical Data

Data warehouses store historical data, which is different from the general or operational data used for daily operations. Historical data is among the most useful data collected from different sources because it reflects how the business functioned and operated over a long period of time.

Data Warehousing Challenges

Data warehouses hold very large volumes of data. Designing and maintaining one takes specific strategies, resources, and collaboration. Two major factors that affect the operation of a data warehouse are your cloud partner and whether its solution can be customized to suit your needs. Alibaba Cloud offers a number of solutions for data analytics in the cloud, and MaxCompute is designed to address the challenges associated with data warehousing.

Let’s take a look at some of these challenges:

  • Quality of Data

Data that comes from various sources will contain inconsistencies. If there are too many of them, data quality suffers directly. A data warehouse needs to maintain a defined level of data quality so that the raw data stream feeding data analytics is clean or nearly clean and can add to the overall business intelligence picture.
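As a rough illustration of what "maintaining a defined level of data quality" can mean in practice, the sketch below counts missing values and duplicate keys in a batch and applies a threshold. The function names, field names, and the 5% threshold are assumptions made for this example, not part of any Alibaba Cloud product.

```python
def quality_report(rows, required_fields, key_field):
    """Return simple quality metrics for a batch of dict records."""
    missing = {field: 0 for field in required_fields}
    seen_keys = set()
    duplicate_keys = 0
    for row in rows:
        # A field counts as missing when it is absent, None, or an empty string.
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
        key = row.get(key_field)
        if key in seen_keys:
            duplicate_keys += 1
        seen_keys.add(key)
    return {"total": len(rows), "missing": missing, "duplicate_keys": duplicate_keys}


def passes(report, max_missing_ratio=0.05):
    """Accept a batch only if it is non-empty, has no duplicate keys,
    and every required field is missing in at most max_missing_ratio of rows."""
    if report["total"] == 0 or report["duplicate_keys"] > 0:
        return False
    return all(
        count / report["total"] <= max_missing_ratio
        for count in report["missing"].values()
    )
```

A batch that fails the check can be quarantined for review instead of being loaded, so that inconsistencies do not leak into reports.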

  • Data Testing

Critical decisions that shape future success are based on this data, so it has to be tested. Data testing is a big challenge because the datasets are large and testing them requires a strong infrastructure.
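One common form of data testing is reconciliation: checking that what landed in the warehouse matches what left the source. The following is a minimal sketch of that idea, assuming both sides can be read as lists of dictionaries; the helper names are hypothetical, and a real test at warehouse scale would compute these figures inside the engine rather than in client memory.

```python
import hashlib


def table_fingerprint(rows, columns):
    """Return (row_count, checksum). The checksum is order-independent:
    each row is hashed separately and the hashes are summed modulo 2**64."""
    checksum = 0
    for row in rows:
        line = "\x1f".join(str(row[col]) for col in columns)
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % 2**64
    return len(rows), checksum


def reconcile(source_rows, target_rows, columns):
    """Compare source and target; return a list of human-readable issues."""
    src_count, src_sum = table_fingerprint(source_rows, columns)
    tgt_count, tgt_sum = table_fingerprint(target_rows, columns)
    issues = []
    if src_count != tgt_count:
        issues.append(f"row count mismatch: {src_count} != {tgt_count}")
    if src_sum != tgt_sum:
        issues.append("checksum mismatch")
    return issues
```

An empty list means the two sides agree on both row count and content for the chosen columns.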

  • Data Design and Performance

A data warehouse has to be designed and managed according to business demands and the applied strategy. The solution needs to be customized and kept in sync with those demands so that the data warehouse performs well and more value can be extracted from your data for analytics.

  • Privacy

Data mining has been increasing tenfold every year. This growth in power and demand has made the area susceptible to privacy concerns. Data coming from multiple sources and independent contributors has to pass through layers of security and privacy controls so that it does not affect overall business productivity and analytics.

MaxCompute | The Alibaba Cloud Solution

Alibaba Cloud created MaxCompute to meet large-scale data warehousing needs. MaxCompute provides a stable, secure, high-performing, and scalable computing engine.

Traditional software struggled to keep up with the ever-increasing size of data. Alibaba Cloud created MaxCompute to overcome this and the data warehousing challenges described above. It was designed to provide the computing power to process and store large amounts of structured data and to enable precise data analytics and data modeling solutions.

Let’s take a look at the workflow of Alibaba Cloud MaxCompute on the chart below:

[Figure: Alibaba Cloud MaxCompute workflow]

MaxCompute uses a distributed computing model, which handles large datasets more easily than a single, overloaded server. Since it is a fully managed service, users don’t need to understand complex backend concepts or manage the backend themselves. MaxCompute handles the associated O&M and provides a comprehensive and seamless experience for data warehousing and analytics.
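To make the contrast with a single server more concrete, here is a toy split, process, and combine sketch in Python. It is not how MaxCompute is implemented: threads stand in for worker nodes, and the function names and the `(region, amount)` record shape are assumptions chosen for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(records, n_parts):
    """Deal records round-robin into n_parts partitions."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_partition(partition):
    """Work done by one 'worker': partial totals per region for its partition."""
    partial = Counter()
    for region, amount in partition:
        partial[region] += amount
    return partial


def reduce_partials(partials):
    """Combine the partial totals from all workers into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values key by key
    return dict(total)


def distributed_sum(records, n_workers=4):
    """Split the data, process partitions concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_partition, split(records, n_workers)))
    return reduce_partials(partials)
```

Because each partition is processed independently, adding workers lets the same job cover more data without any single machine having to hold all of it.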

Benefits | Features | MaxCompute

MaxCompute provides seamless integration with DataWorks. With this integration, data synchronization, workflow design, data development, and management are shared across the Alibaba Cloud analytics platform. MaxCompute provides many computing models and supports APIs to fulfill any data analytics demands.

  • Computing Models

MaxCompute supports various computing models:

  1. MapReduce
  2. Graph
  3. Directed Acyclic Graph (DAG) – Python and Java
  4. User-Defined Functions (UDF)
  5. SQL
  6. Message Passing Interface (MPI) Iterative Algorithms
  7. Machine Learning
  8. Interactive Analytics
  9. DataV
  10. In-Memory Computing
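To give a feel for how two of the models above, SQL and User-Defined Functions, fit together, here is a small sketch that uses Python's built-in `sqlite3` module as a local stand-in. This is an assumption made purely so the example can run anywhere: MaxCompute has its own SQL dialect and its own way of registering UDFs, and the table, the data, and the `mask_email` function are hypothetical.

```python
import sqlite3


def mask_email(email):
    """Example UDF: keep the first character of the local part, hide the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


def run_demo():
    """Register the UDF, load a few rows, and call the UDF from a SQL query."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the name mask_email (1 argument).
    conn.create_function("mask_email", 1, mask_email)
    conn.execute("CREATE TABLE users (id INTEGER, email TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            (1, "ana@example.com", "DE"),
            (2, "bo@example.com", "DE"),
            (3, "cy@example.com", "IN"),
        ],
    )
    rows = conn.execute(
        "SELECT country, COUNT(*) AS n, MIN(mask_email(email)) AS sample "
        "FROM users GROUP BY country ORDER BY country"
    ).fetchall()
    conn.close()
    return rows
```

The pattern is the same regardless of engine: SQL expresses the set-based part of the job, and a UDF fills in row-level logic that SQL alone does not cover.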

  • Large-Scale Computing

MaxCompute is built for large-scale workloads. It suits datasets from roughly 100 GB upward and can scale to exabytes of data.

  • Security

Alibaba Cloud has run MaxCompute in its own data warehouses for over a decade. MaxCompute provides multi-layer sandboxing along with permission management and monitoring.

  • Scalability and Cost

Alibaba Cloud MaxCompute is an elastic and scalable service with job-level resource management. Depending on demand, MaxCompute can automatically increase or decrease resources, such as ECS, OSS, and network systems. This elasticity, together with the absence of added O&M, significantly reduces the associated operating costs.

  • Data Tunneling

MaxCompute enables high-concurrency data uploads and downloads through the MaxCompute Data Tunnel transmission service. This service supports terabytes of data imports and exports daily.

Data Tunnel is most useful for batch imports and for loading historical data. You can use the tunneling functionality through a Java API and control the system with a MaxCompute client.
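The general shape of a concurrent batch import can be sketched as follows: cut the data into blocks, write the blocks in parallel, and commit once every block has arrived. This is not the Data Tunnel API. `InMemorySink`, `write_block`, and `commit` are hypothetical names for a local stand-in, used only to show the pattern.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block


class InMemorySink:
    """Hypothetical stand-in for a remote table that accepts numbered blocks."""

    def __init__(self):
        self._lock = threading.Lock()
        self.blocks = {}
        self.committed = []

    def write_block(self, block_id, rows):
        # Blocks may arrive in any order from different threads.
        with self._lock:
            self.blocks[block_id] = list(rows)

    def commit(self, block_ids):
        # Make the upload visible only once all blocks are present.
        self.committed = [row for b in sorted(block_ids) for row in self.blocks[b]]
        return len(self.committed)


def batch_upload(rows, sink, block_size=1000, workers=4):
    """Upload rows in concurrent blocks and return the committed row count."""
    blocks = list(enumerate(chunked(rows, block_size)))

    def send(item):
        block_id, block = item
        sink.write_block(block_id, block)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(send, blocks))  # consume the iterator to surface errors
    return sink.commit([block_id for block_id, _ in blocks])
```

Numbering the blocks lets them travel in parallel and out of order, while the final commit restores the original order and avoids exposing a half-finished import.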

To upload real-time data, MaxCompute uses DataHub, which offers low-latency operations. With DataHub, you can enable incremental data imports to keep the data center and the cloud in sync.
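An incremental import is often built around a watermark: remember the newest change already copied, and on the next run copy only what is newer. The sketch below shows that idea with plain Python data structures. It does not use the DataHub API, and the `updated_at` and `id` fields are assumptions about what the source records look like.

```python
def incremental_sync(source_rows, target, watermark):
    """Copy rows changed after `watermark` into `target` (a dict keyed by id).

    Returns the new watermark, which the caller stores for the next run.
    """
    new_rows = [row for row in source_rows if row["updated_at"] > watermark]
    # Apply changes oldest first so the latest version of each id wins.
    for row in sorted(new_rows, key=lambda r: r["updated_at"]):
        target[row["id"]] = row  # upsert by primary key
    return max((row["updated_at"] for row in new_rows), default=watermark)
```

Each run moves only the delta, which keeps the transfer small and is what makes frequent synchronization practical.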

Wrapping Up

Alibaba Cloud Data Analytics solutions are well integrated. MaxCompute and DataWorks are deeply integrated with each other to provide data warehousing and big data functionality. DataV visualizes the data so that businesses can understand it easily and efficiently.

Alibaba Cloud MaxCompute is among the best data warehousing solutions available. It serves more than 16 countries and can support a wide variety of data warehousing scenarios. In the next article of this series, we will talk about DataV and how it adds to the overall business value.

Upcoming Articles

  1. Real-World Implementation of Data Analytics With Alibaba Cloud (Part 5): Using DataV to Maximize Productivity