This topic describes the technology development trends and market trends of data analysis.

Technology development trends

Commercial databases began to emerge on the market in the 1980s. Popular examples of commercial databases include Oracle, SQL Server, and Db2, which are all relational databases used to process structured data in real time. Open source relational databases such as MySQL and PostgreSQL also witnessed rapid development in the 1990s.

In recent years, as the volume of business data has continued to grow, enterprises have needed to analyze that data to make better-informed business decisions and extract its full value. However, traditional open source and commercial relational databases use a standalone architecture with limited scalability, so they cannot meet performance requirements in scenarios where large amounts of data must be stored and analyzed. This gave rise to data warehouses such as Teradata and Oracle Exadata, which are built on a distributed scale-out architecture.

Teradata and Oracle Exadata are both all-in-one database offerings with specific hardware requirements, which results in high costs. They are adopted mainly by large enterprises in fields such as traditional finance, transportation, and energy. As Internet service providers such as Google gained a significant presence, big data technologies such as Hadoop, which run on the traditional x86 server architecture, developed rapidly. Open source distributed databases such as Greenplum also appeared as alternatives to the all-in-one offerings. This lowered the skill and cost barriers for small and medium-sized enterprises (SMEs) to analyze data. Meanwhile, distributed database technologies were further developed and popularized. SQL interfaces were built on top of the MapReduce interfaces of Hadoop, and SQL syntax became a standard part of big data analysis systems.
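
As a rough illustration of how a SQL layer can sit on top of MapReduce interfaces, the following sketch expresses a simple GROUP BY aggregate as map, shuffle, and reduce steps in plain Python. The table, column names, and function names are hypothetical and chosen only for illustration. The sketch shows the general idea and is not the implementation of any particular engine.

```python
from collections import defaultdict

# Hypothetical rows of an "orders" table (illustrative data only).
ORDERS = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]


def map_phase(rows):
    """Emit one (key, value) pair per row: the GROUP BY column and the aggregated column."""
    for row in rows:
        yield row["region"], row["amount"]


def shuffle(pairs):
    """Collect all values that share a key, as happens between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate function (SUM) to each group of values."""
    return {key: sum(values) for key, values in groups.items()}


def sum_amount_by_region(rows):
    # Conceptually: SELECT region, SUM(amount) FROM orders GROUP BY region
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(sum_amount_by_region(ORDERS))
```

In a distributed system, the map and reduce steps would run in parallel on many servers. This sketch runs them in a single process to keep the mapping from SQL to the three steps visible.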

As cloud service providers such as Amazon Web Services (AWS), Microsoft Azure, Alibaba Cloud, and Google have emerged, cloud-native distributed data warehouses such as Amazon Redshift, Snowflake, AnalyticDB for PostgreSQL, and Google BigQuery have become major solutions for data analysis. Cloud-native data warehouses originate from database and big data technologies and provide standard SQL interfaces and atomicity, consistency, isolation, durability (ACID) guarantees. Their underlying storage implements resource pooling and horizontal scalability by means of a Shared Everything or Shared Nothing architecture. Resource isolation and data sharing are common requirements for cloud-native data warehouses.

In summary, technology development trends of data analysis include the following aspects:

  • Cloud-native distributed architecture: Distributed databases have become the most essential technology for modern enterprises. According to a Gartner report entitled "The Future of the DBMS Market Is Cloud", cloud-native architectures and features have become a necessity for cloud databases. Standalone data storage can no longer keep up with the rapid growth of business and data in online transaction processing (OLTP) or online analytical processing (OLAP) scenarios.
  • Storage and computing separation: The essence of cloud computing is efficient resource pooling, and the core components of a database are storage and computing. Separating the two enables resource pooling and lets storage and computing resources scale independently, which meets the requirements for resource isolation and data sharing. Storage and computing separation has emerged as a popular architectural trend.
  • Transaction and analysis integration: Traditional data analysis solutions regularly extract data from OLTP systems and synchronize it to OLAP systems in quasi-real time. However, this approach can lead to complex deployment, poor real-time performance, data redundancy, and high costs. Ideally, a single hybrid transaction/analytical processing (HTAP) system handles both transaction processing and analysis.
  • Integration of databases and big data technologies: Early big data technologies provided distributed data processing capabilities at the expense of consistency in order to overcome the scalability limits of traditional standalone databases. They later added standard SQL interfaces on top of MapReduce interfaces and adopted some massively parallel processing (MPP) database technologies. Meanwhile, distributed databases have evolved to incorporate some big data technologies and storage formats to improve scalability. For data analysis, the two approaches address the same problem.
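
The storage and computing separation described in the list above can be pictured with a small toy model. The following is a minimal sketch, assuming one pooled storage layer and two independent compute clusters. The class names `SharedStorage` and `ComputeCluster` are hypothetical and do not correspond to any product API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedStorage:
    """Hypothetical pooled storage layer: one copy of the data, visible to every cluster."""
    tables: Dict[str, List[dict]] = field(default_factory=dict)

    def scan(self, table: str) -> List[dict]:
        return self.tables[table]


@dataclass
class ComputeCluster:
    """Hypothetical compute layer that holds no data of its own."""
    name: str
    storage: SharedStorage
    nodes: int = 1

    def scale_to(self, nodes: int) -> None:
        # Changing compute capacity does not move or copy any stored data.
        self.nodes = nodes

    def total(self, table: str, column: str) -> int:
        rows = self.storage.scan(table)
        # Each node sums its own slice of the rows; the partial sums are then combined.
        partials = [sum(r[column] for r in rows[i::self.nodes]) for i in range(self.nodes)]
        return sum(partials)


if __name__ == "__main__":
    storage = SharedStorage({"orders": [{"amount": 10}, {"amount": 20}, {"amount": 30}]})
    etl = ComputeCluster("etl", storage, nodes=1)
    bi = ComputeCluster("bi", storage, nodes=2)
    bi.scale_to(4)  # scale one workload without touching the other
    print(etl.nodes, bi.nodes, etl.total("orders", "amount"), bi.total("orders", "amount"))
```

In this model, the two clusters are isolated from each other in terms of compute capacity but read the same stored data, which mirrors the resource isolation and data sharing requirements mentioned above.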

Market trends

The rapid growth of data has created strong demand for data analysis. Between 2010 and 2025, the compound annual growth rate (CAGR) of data worldwide is expected to reach 27%, and the CAGR of data in China is expected to reach 30%. According to Gartner, live data is expected to account for 30% of all data by 2025, and unstructured live data is expected to account for 80% of all live data. Data that is stored in the cloud is expected to account for 45% of all data, and databases that are deployed in the cloud are expected to account for 75% of all databases.

According to Global Market Insights reports, the data warehousing market size is estimated to grow at more than 12% CAGR worldwide and at more than 15% in China between 2019 and 2025. Market demands come from industries such as finance, Internet, manufacturing, government, and new retail.
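
To get a feel for what these growth rates imply, the following sketch converts a CAGR into a cumulative growth multiple and back. It assumes that each rate compounds once per year over the full period (15 periods for 2010 to 2025 and 6 periods for 2019 to 2025). The function names are illustrative.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return how many times larger a quantity becomes after compounding at `cagr` for `years`."""
    return (1 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Return the compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Data volume, 2010 to 2025 (15 annual periods assumed).
    print(f"worldwide data: x{growth_multiple(0.27, 15):.1f}")
    print(f"China data:     x{growth_multiple(0.30, 15):.1f}")
    # Data warehousing market, 2019 to 2025 (6 annual periods assumed).
    print(f"worldwide market: x{growth_multiple(0.12, 6):.2f}")
    print(f"China market:     x{growth_multiple(0.15, 6):.2f}")
```

Under these assumptions, a 27% CAGR over 15 years multiplies the starting data volume roughly 36 times, and a 12% CAGR over 6 years roughly doubles the market size.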

[Figure: Global data growth] [Figure: Global CAGR]

Alibaba Cloud database services

Alibaba Cloud has invested in database and data analysis technologies since its inception to provide services for businesses inside and outside Alibaba Cloud in a variety of scenarios and industries. After years of continuous investment and accumulation, Alibaba Cloud has been named a Leader in the Gartner Magic Quadrant for Cloud Database Management Systems for three consecutive years as of 2020.

[Figure: Gartner quadrant]

AnalyticDB for PostgreSQL provides core features for data analysis. The following figure shows the score ranking of AnalyticDB for PostgreSQL in the 2020 Gartner Critical Capabilities for Cloud Database Management Systems for Analytical Use Cases report.

[Figure: Ranking 1] [Figure: Ranking 2]