Breaking the Limits of Relational Database in Cloud Native Era

The concept of relational databases may seem a little "antiquated" in this day and age, as its history begins with the information technology of half a century ago. Realistically, though, the technology has always been at the core of modern society, driving most of the developments in commercial technology. The three core technical fields (CPUs, operating systems, and databases) are the cornerstones of information processing, computing power, and artificial intelligence.

From the publication of E.F. Codd's paper "A Relational Model of Data for Large Shared Data Banks" in 1970 to the arrival of DB2, a commercial relational database supporting SQL, in the early '80s, Oracle's start, and the birth of SQL Server in the '90s, the successes of relational databases span the decades.

Today, with the development of the World Wide Web and the broad application of big data, more and more new types of databases are cropping up. However, relational databases continue to dominate the space. One of the primary reasons for the prevalence of relational databases is their adoption of SQL standards. This advanced, non-procedural programming interface language perfectly combines computer science and human-comprehensible data management methods, and remains difficult to surpass even today.

SQL Language

SQL (Structured Query Language) was invented by Boyce and Chamberlin in 1974 to act as a bridge between relational algebra and relational calculus. In essence, it is a language that uses keywords resembling natural speech and grammar to define operations on data and to program data storage, query, and management.

This abstract programming interface decouples the specific data problem from the details of data storage and query implementation, allowing large swaths of commercial business logic and information management computing models to be applied at scale. This has unlocked productivity and significantly driven forward the development of commercial relational database systems.
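The decoupling described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in relational engine; the orders table and the index name are hypothetical. The same declarative query returns the same answer before and after the physical storage layout changes.

```python
import sqlite3

# In-memory database; the "orders" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
)

# The query states WHAT is wanted, not HOW to fetch it.
QUERY = "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"

before = conn.execute(QUERY).fetchall()

# Change the physical layout by adding an index; the query text is untouched.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = conn.execute(QUERY).fetchall()

print(before == after)
```

The engine is free to choose a different access path once the index exists, but the application code that issues the query does not change.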

Looking at the continued development and growth of SQL, it's not hard to see why it has already become the top choice in the world of relational databases. Even today, this programming language still has yet to be replaced by an alternative.

OLTP

In 1976, Jim Gray published a paper called "Granularity of Locks and Degrees of Consistency in a Shared Database" in which he formally defined the concept of database transactions and the data consistency mechanisms for relational database transactions. OLTP (online transaction processing) is a classic application of relational databases. It involves transaction processing, primarily of basic, everyday operations such as transactions in a bank.

Transaction processing must follow ACID, four principles that ensure data accuracy. ACID stands for Atomicity, Consistency, Isolation, and Durability. Performance indicators used to measure the processing power of OLTP include response time and throughput.
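As a rough illustration of atomicity and consistency, here is a minimal sketch of a bank transfer using Python's built-in sqlite3 module. The accounts table, the names, and the transfer function are hypothetical. Either both updates are committed or neither is.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so nothing is applied
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer, the credit to the destination account is executed first and then undone by the rollback, which is the behavior atomicity requires. The CHECK constraint plays the role of a consistency rule.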

The Open Source Database Ecology

In our brief overview of the history, position, and developmental phases of relational databases, we came across the names Oracle, SQL-Server, DB2, all of which are relational databases that still hold the top positions in global databases. Though they were once household names in the tech world, Informix and Sybase have already fallen out of the awareness of the general public.

However, beginning in the 1990s, a renewed wave of information sharing and the spirit of free and open software became a popular trend, bringing with it names like Linux, MySQL, PostgreSQL, and other open source software. The appearance of this trend and the strength with which it has grown have released a veritable tsunami of growth in society as these freely shared technical advances encourage massive growth in global Internet technology companies.

This progress belongs to the whole of society, but the credit belongs to pioneering open-source developers such as Richard Stallman, Linus Torvalds, and Michael Widenius. More and more Chinese companies have also become active participants in the open source community over the past few years, freely contributing their own technological advancements to the rest of the open source world.

The Current State of Cloud Computing

In reaction to the popularity of green computing and the shared economy, we need not just cloud servers, cloud data, networks, hardware chips, and other integrations of hardware and software, but also to continue putting the needs of users at the center of technology. With service that is focused on the user, technology will spread across public consciousness and further drive forward the development of computing efficiency and intelligence.

We are currently in a phase of vigorous development, the so-called "Cloud 2.0". This phase has seen the rise of a number of issues relating to the management of relational databases. This is precisely why AWS, Amazon's cloud computing unit, announced Aurora on November 12, 2014. Launched at the AWS re:Invent 2014 conference, Aurora is a new-generation cloud-hosted relational database. The release of this new generation of databases heralds a new phase not only in the age of cloud computing but also in the evolution of the core technologies given to us by the IT era.

At the SIGMOD conference in 2017, Amazon published a paper entitled "Amazon Aurora: Design Considerations for High Throughput Cloud Native Relational Databases", which further explained how the cloud-native relational database, a relational database designed for the cloud environment, was born.

Related Blogs

PolarDB: Deep Dive on Alibaba Cloud’s Next-Generation Database

Cloud computing has provided more computing capability, and more creative power, to propel the Internet era. Relational databases are something few applications can do without. Cloud databases that are ready to use out of the box and feature high performance to cost ratios have found favor among developers all over the world.

Problems with Traditional Databases

The complaints below all come from real customer cases. Put simply, the underlying architecture of traditional cloud databases gives rise to the following problems:

1. Read/write instances and read-only instances each have their own independent copy of the data, so when customers purchase a new read-only instance, they need to pay not only the computing costs but also for the corresponding storage resources.

2. Since traditional backup techniques also involve copying data and uploading it to cheap storage, the speed of the operation is bottlenecked by the speed of the network.

3. Since read/write instances and read-only instances each have their own copy of the data, creating a new read-only instance involves re-copying all of the data. Given the limited speed of data flow across the network, the operation will inevitably be slow.

4. Early versions of the MySQL database were optimized for early systems and hardware, but they did not take into consideration the kinds of systems and hardware that are becoming popular now. They therefore leave a lot to be desired in high-concurrency situations. Furthermore, unlike other relational databases, MySQL writes two logs (a transaction log and a replication log) for the sake of compatibility, which hurts its performance in comparison to other commercial databases.

5. Because of the limits of physical disks and backup strategies, the size of the database can't be too large without making O&M a disaster.

6. Read/write instances and read-only instances synchronize through incremental logical data, so every SQL statement executed on a read/write instance needs to be re-executed on the read-only instances (including steps like SQL parsing and SQL optimization). At the same time, the concurrency of replication is based on table dimensions, which affects all kinds of task switching.

As the database grows, so do these “small” annoyances, which can plague DBAs and CTOs. Today, all of these problems that have tripped us up for years are solved in Alibaba Cloud’s new PolarDB. Note that these issues are solved at the root of each problem, not with hacked-together workarounds.

What is PolarDB?

PolarDB is a next-generation relational database based on the cloud computing framework. Currently PolarDB supports only MySQL; PostgreSQL support is under development. The most notable features are as follows:

Data backup time on PolarDB has been reduced to mere seconds. With the help of a fast RDMA network and the newest block storage technology, the backup time is independent of the size of the underlying data.
All of the nodes in an instance, including read/write nodes and read-only nodes, are able to access the same copy of data on a storage node. By contrast, in the traditional cloud database model each instance has its own copy of the data.
Nodes are divided into computing nodes and storage nodes. The computing nodes are servers that primarily perform SQL parsing and storage engine computation. The storage nodes are servers that perform data block storage and data snapshots.

With these features, PolarDB satisfies both the elastic expandability needs of public cloud computing environments and the high availability needs of the database server for users on the Internet. The expansion time of read-only instances is no longer related to data size and the service can now continue even in the time between a server crash and restart.
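The separation of computing nodes from a shared storage node can be sketched with a toy model. This is not PolarDB's implementation: the class names are hypothetical, and real systems also have to handle caching, logging, and consistency between nodes, which are omitted here. The sketch only shows why adding a read-only node does not require copying data.

```python
class SharedStorage:
    """Hypothetical storage node holding the single copy of the data blocks."""

    def __init__(self):
        self.blocks = {}


class ComputeNode:
    """Hypothetical computing node; it holds no data of its own."""

    def __init__(self, name, storage, writable=False):
        self.name = name
        self.storage = storage
        self.writable = writable

    def write(self, block_id, data):
        if not self.writable:
            raise PermissionError(f"{self.name} is read-only")
        self.storage.blocks[block_id] = data

    def read(self, block_id):
        return self.storage.blocks[block_id]


storage = SharedStorage()
primary = ComputeNode("rw-1", storage, writable=True)
primary.write(1, "row data")

# Adding a read-only node only attaches it to the existing storage.
# No blocks are copied, so the cost does not depend on the data size.
replica = ComputeNode("ro-1", storage)
value_seen_by_replica = replica.read(1)
print(value_seen_by_replica)
```

In the traditional model, by comparison, creating the replica would start with a full copy of `blocks`, which is the step whose duration grows with the data volume.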

PolarDB also features a complete management system based on Docker to handle instance creation, deletion, and account creation tasks submitted by the user. It also includes a complete and detailed monitoring system and reliable high-availability switching. The management system also maintains a set of metabases used to record the location information of each data block, which it provides to PolarSwitch, which then passes it on to the appropriate destination. It can be said that the entire PolarDB project uses several new technologies to provide users with fast performance (6x that of MySQL), large capacity (up to 100 TB), and low cost (about 1/10 the cost of other commercial databases).

Breaking the Limits of Relational Databases: An Analysis of Cloud-Native Database Middleware (1)

This article provides an in-depth insight into cloud-native database technology, focusing on the core functions and implementation principles of transparent sharding middleware.

The development and transformation of database technology is on the rise. NewSQL has emerged to combine various technologies, and the core functions implemented by the combination of these technologies have promoted the development of the cloud-native database.

Among the three types of NewSQL, the new architecture and Database-as-a-Service types involve many underlying implementations related to the database, and thus will not be elaborated here. This article focuses on the core functions and implementation principles of transparent sharding middleware. The core functions of the other two NewSQL types are similar to those of sharding middleware but have different implementation principles.

Sharding

Regarding performance and availability, traditional solutions that store data on a single data node in a centralized manner can no longer adapt to the massive data scenarios created by the Internet. Most relational database products use B+ tree indexes. When the data volume exceeds a threshold, the increase in index depth leads to an increased disk I/O count, substantially degrading query performance. In addition, highly concurrent access requests turn the centralized database into the biggest bottleneck of the system.
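The relationship between data volume and index depth can be made concrete with a back-of-the-envelope sketch. The fanout of 1000 entries per node is an illustrative assumption; the real value depends on page size and key size, and upper levels are often cached in memory.

```python
def btree_levels(rows, fanout):
    """Smallest number of levels whose capacity (fanout ** levels) covers `rows`."""
    levels = 1
    capacity = fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


FANOUT = 1000  # assumed entries per index node, for illustration only

single_table = btree_levels(2_000_000_000, FANOUT)     # one table with 2 billion rows
per_shard = btree_levels(2_000_000_000 // 16, FANOUT)  # the same rows split into 16 shards
print(single_table, per_shard)
```

Under these assumptions the single table needs one more index level than each shard, and each extra level can cost one more disk read per lookup when the node is not cached.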

Since traditional relational databases cannot meet the requirements of the Internet, increasing numbers of attempts have been made to store data in NoSQL databases that natively support data distribution. However, NoSQL is not compatible with SQL, and its ecosystem is not yet mature. Therefore, NoSQL cannot replace relational databases, and the position of relational databases is secure.

Sharding refers to the distribution of the data stored in a single database to multiple databases or tables based on a certain dimension to improve the overall performance and availability. Effective sharding measures include database sharding and table sharding of relational databases. Both sharding methods can effectively prevent query bottlenecks caused by a huge data volume that exceeds the threshold.

In addition, database sharding can effectively distribute the access requests of a single database, while table sharding can convert distributed transactions into local transactions whenever possible. The multi-master-and-multi-slave sharding method can effectively prevent data from being concentrated on a single node and enhance the availability of the data architecture.
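A minimal sketch of how a sharding key might be mapped to a database and a table is shown below. The modulo rule, the names ds_N and t_order_N, and the shard counts are assumptions chosen for illustration; sharding middleware typically supports several routing algorithms.

```python
def route(user_id, db_count=2, tables_per_db=4):
    """Map a sharding key to a (database, table) pair using simple modulo rules."""
    db_index = user_id % db_count
    table_index = (user_id // db_count) % tables_per_db
    return f"ds_{db_index}", f"t_order_{table_index}"


# Eight consecutive keys spread over all 2 x 4 = 8 physical tables.
targets = {route(user_id) for user_id in range(8)}

print(route(10))  # ('ds_0', 't_order_1')
print(route(7))   # ('ds_1', 't_order_3')
print(len(targets))
```

Because every row with the same `user_id` lands in the same database, a transaction that touches only one user's orders stays local to that database.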

Vertical Sharding
Vertical sharding is also known as vertical partitioning. Its key idea is the use of different databases for different purposes. Before sharding is performed, a database can consist of multiple data tables that correspond to different businesses. After sharding is performed, the tables are organized according to business and distributed to different databases, balancing the workload among different databases, as shown below:

[Figure: Vertical sharding]
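Vertical sharding can also be sketched as a simple mapping from tables to business-specific databases. The table and database names below are hypothetical.

```python
# Hypothetical assignment of tables to databases, organized by business domain.
TABLE_TO_DATABASE = {
    "t_user": "user_db",
    "t_order": "order_db",
    "t_order_item": "order_db",
    "t_product": "product_db",
}


def database_for(table):
    """Return the database that owns `table` after vertical sharding."""
    return TABLE_TO_DATABASE[table]


def group_by_database(tables):
    """Group the tables touched by a request by their owning database."""
    groups = {}
    for table in tables:
        groups.setdefault(database_for(table), []).append(table)
    return groups


touched = group_by_database(["t_order", "t_order_item", "t_user"])
print(touched)
```

A request that touches tables in more than one group, as in this example, spans several databases, which is why tables that are used together are usually kept in the same business database.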

What Does the Future Hold for Next-Generation Cloud Database Technology in the Cloud Native Era?

This article introduces the challenges emerging in the cloud native era, and discusses how database technologies should adapt to face these challenges.

We are now in an all-cloud era, full of new technologies, innovations, and challenges. More importantly, we now face some important questions that can redefine the way we deal with database technologies. What reforms will be made in the database market in this era? How can cloud service providers offer more efficient and cost-effective database solutions to help more enterprise users seize opportunities presented by cloud migration?

In the database session of the 2019 Alibaba Cloud Summit held in Beijing, Feifei Li, Vice President of Alibaba Group, Chief Database Scientist of Alibaba DAMO Academy, and Head of Database Business Group of Alibaba Cloud Intelligence, gave an insightful presentation on the next-generation cloud-native database technologies and the challenges they face.

Database Development and Technical Evolution

According to a database market analysis report released by DB-Engines in January 2019, relational database products are still dominant in the database market. Meanwhile, more market segments, such as the graph database, document database, and NoSQL database segments, are forming in the database market. Another trend in this market is the continuous decline of the market shares of the traditional commercial database giants. By contrast, open-source and third-party database market shares keep expanding.

After over 40 years of evolution, database technology is still developing vigorously. Cloud computing vendors have reached a consensus that databases are an important component in the connection of IaaS and intelligent cloud applications. Therefore, the vendors need to improve their capabilities throughout the entire data lifecycle, including data production, storage, and consumption, enabling users to connect IaaS and intelligent applications.

Thanks to the constantly developing database technology, we now have online transaction processing (OLTP) systems to record real-time transaction data and online analytical processing (OLAP) systems to analyze massive amounts of data in real time. OLTP and OLAP systems require the support of database services and management tools. Given these circumstances, NoSQL database solutions have been developed to store semi-structured and non-structured data.

From the late 1970s to early 1980s, relational databases came into being, and later the SQL query language and OLTP systems were developed. The explosive growth in data volumes and the demand for complex data analysis gave rise to data warehousing, as well as OLAP, extract-transform-load (ETL), and other data processing technologies. With the continuous increase of multi-source heterogeneous data, such as graphs, documents, spatial-temporal data, and time series data, non-relational NoSQL and NewSQL database systems have also emerged.

What Type of Database Do We Need in the Cloud Native Era?

Traditional databases typically use a single-node architecture, whereas cloud-native databases usually use a shared storage architecture. Alibaba Cloud PolarDB establishes a shared storage architecture over a high-speed network. This architecture separates storage from computing to enable the fast scale-out of computing nodes. In addition, PolarDB allows for the rapid scaling of storage and computing capabilities based on the actual needs of customers. Customers can use this shared storage database to complete a non-intrusive data migration without any change to the original business logic.

In addition to the cloud-native shared storage technology, a distributed architecture is required to handle highly concurrent access to massive amounts of data. For example, Alibaba is exploring the use of a distributed architecture to cope with the challenges posed by Double Eleven every year. Also, Alibaba Cloud wants to provide different query interfaces, such as SQL, to support queries of data in multiple models and states. Concerning the storage system, Alibaba Cloud hopes to allow users to store their data in different locations and use a unified interface like SQL to query all types of data. Alibaba Cloud Data Lake Analytics (DLA) is a cloud-native technology developed for this application scenario.

Traditional solutions isolate read and write conflicts by using the OLTP system to process transactions and the OLAP system to analyze huge volumes of data. In the cloud native era, Alibaba Cloud will minimize the cost of data migration by taking advantage of the technical benefits delivered by new hardware devices. This can be done by integrating transaction processing and data analytics features in one engine so that these two needs can be addressed seamlessly by one system.

Alibaba Cloud serves a large number of enterprises, which use our cloud resource pools based on a virtualized architecture that separates storage from computing. Therefore, we need to monitor and schedule all off-premises resources in an intelligent way to quickly respond to customer requirements and deliver optimal service quality. To achieve the necessary intelligence, we need to use machine learning and AI to enable automatic sensing, decision-making, recovery, and optimization in all sectors, including data migration, data protection, and elastic scheduling.

Related Courses

Operate and Manage a Relational Database on the Cloud

Learn about the basic concepts of RDS, the benefits of using it (compared to conventional database solutions) and understand its key features. It also includes demos that further introduce database/account management, security settings, read-only instances, database backups and third-party tool integration.

How to Build and Manage a Distributed Database System with DRDS

Distributed databases solve the problems of traditional databases, such as capacity bottlenecks, difficulty in expansion, and high cost. However, the configuration and management of distributed databases require higher technical capabilities. Alibaba Cloud's DRDS service helps you solve the problem of creating and managing distributed databases on the cloud. Through this course, you will understand the benefits and features of DRDS and how to use the DRDS console to easily build and manage distributed database systems on Alibaba Cloud.

Alibaba Cloud Relational Database Service Technical Operations

Alibaba Cloud Relational Database Service is our cloud database offering. Through this course, you will not only learn about the design architecture and applicable scenarios of Alibaba Cloud Relational Database Service, but also, by watching product console demos, become familiar with its major functions and operation details.

Related Market Products

Postgres Pro Standard Database 10 (CentOS 7)

Postgres Professional is a key contributor to the PostgreSQL community.
At Postgres Professional we develop Postgres Pro Database, a private PostgreSQL fork.
Each new Postgres Pro Database version is based on the latest PostgreSQL version, with patches committed to the PostgreSQL community and Postgres Pro extensions/patches, which are open source in most cases.

Limits

You can create DRDS databases only on the console. DRDS does not support creating databases with SQL commands.

Procedure

Log in to the DRDS console.
Click Instance List in the left navigation pane.
Find the target instance, and click the instance name to enter its Basic Information page.
At the upper-right corner of the Basic Information page, click Create Database.
In the dialog box that pops up, select Partition Mode according to your needs and enter the basic database information.

Connect to a database cluster

This topic describes how to use DMS and a client to connect to a PolarDB cluster compatible with Oracle.

Prerequisites

You have created a privileged account or standard account for a database cluster. For more information, see Create a database account.

Use DMS to connect to a PolarDB cluster compatible with Oracle

Data Management (DMS) provides an integrated solution for data management. DMS supports data management, schema management, access control, BI charts, trend analysis, data tracing, performance optimization, and server management. DMS supports relational databases such as MySQL, SQL Server, and PostgreSQL, as well as NoSQL databases such as MongoDB and Redis. DMS also supports the management of Linux servers.

Log on to the ApsaraDB for PolarDB console.
Find the target cluster and click the cluster ID to go to the Overview page.
In the upper corner of the page, click Log On to Database to go to the Database Logon page.
On the Database Logon page, enter the primary endpoint and the port number, and separate them with a colon (:). Then enter the username and password of the privileged or standard account, and click Log On.