Building a super-large-scale graph database on the cloud

Hangzhou Yueshu Technology Co., Ltd. recently entered into a partnership with Alibaba Cloud Computing Nest. NebulaGraph is now officially available on Computing Nest as its first graph database product, giving users a new way to deploy enterprise-grade graph database clusters in the cloud. The service also integrates several visual database management tools from the NebulaGraph ecosystem, so users can quickly gain insight from their data in the cloud.

In recent years, more and more enterprises have been talking about graph databases. A graph is a data structure that stores entities and the relationships between them; a graph database is a database that stores data as a graph and uses the graph structure for semantic queries.

A graph database efficiently stores the entities in connected data as vertices and the relationships between them as edges. It supports high-performance retrieval and querying over this vertex-edge structure and allows properties to be attached to both vertices and edges. As a result, a graph database stores data in a form close to how people intuitively think about it and presents the relationships directly.
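As a minimal sketch of the vertex-edge-property model described above, the following Python snippet builds a tiny in-memory property graph. The class names, the `tag` and `edge_type` labels, and the sample data are all hypothetical and chosen only for illustration; this is not how NebulaGraph stores data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str
    tag: str                      # hypothetical label for the kind of entity
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    edge_type: str                # hypothetical label for the kind of relationship
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """A tiny in-memory property graph: vertices, edges, and properties on both."""

    def __init__(self):
        self.vertices = {}
        self.out_edges = {}

    def add_vertex(self, vid, tag, **props):
        self.vertices[vid] = Vertex(vid, tag, props)
        self.out_edges.setdefault(vid, [])

    def add_edge(self, src, dst, edge_type, **props):
        # Both endpoints must already exist in this toy model.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.out_edges[src].append(Edge(src, dst, edge_type, props))

    def neighbors(self, vid, edge_type=None):
        return [e.dst for e in self.out_edges.get(vid, [])
                if edge_type is None or e.edge_type == edge_type]


g = PropertyGraph()
g.add_vertex("alice", "person", age=30)
g.add_vertex("bob", "person", age=32)
g.add_vertex("acme", "company", city="Hangzhou")
g.add_edge("alice", "bob", "follows", since=2020)
g.add_edge("alice", "acme", "works_at", role="engineer")

print(g.neighbors("alice"))             # ['bob', 'acme']
print(g.neighbors("alice", "follows"))  # ['bob']
```

Keeping the outgoing edges next to each vertex is what makes "who is connected to whom" a direct lookup rather than a search.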

01 Trends in database development

Why use a graph database instead of a relational database?

Relational databases provide fast row-by-row access and data consistency (ACID transactions). However, as data volume grows and the relationships in the data become more complex, retrieving data with the relational model requires joining multiple tables, and writes must respect foreign key constraints. Both add significant overhead and demand more performance from the system. Graph databases have a natural advantage in handling complex relationships, especially many-to-many connections among entities at massive scale. This advantage shows up in three areas: performance, flexibility, and agility.

Relational database vs graph database (multi-hop query)
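The multi-hop comparison can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions: the `FOLLOWS` rows are made up, the "join" version scans every row on each hop (a real relational engine would use indexes and a query planner), and the traversal version stands in for how a graph store follows edges from the current frontier.

```python
from collections import defaultdict

# Hypothetical "follows" relationships, stored as rows of an edge table.
FOLLOWS = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]


def k_hop_by_joins(rows, start, k):
    """Relational style: each hop is a self-join that looks at every row."""
    frontier = {start}
    for _ in range(k):
        frontier = {dst for (src, dst) in rows if src in frontier}
    return frontier


def build_adjacency(rows):
    """Index the edges by source vertex once, up front."""
    adjacency = defaultdict(set)
    for src, dst in rows:
        adjacency[src].add(dst)
    return adjacency


def k_hop_by_traversal(adjacency, start, k):
    """Graph style: each hop only follows the edges of the current frontier."""
    frontier = {start}
    for _ in range(k):
        frontier = {dst for vid in frontier for dst in adjacency.get(vid, ())}
    return frontier


adjacency = build_adjacency(FOLLOWS)
print(sorted(k_hop_by_joins(FOLLOWS, "a", 2)))        # ['d']
print(sorted(k_hop_by_traversal(adjacency, "a", 3)))  # ['e']
```

Both functions return the same vertices; the difference is that the work done per hop in the traversal version depends on the size of the frontier, not on the size of the whole table.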

In addition, a graph database uses the graph model to store and display these relationships intuitively. Because the model directly represents the relationships between things, graphs are also naturally interpretable.

Thanks to its natural strengths in processing massive volumes of connected data, and a presentation that matches how people intuitively think, graph technology is now used in production environments and everyday business scenarios across many industries. Examples include data integration (knowledge graphs), personalized recommendation, fraud and threat detection, risk analysis and compliance, identity (and access) verification, IT infrastructure management, supply chain and logistics, and social network research. It also appears in emerging fields such as AI and machine learning, natural language processing (NLP), and blockchain.

Cloud adoption is accelerating and raising the bar for elasticity

According to Gartner's forecasts, cloud services continue to grow and penetrate the market rapidly. A large share of commercial software is gradually shifting from the fully private, on-premises model of 10 years ago to a cloud-based business model. One major advantage of cloud services is nearly unlimited elasticity, which in turn requires cloud-based software to scale out and in quickly.

Against this background, the open source distributed graph database NebulaGraph has become the first graph database partner of Alibaba Cloud Computing Nest. Together with Alibaba Cloud, it helps enterprise users deploy a graph database in the cloud at low cost and with high efficiency.

"For database products, the stability and security of the underlying infrastructure are crucial to customers. Alibaba Cloud is a leading cloud computing provider in China, and we value its stable infrastructure and its security. Through this cooperation, our customers can use NebulaGraph in the cloud with greater peace of mind, relying on Alibaba Cloud Computing Nest's ability to scale underlying resources elastically and orchestrate services conveniently," said Ye Xiaomeng, founder and CEO of NebulaGraph.

What is a cloud-native graph database?

Cloud native refers to capabilities that originate in the cloud: "born in the cloud, grown in the cloud". Built on a unified architecture and cloud-native infrastructure, it enables multi-cloud and hybrid-cloud solutions, edge-cloud collaboration, and more. In the cloud-native era, the way enterprises use data has changed fundamentally: cloud-native databases and big data solutions built on unified cloud infrastructure are becoming the data foundation for enterprise digital transformation.

In the traditional model, an enterprise purchases hardware, deploys the database in its own data center (IDC), and operates and maintains it itself. Developers either adopt a DevOps model or assign IT staff to manage the database. Once the database is serving traffic, they must constantly watch the cluster status to ensure availability, which is a major challenge for users who are unfamiliar with the database's internals. As a technology platform, cloud computing is naturally accessible regardless of time and place, and cloud technology is evolving from raw computing capacity toward systematic innovation. For enterprises, moving to the cloud is a technology choice and the starting point of digitalization, while building new production relationships and new engines of business growth on the cloud is a strategic choice.

NebulaGraph is a reliable, distributed, linearly scalable, high-performance graph database. Its shared-nothing architecture with separated storage and compute gives it cloud-native characteristics, which helps reduce costs and enables elastic scaling. Cloud deployment, in turn, hides the complexity of database deployment, performance tuning, and operations and maintenance. A graph database can be created in the cloud in a few minutes, and compute, storage, and other resources can be scaled quickly.

As a high-performance graph database that can hold massive volumes of connected data and answer queries with millisecond-level latency, NebulaGraph has been adopted by many industry-leading technology and communications companies for anti-fraud, risk control, community detection, and other scenarios. Among them, the NLP team of a leading internet company has built its own graph database platform on NebulaGraph. More than 60 business lines are now connected to it, and it is in production use in intelligent assistants, search recall, and other business scenarios.

The architecture diagram of NebulaGraph deployed on Alibaba Cloud

02 What are the benefits of deploying NebulaGraph on the cloud?

Out of the box: faster, more convenient deployment

Because cloud vendors provide unified infrastructure, enterprises do not need to purchase hardware themselves. They can provision cloud resources flexibly according to business elasticity and resource requirements, and go live quickly. Using the Resource Orchestration Service (ROS) provided through Alibaba Cloud Computing Nest, NebulaGraph is deployed to the cloud automatically, and a graph database cluster can be delivered within a few minutes. This is a major improvement over traditional delivery cycles measured in days or even weeks.

NebulaGraph Cloud also supports flexible billing, both monthly subscription and pay-as-you-go. This saves one-time construction costs such as buying new equipment or building a machine room, and resources can be released as soon as they are no longer needed, which significantly reduces R&D costs. To further improve the experience, in the next stage NebulaGraph will draw on its own optimizations and its test results in the cloud to launch cost-effective cloud server specification packages, so that users get higher performance at a lower price. Stay tuned.

High availability: safer data backup

• High availability of architecture

The NebulaGraph cluster contains three types of services, namely Query Service, Storage Service and Meta Service.

1. The Meta Service uses a Leader/Follower architecture. The Leader is elected by all Meta Service nodes in the cluster and serves external requests; the Followers stand by and replicate updated data from the Leader. If the Leader node goes down, one of the Followers is elected as the new Leader.

2. The Query Service runs as the nebula-graphd process. It consists of fully equivalent, stateless, mutually independent compute nodes, and there is no communication between them.

3. The Storage Service is designed as a shared-nothing distributed architecture with three layers in total. The bottom layer is the Store Engine, a single-node local storage engine that provides get/put/scan/delete operations on local data. This layer defines the data operation interface, and users can develop a custom Local Store Plugin against it according to their needs.

On top of the Local Store Engine is the consensus layer, which implements Multi-Group Raft. Each partition corresponds to one Raft group.
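The failover behaviour described above (a follower taking over when the leader goes down, with a separate replica group per partition) can be illustrated with a toy model. This sketch is not real Raft and not NebulaGraph code: the node names, the round-robin placement in `build_groups`, and the "first survivor wins" rule are assumptions made for brevity, whereas real Raft elects leaders by voting on terms and log freshness.

```python
from dataclasses import dataclass


@dataclass
class RaftGroup:
    """One replica group for one partition (toy model, not real Raft)."""
    partition_id: int
    replicas: list      # node names holding a copy of this partition
    leader: str

    def handle_node_failure(self, failed_node, alive_nodes):
        survivors = [n for n in self.replicas if n in alive_nodes]
        # A group needs a strict majority of its replicas to keep a leader.
        if len(survivors) * 2 <= len(self.replicas):
            self.leader = None
            return
        if self.leader == failed_node:
            # Simplification: take the first survivor instead of holding a vote.
            self.leader = survivors[0]


def build_groups(nodes, num_partitions, replica_factor=3):
    """Place partitions round-robin so each one gets its own replica group."""
    groups = []
    for pid in range(num_partitions):
        replicas = [nodes[(pid + i) % len(nodes)] for i in range(replica_factor)]
        groups.append(RaftGroup(pid, replicas, leader=replicas[0]))
    return groups


nodes = ["storage-0", "storage-1", "storage-2"]
groups = build_groups(nodes, num_partitions=4)

alive = {"storage-1", "storage-2"}  # storage-0 goes down
for group in groups:
    group.handle_node_failure("storage-0", alive)

print([(g.partition_id, g.leader) for g in groups])
```

Because every partition has its own group, losing one node only forces a new election for the partitions that node was leading; the others keep serving without interruption.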

• Data reliability

NebulaGraph stores its data on Alibaba Cloud's cloud disk products. Cloud disk is a block-level storage product that Alibaba Cloud provides for ECS, characterized by low latency, high performance, durability, and high reliability. It keeps three distributed replicas of the data, which gives ECS instances a very high level of data reliability.

Extreme elasticity: more reliable separation of storage and compute

NebulaGraph uses an architecture that separates storage from compute. The most direct benefit is that the compute layer and the storage layer can each scale out and in independently, according to their own load. The separation also makes horizontal scaling possible, and the elasticity of the cloud provides the resources to meet those scaling needs.

NebulaGraph creates two elastic scaling groups during cluster deployment, one for the Graph service and one for the Storage service. To improve the scaling experience, we made the following design decisions:

1. Give some control to the user. For the Graph service, ECS resources only need to be scaled to the target count, because the Graph service itself is stateless. For the Storage service, we provide a switch on the orchestration page that determines whether data is balanced automatically after a scale-out. Because data migration can affect service stability during peak business hours, users can decide for themselves when to run the balance operation. The deployed Dashboard tool also supports this operation.

2. Safety protection strategy. For a database service, not losing data comes first, so we enable deletion protection on the storage scaling group to guard against user error. In addition, to scale in, users must locate the storage node managed by the corresponding scaling group in the Dashboard and remove its data shards. Only after the shards have been cleared is the elastic scaling lifecycle hook request accepted.
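The two decisions above can be sketched as a small Python model: an auto-balance switch that applies on scale-out, and a guard that only approves a scale-in once the node holds no shards. All names here (`StorageNode`, `scale_out`, `drain`, `approve_scale_in`) are hypothetical, and the model ignores replicas and real data movement; it only illustrates the control flow.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    name: str
    partitions: set = field(default_factory=set)


def scale_out(nodes, new_node, auto_balance):
    """Add a node; rebalance immediately only if the switch is on."""
    nodes.append(new_node)
    if not auto_balance:
        return  # the user triggers the balance later, for example off-peak
    target = sum(len(n.partitions) for n in nodes) // len(nodes)
    for donor in nodes[:-1]:
        while len(donor.partitions) > target and len(new_node.partitions) < target:
            new_node.partitions.add(donor.partitions.pop())


def drain(node, targets):
    """Move every shard off `node`, always to the least-loaded target."""
    for pid in sorted(node.partitions):
        target = min(targets, key=lambda t: len(t.partitions))
        target.partitions.add(pid)
    node.partitions.clear()


def approve_scale_in(node):
    """Accept the scale-in hook only once the node holds no shards."""
    return len(node.partitions) == 0


a = StorageNode("storage-0", {1, 2, 3})
b = StorageNode("storage-1", {4, 5, 6})
c = StorageNode("storage-2")
cluster = [a, b]

scale_out(cluster, c, auto_balance=True)
print([len(n.partitions) for n in cluster])  # [2, 2, 2]

print(approve_scale_in(c))  # False
drain(c, [a, b])
print(approve_scale_in(c))  # True
```

Refusing the hook until the node is empty means an accidental scale-in request cannot remove a node that still owns data.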

In addition, Alibaba Cloud's elastic scaling provides rich scaling rules and scheduled tasks. Users can scale Graph nodes dynamically according to business fluctuations and application scenarios (OLTP or OLAP), and scale out storage nodes based on average CPU usage.
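A CPU-based rule of this kind can be approximated with a simple target-tracking calculation. The sketch below is an assumption about how such a rule might be reasoned about, not Alibaba Cloud's actual rule engine: the function name, the 60% target, and the node limits are hypothetical values chosen for illustration.

```python
import math


def desired_node_count(current_nodes, avg_cpu_percent, target_cpu_percent=60.0,
                       min_nodes=3, max_nodes=12):
    """Size the group so that average CPU usage moves toward the target."""
    if current_nodes <= 0:
        raise ValueError("current_nodes must be positive")
    # Total load is roughly current_nodes * avg_cpu; spread it at the target level.
    wanted = math.ceil(current_nodes * avg_cpu_percent / target_cpu_percent)
    # Clamp to the limits configured for the scaling group.
    return max(min_nodes, min(max_nodes, wanted))


print(desired_node_count(3, 90.0))  # 5
print(desired_node_count(6, 20.0))  # 3
```

For stateless Graph nodes the result can be applied directly. For storage nodes, a lower number is only a suggestion, since shards still have to be moved off a node before it can be removed.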

Secure and reliable: traceable roles and identities

NebulaGraph supports strict role-based access control and external authentication services such as LDAP (Lightweight Directory Access Protocol), which effectively improves data security. When a client connects to NebulaGraph, NebulaGraph creates a session that stores the connection information. If authentication is enabled, the session is mapped to the corresponding user.

NebulaGraph also has built-in role permissions. Roles can be assigned to the users that have been created in order to enforce access control.
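The session-to-user mapping and role check described above can be sketched as follows. This is an illustrative model only: the role names, the permission sets, the per-space grants, and the plain-text password comparison are all hypothetical simplifications and do not reflect NebulaGraph's actual built-in roles or authentication code.

```python
from enum import Enum, auto


class Permission(Enum):
    READ_DATA = auto()
    WRITE_DATA = auto()
    READ_SCHEMA = auto()
    WRITE_SCHEMA = auto()
    MANAGE_USERS = auto()


# Hypothetical role names and permission sets, for illustration only.
ROLE_PERMISSIONS = {
    "admin": set(Permission),
    "editor": {Permission.READ_DATA, Permission.WRITE_DATA, Permission.READ_SCHEMA},
    "viewer": {Permission.READ_DATA, Permission.READ_SCHEMA},
}


class Session:
    """Created when a client connects; mapped to a user once authenticated."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.user = None

    def authenticate(self, user, password, credentials):
        # Toy check; a real system would verify a password hash or call LDAP.
        if credentials.get(user) != password:
            raise PermissionError("authentication failed")
        self.user = user


def is_allowed(session, space, permission, grants):
    """`grants` maps (user, space) to a role name."""
    if session.user is None:
        return False
    role = grants.get((session.user, space))
    return role is not None and permission in ROLE_PERMISSIONS[role]


credentials = {"alice": "s3cret"}
grants = {("alice", "social"): "viewer"}

session = Session(1)
session.authenticate("alice", "s3cret", credentials)
print(is_allowed(session, "social", Permission.READ_DATA, grants))   # True
print(is_allowed(session, "social", Permission.WRITE_DATA, grants))  # False
```

Because every request is checked against the user attached to the session, each operation can be traced back to an identity and a role.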

03 How to deploy NebulaGraph on the cloud?

NebulaGraph and Alibaba Cloud Computing Nest are currently running a limited-time free trial. Interested users can log in to Alibaba Cloud Computing Nest and apply for trial access. Once the information is filled in and the application is approved, a fully graphical ("white-screen") service creation page becomes available.

Fill in the required configuration parameters in the guided steps and submit with one click. After about 5 minutes, NebulaGraph is ready to try, and all graph database features can be used as normal. This greatly lowers the barrier to deploying the database.
