An Introduction and Best Practice of DataWorks Data Services

By Zhou Shuo, Product Manager of DataWorks

This article is part of the One-stop Big Data Development and Governance DataWorks Use Collection.

1. Introduction

As an all-in-one platform for big data development and governance, DataWorks provides an end-to-end solution that covers data integration, data development, data services, and application development. As the following figure shows, data services sit in the middle of the data pipeline and link data warehouses, databases, and data applications, forming a bridge between data and applications.

By encapsulating data as data APIs, data services give individuals, teams, and enterprises comprehensive capabilities for opening and sharing data. Users can manage internal and external API services on this platform in a unified way. In short, data services connect to data sources downstream and support business applications upstream.

(1) Overall Architecture of Data Services

In the overall architecture, the top layer contains three modules: the data service desk, OpenAPI, and API Gateway. Users can quickly create and publish APIs through the data service product interface (the data service desk) or through OpenAPI, and can make secondary modifications to APIs in API Gateway. The lower layer is built on an API development platform. This data service development platform takes APIs as its core and provides organization management, API development, resource development, data source management, API permission management, and API measurement.

For organization management, users can create workflows and use them as logical units that hold resource objects such as APIs. Objects within a workflow can be further subdivided into folders, which gives a multi-level management structure.

For API development, data services offer several development methods, such as wizard mode and script mode, together with interface debugging and one-click publishing, which keeps API delivery lightweight. Other capabilities support API development and management, such as permission management (API visibility and call authorization) and measurement statistics (visual charts that show how APIs are being called).

DataWorks is integrated with API Gateway. APIs published in data services appear in the API Gateway list, where they can be viewed and managed. At call time, the client sends a request to API Gateway, which forwards it to the data service backend. The data service then parses the request and the SQL, performs any other processing, retrieves the matching data from the data source, and returns it to the user.
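To make the backend half of this call path concrete, here is a minimal sketch of what happens after the gateway forwards a request for a generated API: the request is parsed, its parameters are bound into a parameterized SQL statement, and the data source is queried. It is an illustration under stated assumptions, not DataWorks internals: an in-memory SQLite database stands in for the data source, and `API_SQL`, `handle_request`, the `orders` table, and the `errCode`/`errMsg`/`data` response shape are all hypothetical.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit

# Hypothetical API definition: the query logic a user configures once.
API_SQL = "SELECT order_id, amount FROM orders WHERE region = :region ORDER BY order_id"
REQUIRED_PARAMS = ("region",)


def make_source() -> sqlite3.Connection:
    """Create an in-memory table that stands in for a real data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "east", 10.5), (2, "west", 7.0), (3, "east", 3.25)],
    )
    return conn


def handle_request(url: str, conn: sqlite3.Connection) -> dict:
    """Parse a forwarded request, bind its parameters into the SQL, and return rows."""
    # Request parsing: take the first value of each query-string parameter.
    query = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    missing = [p for p in REQUIRED_PARAMS if p not in query]
    if missing:
        return {"errCode": 400, "errMsg": f"missing parameters: {missing}", "data": []}
    # Parameter binding: values travel separately from the SQL text,
    # so caller input is never spliced into the statement.
    params = {p: query[p] for p in REQUIRED_PARAMS}
    cursor = conn.execute(API_SQL, params)
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return {"errCode": 0, "errMsg": "success", "data": rows}


if __name__ == "__main__":
    source = make_source()
    print(handle_request("https://api.example.com/orders?region=east", source))
```

The caller only ever sees the URL and the JSON-like result; the parsing, binding, and data source connection stay on the service side, which is the "almost invisible" processing described next.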

Throughout this process, from the moment the user sends a request until the data service returns the result, the connections and processing between the different products (client, API Gateway, data service, and data source) are almost invisible to the user. Users only need to focus on the data itself, because DataWorks data services encapsulate and provide the underlying services for them.

(2) Data Application Scenarios

The preceding architecture shows the capabilities, processing logic, and basic dependencies of data services from a global perspective. The following figure takes a practical perspective and shows the end-to-end process in which users develop and use APIs with data services.

There are two ways to create an API on the data service platform. If you already have a data source, you can connect to it directly and generate an API by entering the corresponding connection information. If you already have an existing API, you can register its host address with the data service platform for unified management. In addition, data services provide function compute capabilities that assist API generation, and they support orchestrating multiple APIs and functions into workflows to generate compound APIs.

Whether an API is a standalone API (generated or registered) or a compound API built through service orchestration, data services can publish it to API Gateway with one click. You can call the published data interfaces from applications, reports, and dashboards, or list them for sale on the Alibaba Cloud API marketplace and authorize others to call them, enabling both internal and external data sharing. Data services therefore support a wide variety of application scenarios.

2. Product Advantages of Data Services

Developing an API the traditional way is complicated and time-consuming. Starting from preparing and connecting to the database, you have to develop the API query logic, build authentication and flow control, set up servers, and deploy the interface. After the interface goes online, you still need to carry out ongoing O&M work. In contrast, developing an API with data services takes only two steps: preparing the database and configuring the API's query logic. Users do not need to build the subsequent deployment, control, and O&M themselves, because data services provide the complete product capabilities, infrastructure, and underlying resources.

Data services use a serverless architecture, which offers zero-code development, no O&M, and elastic scaling. Users only need to focus on the API's query logic, which removes much of the complexity of API development, reduces costs, and improves efficiency.

In addition, creating an API and retrieving data results depends on connectivity to data sources across networks. DataWorks data services provide the product capabilities to connect to multiple types of data sources in multiple network environments. Supported network environments include VPC, the classic network, and the Internet. Supported data source types include ApsaraDB RDS and non-ApsaraDB RDS databases, such as MySQL, PostgreSQL, Oracle, OTS, and MongoDB, as well as big data storage types such as MC-Hologres. Data services will continue to add more types of accessible data sources.

In summary, the product advantages of data services are a serverless, O&M-free development experience, a diverse range of supported data source types, and a rich set of supported network types.

3. Analysis of Main Features

Before you use DataWorks data services, activate API Gateway so that APIs can be published successfully and obtain a valid domain name.

Once this is done, go to the service development page of data services, create a workflow in the directory on the left as an organizational unit, and then create the destination API and other objects (such as functions and service orchestrations) in this workflow.

The following figure shows the common workflows in data services. Whether an API is generated, registered, or built as a compound API through service orchestration, it can be published to API Gateway after debugging and then called online as a data interface.

The core functions of the data services are described in detail below.

(1) Generate an API

APIs can be generated in two modes: wizard mode and script mode. If you are an analyst or a business user, you can use the visual wizard mode. If you are a developer or an advanced SQL user, you can write custom SQL scripts in script mode to express complex query logic.

In wizard mode, after you select the destination table, the system automatically retrieves the table structure and displays it on the API editing page. You define the API's logic by selecting which fields serve as request parameters and which serve as response parameters. Wizard mode is designed for quick single-table queries. It supports several operator types for request parameters (such as equality queries and fuzzy matching) and sorting the returned results by specific fields (by adding a field to the sort list). Wizard mode is visual, requires no code, and is easy to get started with.
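Conceptually, wizard mode turns the fields you tick into a single-table query. The sketch below shows one way such a translation could work: request parameters with an operator (equality or fuzzy match), the selected return fields, and the sort list become a parameterized `SELECT`. The `build_wizard_query` function, the `RequestParam` class, and the `users` table are hypothetical; they illustrate the idea and are not the code DataWorks runs.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Operator templates (a hypothetical subset mirroring the wizard's options).
OPERATORS = {"=": "{col} = ?", "LIKE": "{col} LIKE ?"}


@dataclass
class RequestParam:
    column: str
    operator: str = "="  # "=" for an equality query, "LIKE" for fuzzy matching


def build_wizard_query(
    table: str,
    request_params: List[RequestParam],
    return_fields: List[str],
    sort_fields: List[Tuple[str, str]],
    values: Dict[str, Any],
) -> Tuple[str, List[Any]]:
    """Translate wizard selections into a parameterized single-table SELECT."""
    clauses: List[str] = []
    binds: List[Any] = []
    for p in request_params:
        if p.operator not in OPERATORS:
            raise ValueError(f"unsupported operator: {p.operator}")
        clauses.append(OPERATORS[p.operator].format(col=p.column))
        value = values[p.column]
        # Fuzzy matching wraps the value in % wildcards; equality binds it as-is.
        binds.append(f"%{value}%" if p.operator == "LIKE" else value)
    # Table and column names come from the fetched table schema, not from callers;
    # only parameter values are supplied at call time.
    sql = f"SELECT {', '.join(return_fields)} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    order = []
    for column, direction in sort_fields:
        direction = direction.upper()
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"invalid sort direction: {direction}")
        order.append(f"{column} {direction}")
    if order:
        sql += " ORDER BY " + ", ".join(order)
    return sql, binds


if __name__ == "__main__":
    sql, binds = build_wizard_query(
        "users",
        [RequestParam("city"), RequestParam("name", "LIKE")],
        ["id", "name"],
        [("id", "desc")],
        {"city": "Hangzhou", "name": "li"},
    )
    print(sql)
    print(binds)
```

Keeping values as bind parameters, rather than formatting them into the SQL string, is what lets the same generated query serve every caller safely.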

In script mode, data services provide an intelligent SQL editor in which you can write SQL with multi-table joins, aggregate functions, and other complex conditional queries. In addition, an advanced SQL mode that supports MyBatis syntax has been officially released, allowing users to define dynamic SQL logic flexibly with MyBatis tags.
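To show what MyBatis-style dynamic tags buy you, the sketch below emulates the behavior of `<if test="param != null">` and `<where>` in plain Python for a join-plus-aggregate query: a condition is kept only when its parameter is supplied, and `WHERE` is emitted only when at least one condition survives, with the leading `AND` dropped. The equivalent MyBatis script is shown in the comment. The tables, columns, and the `render_dynamic_sql` helper are hypothetical, and this emulation is not how DataWorks evaluates the tags internally.

```python
from typing import Any, Dict, Optional, Tuple

# Python emulation of this MyBatis-style script (hypothetical tables and parameters):
#   SELECT c.region, SUM(o.amount) AS total
#   FROM orders o JOIN customers c ON o.customer_id = c.id
#   <where>
#     <if test="region != null">AND c.region = #{region}</if>
#     <if test="min_amount != null">AND o.amount >= #{min_amount}</if>
#   </where>
#   GROUP BY c.region

BASE_SQL = (
    "SELECT c.region, SUM(o.amount) AS total "
    "FROM orders o JOIN customers c ON o.customer_id = c.id"
)
# Each entry mirrors one <if test="param != null"> block.
OPTIONAL_CONDITIONS = [
    ("region", "AND c.region = :region"),
    ("min_amount", "AND o.amount >= :min_amount"),
]
TAIL_SQL = "GROUP BY c.region"


def render_dynamic_sql(params: Dict[str, Optional[Any]]) -> Tuple[str, Dict[str, Any]]:
    """Keep conditions whose parameter is non-null (like <if>), then apply <where> rules."""
    active = [(name, frag) for name, frag in OPTIONAL_CONDITIONS if params.get(name) is not None]
    sql = BASE_SQL
    if active:
        where = " ".join(frag for _, frag in active)
        # <where> strips a leading AND/OR and emits WHERE only when something remains.
        for prefix in ("AND ", "OR "):
            if where.startswith(prefix):
                where = where[len(prefix):]
                break
        sql += " WHERE " + where
    # Named binds (:region style) for only the parameters that were used.
    binds = {name: params[name] for name, _ in active}
    return f"{sql} {TAIL_SQL}", binds


if __name__ == "__main__":
    print(render_dynamic_sql({}))
    print(render_dynamic_sql({"region": "east", "min_amount": 100}))
```

One script thus serves several query shapes: callers who omit a filter get the unfiltered aggregate instead of a query that compares a column with NULL.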

(2) Register an API

Besides generating APIs, you can register existing APIs with the data service platform for unified management, publishing, and integration. Registered APIs support four common request methods (GET, POST, PUT, and DELETE) and three common data formats (FORM, JSON, and XML).
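The request method and data format you choose when registering an API determine how parameters reach the backend. The sketch below shows the same parameters encoded as FORM, JSON, and XML with the Python standard library, plus a helper that puts parameters in the query string for GET and DELETE. The helper names, the `<request>` XML root element, and the "no body for GET/DELETE" rule are assumptions chosen for illustration; check what your registered backend actually expects.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

ALLOWED_METHODS = {"GET", "POST", "PUT", "DELETE"}


def encode_body(params: Dict[str, object], fmt: str) -> Tuple[str, str]:
    """Return (Content-Type, body) for one of the FORM, JSON, or XML formats."""
    fmt = fmt.upper()
    if fmt == "FORM":
        return "application/x-www-form-urlencoded", urlencode(params)
    if fmt == "JSON":
        return "application/json", json.dumps(params)
    if fmt == "XML":
        root = ET.Element("request")  # hypothetical root element name
        for key, value in params.items():
            ET.SubElement(root, key).text = str(value)
        return "application/xml", ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")


def build_call(
    method: str, url: str, params: Dict[str, object], fmt: str = "JSON"
) -> Tuple[str, str, Optional[str], Optional[str]]:
    """Return (method, url, content_type, body) for a call to a registered backend."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    if method in ("GET", "DELETE"):
        # Assumed convention: no body; parameters travel in the query string.
        return method, f"{url}?{urlencode(params)}", None, None
    content_type, body = encode_body(params, fmt)
    return method, url, content_type, body


if __name__ == "__main__":
    p = {"city": "Hangzhou", "page": 1}
    for fmt in ("FORM", "JSON", "XML"):
        print(build_call("POST", "https://backend.example.com/query", p, fmt))
```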

(3) Function and Filter

In addition to APIs, data services provide another resource development capability: Python functions. You can write a Python script and bind it to an API as a pre-filter or a post-filter, which processes the API's request parameters or its returned results, respectively. This extends the logic an API can express, adapts it to more scenarios, and enables data transformation and conversion.
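Here is a minimal sketch of a pre-filter and a post-filter. The `(event, context)` signature, with `event` arriving as a JSON string, is modeled on the handler shape of the DataWorks function template but should be treated as an assumption; confirm it against the template in your console, where each filter would be its own function. The parameter names (`city`, `page_size`), the `data` key in the result, and the `amount_cents`/`internal_flag` fields are hypothetical.

```python
import json

# Assumption: each filter receives its payload as a JSON string (`event`) plus an
# unused `context`, and returns the processed object.


def pre_filter(event, context=None):
    """Pre-filter: normalize request parameters before the API query runs."""
    params = json.loads(event)
    # Example: trim and lower-case a city code, and default the page size.
    if "city" in params:
        params["city"] = str(params["city"]).strip().lower()
    params.setdefault("page_size", 20)
    return params


def post_filter(event, context=None):
    """Post-filter: reshape the API result before it goes back to the caller."""
    result = json.loads(event)
    for row in result.get("data", []):
        # Example: convert cents to yuan and hide an internal-only field.
        if "amount_cents" in row:
            row["amount"] = row.pop("amount_cents") / 100
        row.pop("internal_flag", None)
    return result


if __name__ == "__main__":
    print(pre_filter('{"city": " HZ "}'))
    print(post_filter('{"data": [{"id": 1, "amount_cents": 1250, "internal_flag": true}]}'))
```

Because the filters sit outside the SQL, the query logic can stay simple while presentation rules such as units and field visibility change independently.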

(4) Service Orchestration

The service orchestration capability of data services lets you combine multiple APIs and functions into a workflow visually by dragging nodes onto a canvas, enabling serial and parallel calls among APIs.

For example, in the preceding figure, APIs, functions, and Switch conditional branch nodes are combined and encapsulated as a new API that is provided to the business side. This approach reduces network overhead and improves overall API call performance. In the sample shown in the figure, the orchestration logic works as follows: several APIs are chained in a workflow so that the output of an upstream API is used as the input of the downstream API, and a branch condition is then evaluated for different scenarios to produce the final output.
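The orchestration pattern described above (serial chaining, parallel fan-out, then a Switch branch) can be sketched in ordinary Python. This is only an analogy for what the visual workflow expresses: the three "APIs" are hypothetical local functions standing in for published endpoints, and a thread pool plays the role of parallel nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


# Hypothetical stand-ins for published APIs; a real orchestration would call them over HTTP.
def get_user(user_id: int) -> Dict:
    return {"user_id": user_id, "level": "vip" if user_id % 2 == 0 else "regular"}


def get_orders(user: Dict) -> Dict:
    return {"user_id": user["user_id"], "order_count": 3}


def get_coupons(user: Dict) -> Dict:
    return {"user_id": user["user_id"], "coupon_count": 5 if user["level"] == "vip" else 1}


def orchestrated_api(user_id: int) -> Dict:
    """One compound API built from three smaller ones."""
    # Serial step: the upstream API's output is the downstream input.
    user = get_user(user_id)
    # Parallel step: two independent APIs consume the same upstream result.
    with ThreadPoolExecutor(max_workers=2) as pool:
        orders_future = pool.submit(get_orders, user)
        coupons_future = pool.submit(get_coupons, user)
        orders, coupons = orders_future.result(), coupons_future.result()
    # Switch node: pick the output branch based on an intermediate value.
    if user["level"] == "vip":
        return {
            "user_id": user_id,
            "tier": "vip",
            "orders": orders["order_count"],
            "coupons": coupons["coupon_count"],
        }
    return {"user_id": user_id, "tier": "regular", "orders": orders["order_count"]}


if __name__ == "__main__":
    print(orchestrated_api(2))
    print(orchestrated_api(3))
```

The business side makes one call to `orchestrated_api` instead of three, which is where the reduced network overhead comes from.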

(5) API Detail Page

After an API is published, data services automatically generate detailed API documentation (the API detail page), which removes the need to write documentation by hand. Both developers and callers can view the API detail page, which serves as the API's version record and call guide. It contains the API's basic information, request and response parameters, normal and error response examples, and error codes, giving you a complete picture of the API.

(6) API Authorization

Data services allow you to set call authorization for data interfaces to achieve secure and reliable data sharing.

In some scenarios, granting database and table permissions directly, frequently and on a large scale, leads to high data redundancy and compromises data security. For such scenarios, the authorization capability of data services lets you grant other users permission to call an API instead. To authorize an API, go to the service management page, locate the destination API in the list, click Authorize in the Operation column, and in the dialog box that appears select the Alibaba Cloud account ID and the destination workspace to authorize. Then set a validity period for the permission. This way, data can be opened and shared within a smaller, better-controlled scope.
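The authorization described above boils down to a grant keyed by API, account ID, and workspace, with an expiry time. The sketch below models that check; the `Grant` record and `is_authorized` helper are hypothetical names for illustration, not a DataWorks SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class Grant:
    """One authorization record: who may call which API, from where, until when."""
    api_id: str
    account_id: str
    workspace_id: str
    expires_at: datetime


def is_authorized(
    grants: Iterable[Grant],
    api_id: str,
    account_id: str,
    workspace_id: str,
    now: Optional[datetime] = None,
) -> bool:
    """Allow a call only if a matching grant exists and has not yet expired."""
    now = now or datetime.now(timezone.utc)
    return any(
        g.api_id == api_id
        and g.account_id == account_id
        and g.workspace_id == workspace_id
        and now < g.expires_at
        for g in grants
    )


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    grants = [Grant("api-orders", "1234567890", "ws-analytics", start + timedelta(days=30))]
    print(is_authorized(grants, "api-orders", "1234567890", "ws-analytics", now=start))
```

The caller never receives table-level permissions; access is scoped to one API and lapses automatically when the validity period ends.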

(7) API Call Authentication

For API call authentication, data services provide two methods: simple identity authentication and encrypted signature authentication. You can choose the method that suits your scenario.

The first method is simple identity authentication, which uses an AppCode. On the API detail page, you can copy the call address with parameters; this address contains the AppCode by default, so the authentication parameter is supplied when the API is called.
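If you build the request yourself rather than use the copied address, Alibaba Cloud API Gateway's AppCode convention is to send the AppCode in an `Authorization` header with the `APPCODE` prefix. The sketch below builds (but does not send) such a request with the standard library; the URL and AppCode are placeholders, and you should confirm the header format against the API detail page for your API.

```python
from urllib.request import Request


def build_appcode_request(url: str, app_code: str) -> Request:
    """Build (but do not send) a GET request that carries an AppCode."""
    req = Request(url, method="GET")
    # Simple authentication header: "Authorization: APPCODE <AppCode>".
    req.add_header("Authorization", f"APPCODE {app_code}")
    return req


if __name__ == "__main__":
    req = build_appcode_request("https://your-group.example.com/orders?region=east", "your-app-code")
    print(req.get_method(), req.full_url, req.get_header("Authorization"))
    # To actually send it: urllib.request.urlopen(req)
```

The AppCode is a bearer-style credential: anyone holding it can call the API, which is why the signature method below is the more secure option.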

The second method is encrypted signature authentication, which calculates a signature from the AppKey and AppSecret with an encryption algorithm. It provides higher security.
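The sketch below shows the idea behind signature authentication: the client builds a canonical string from the request, signs it with HMAC-SHA256 keyed by the AppSecret, and sends the AppKey and the Base64 signature in headers, so the secret itself never travels over the network. The `X-Ca-*` header names and the string-to-sign layout follow API Gateway's published scheme as best understood here, but this is a simplified, unofficial version (for example, it ignores body parameters and encoding rules); verify every detail against the official signature documentation before relying on it.

```python
import base64
import hashlib
import hmac
import time
import uuid
from typing import Dict, List, Optional


def build_string_to_sign(
    method: str,
    path: str,
    query: Dict[str, str],
    headers: Dict[str, str],
    signed_header_names: List[str],
    accept: str = "application/json",
    content_md5: str = "",
    content_type: str = "",
    date: str = "",
) -> str:
    """Canonical form: method, standard headers, signed headers, then path and sorted query."""
    header_part = "".join(f"{name}:{headers[name]}\n" for name in sorted(signed_header_names))
    path_and_params = path
    if query:
        path_and_params += "?" + "&".join(f"{k}={query[k]}" for k in sorted(query))
    head = "\n".join([method.upper(), accept, content_md5, content_type, date])
    return head + "\n" + header_part + path_and_params


def sign(app_secret: str, string_to_sign: str) -> str:
    """Base64-encoded HMAC-SHA256 of the canonical string, keyed by the AppSecret."""
    digest = hmac.new(app_secret.encode(), string_to_sign.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def signed_headers(
    app_key: str,
    app_secret: str,
    method: str,
    path: str,
    query: Dict[str, str],
    timestamp_ms: Optional[int] = None,
    nonce: Optional[str] = None,
) -> Dict[str, str]:
    """Return the headers a signed call would carry (simplified, unofficial)."""
    headers = {
        "X-Ca-Key": app_key,
        "X-Ca-Timestamp": str(timestamp_ms if timestamp_ms is not None else int(time.time() * 1000)),
        "X-Ca-Nonce": nonce or str(uuid.uuid4()),  # guards against replayed requests
        "X-Ca-Signature-Method": "HmacSHA256",
    }
    names = sorted(headers)
    string_to_sign = build_string_to_sign(method, path, query, headers, names)
    headers["Accept"] = "application/json"
    headers["X-Ca-Signature-Headers"] = ",".join(names)
    headers["X-Ca-Signature"] = sign(app_secret, string_to_sign)
    return headers


if __name__ == "__main__":
    print(signed_headers("your-app-key", "your-app-secret", "GET", "/orders", {"region": "east"}))
```

The server recomputes the same signature from the received request and the AppSecret it stores; any tampering with the signed parts of the request changes the result and the call is rejected.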

The authentication information for both methods can be viewed on the API's call details page.

(8) API Measurement

For published APIs, data services collect call statistics over a specified period, including:

Measurement Dashboard: It shows the total number of APIs, total calls, total execution duration, API Gateway status code distribution, data service error code distribution, service resource allocation, error rate, and a call volume ranking, providing a global overview of all APIs.
Measurement Details: It provides monitoring charts for a single API, including trends in API Gateway status codes, data service error codes, request counts per application, network traffic bandwidth, and response time, helping users keep track of key APIs in a timely manner.
Integration with SLS: In addition to the measurement statistics provided by data services, users can record detailed API call logs by integrating with Log Service (SLS).
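As an illustration of how dashboard figures such as error rate, status code distribution, average response time, and the call volume ranking relate to raw call records (for example, records exported from call logs), here is a small sketch. The `CallRecord` fields are hypothetical, and counting HTTP status codes of 400 and above as errors is an assumption, not necessarily how the data services dashboard defines its error rate.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CallRecord:
    api_name: str
    status_code: int  # gateway HTTP status code
    latency_ms: float


def summarize(records: List[CallRecord]) -> Dict:
    """Aggregate raw call records into dashboard-style metrics."""
    total = len(records)
    errors = sum(1 for r in records if r.status_code >= 400)  # assumed error definition
    return {
        "total_calls": total,
        "error_rate": errors / total if total else 0.0,
        "status_distribution": dict(Counter(r.status_code for r in records)),
        "avg_latency_ms": sum(r.latency_ms for r in records) / total if total else 0.0,
        "top_apis": Counter(r.api_name for r in records).most_common(3),
    }


if __name__ == "__main__":
    sample = [
        CallRecord("orders", 200, 10),
        CallRecord("orders", 200, 30),
        CallRecord("users", 500, 20),
        CallRecord("orders", 404, 40),
    ]
    print(summarize(sample))
```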

In addition, you can configure flow control (throttling) policies and alert rules in API Gateway for APIs that data services publish there, safeguarding API calls in production business lines.
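Flow control policies are configured in API Gateway rather than coded by hand, but the idea behind a rate limit is easy to see in a token bucket, one common way to implement throttling. The sketch below is that generic technique, not a description of API Gateway's internal algorithm; the `TokenBucket` class and its parameters are hypothetical, and the injectable clock exists only to make the behavior testable.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise the call should be throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=2, capacity=2)
    print([bucket.allow() for _ in range(3)])
```

Capacity controls how large a burst is tolerated, while the refill rate sets the sustained calls per second; a throttling policy in the gateway expresses the same trade-off as a limit per time unit.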
