Cloud Fighters Meet up: Big Data on Alibaba Cloud

By Aaron Handoko and Rifandy Zulvan, Solution Architects of Alibaba Cloud Indonesia

Data is now valuable to almost every kind of business. Any business can analyze the data it has collected or acquired to establish facts and gain insights. Decision-makers can then use those insights as a decision support system for making data-driven decisions.

According to explodingtopics.com, approximately 328.77 million TB of data were generated every day in 2023, and each person accounts for an average of about 1.7 MB of data per second.

A large amount of data is often called Big Data, but Big Data is not just about amount or volume. Big Data has five characteristics:
1. Volume
The amount of data
2. Velocity
The speed at which data is generated, transferred, or captured
3. Variety
The range of data types in use
4. Veracity
The accuracy and trustworthiness of the data
5. Value
The value that can be derived from the data

So Big Data is not just about volume; it can also be useful. That raises a question: how do companies use it? Businesses can use data to learn more about their clients and develop strategic plans. Going further, businesses can use the data to develop AI and machine learning models that automate operational tasks. However, businesses face several challenges in managing Big Data: how to manage a large amount of data, how to scale, how to analyze the data, and how to keep the data secure.

In November 2023, Alibaba Cloud Indonesia held its first partner community meet-up, the Alibaba Cloud Digital Tech Ecosystem Community Gathering, to discuss the Big Data platform on Alibaba Cloud. Alibaba Cloud has an end-to-end Big Data platform that covers data ingestion, data storage, data analysis and processing, data visualization, and integration with machine learning or AI.

image

In this meet-up, participants received a high-level overview of the capabilities of the Alibaba Cloud Big Data platform. They then went into more detail on the platform, best practices, and hands-on implementation.

Big Data Workshop

Alibaba Cloud conducted a two-day hands-on workshop where participants learned about the importance of Big Data and how to operate Alibaba Cloud Big Data products. Participants had the chance to try Alibaba Cloud's hands-on lab environment, called labex, with step-by-step guidance for achieving each goal of the lab. In total, Alibaba Cloud provided three labs:

  1. Synchronizing Data from MySQL to DataWorks
  2. Data Development with MaxCompute & DataWorks
  3. Use DataService Studio of DataWorks to Publish APIs

On Day 1 of the workshop, a brief introduction to Alibaba Cloud products was presented. It explained each product and its capabilities and covered best practices, including when to use a product and when not to. We also discussed some Big Data use cases, including a once-a-day reporting use case (batch job) and a real-time processing use case (stream job). Participants were also keen to discuss how Alibaba Cloud Big Data products compare with those of other cloud vendors and what the key differences are. After the brief introduction, we moved to the first hands-on lab.
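To make the difference between the batch and stream use cases concrete, here is a minimal Python sketch. It is not tied to any Alibaba Cloud product and is not the workshop's material: the event records and the names `batch_daily_totals` and `StreamingTotals` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical order events as (date, amount). Not taken from the workshop data.
EVENTS = [
    ("2023-11-01", 120.0),
    ("2023-11-01", 80.0),
    ("2023-11-02", 50.0),
]


def batch_daily_totals(events):
    """Batch job: runs once (for example nightly) over all collected records."""
    totals = defaultdict(float)
    for day, amount in events:
        totals[day] += amount
    return dict(totals)


class StreamingTotals:
    """Stream job: keeps running state and updates it as each event arrives."""

    def __init__(self):
        self.totals = defaultdict(float)

    def on_event(self, day, amount):
        self.totals[day] += amount
        return self.totals[day]  # the updated total is available immediately


batch_result = batch_daily_totals(EVENTS)

stream = StreamingTotals()
for day, amount in EVENTS:
    stream.on_event(day, amount)

# Both approaches reach the same totals; they differ in when results are available.
print(batch_result)
print(dict(stream.totals))
```

The batch function only produces a result after the whole day's data has been collected, while the streaming class can answer after every single event, which is the trade-off behind choosing one style of job over the other.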

Lab 1: Synchronizing Data from MySQL to DataWorks

The objective of this lab is to synchronize data from MySQL on ECS to MaxCompute (Alibaba Cloud's data warehouse) using an easy-to-use platform called DataWorks. Here is the architecture of the lab:

image

DataWorks simplifies the process of synchronizing data from a source to a data warehouse. We only need to define data sources for the source database and the destination database. Then, using the DataWorks UI, we simply drag and drop a node and choose the tables that need to be synchronized. We also learned that DataWorks is able to do ETL while synchronizing the data.
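Conceptually, a synchronization task reads rows from the source, optionally transforms them, and writes them to the destination. The following is a minimal Python sketch of that idea only; it does not use DataWorks or MaxCompute. Two in-memory SQLite databases stand in for MySQL and MaxCompute, and the table names, column names, and the `sync_table` helper are all assumptions made for illustration.

```python
import sqlite3

# Stand-ins: two in-memory SQLite databases play the roles of the MySQL source
# and the MaxCompute destination. Table and column names are hypothetical.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ayu", "jakarta"), (2, "Budi", "bandung")],
)
target.execute(
    "CREATE TABLE ods_customers (customer_id INTEGER, name TEXT, city TEXT)"
)


def sync_table(src, dst):
    """Read every source row, apply a light transform, write to the destination."""
    rows = src.execute("SELECT id, name, city FROM customers").fetchall()
    # A small transform applied in flight: normalize the city to title case.
    transformed = [(cid, name, city.title()) for cid, name, city in rows]
    dst.executemany("INSERT INTO ods_customers VALUES (?, ?, ?)", transformed)
    dst.commit()
    return len(transformed)


synced = sync_table(source, target)
result = target.execute(
    "SELECT * FROM ods_customers ORDER BY customer_id"
).fetchall()
print(synced, result)
```

In the lab this mapping is configured through the UI instead of written by hand, but the same three pieces are present: a source definition, a destination definition, and a column mapping between them.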

Lab 2: Data Development with MaxCompute & DataWorks

Compared to Lab 1, this lab is more advanced: multiple sources need to be synchronized, and there are data merging and data mart creation steps. To put it simply, Lab 2 is an end-to-end Big Data batch processing pipeline on Alibaba Cloud. The objective of this lab is to create a data mart (ready-to-use data for reporting) from data synchronized from MySQL on ECS and a text file on OSS into MaxCompute (Alibaba Cloud's data warehouse) using DataWorks. Here is the architecture of the lab:

image

Similar to Lab 1, we first need to define all the data sources needed for synchronization: the source databases (MySQL on ECS and the text file on OSS) and the target database (MaxCompute). For MySQL, we can use the public endpoint connection. After all the data sources are created, we need two data integration nodes, one for synchronizing data from OSS and one for MySQL. Details on how to create the data integration nodes can be found in the lab.
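The two integration nodes can be pictured as two independent load tasks that write into the same warehouse. The sketch below is an illustration under stated assumptions, not the lab's configuration: a string stands in for the text file on OSS, a list of tuples stands in for the MySQL table, an in-memory SQLite database stands in for MaxCompute, and all table names and helper functions are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a text file stored in OSS. The contents are made up.
OSS_TEXT = "order_id,customer_id,amount\n100,1,25.5\n101,2,40.0\n"

# Stand-in for rows read from the MySQL source table. The rows are made up.
MYSQL_ROWS = [(1, "Ayu"), (2, "Budi")]

warehouse = sqlite3.connect(":memory:")  # plays the role of MaxCompute
warehouse.execute(
    "CREATE TABLE ods_orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
warehouse.execute("CREATE TABLE ods_customers (customer_id INTEGER, name TEXT)")


def sync_text_file(text, dst):
    """First integration task: parse the delimited text and load it."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [
        (int(r["order_id"]), int(r["customer_id"]), float(r["amount"]))
        for r in reader
    ]
    dst.executemany("INSERT INTO ods_orders VALUES (?, ?, ?)", rows)
    return len(rows)


def sync_table_rows(rows, dst):
    """Second integration task: load rows read from the relational source."""
    dst.executemany("INSERT INTO ods_customers VALUES (?, ?)", rows)
    return len(rows)


orders_loaded = sync_text_file(OSS_TEXT, warehouse)
customers_loaded = sync_table_rows(MYSQL_ROWS, warehouse)
warehouse.commit()
print(orders_loaded, customers_loaded)
```

Keeping the two loads separate mirrors the lab's design: each source gets its own node, so one can fail or be rerun without touching the other.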

After the data is synchronized into our data warehouse in MaxCompute, we can move on to merging the data (creating consolidated data) and creating the data mart. This can be done using the SQL node that MaxCompute and DataWorks provide. With the SQL node, we can write our own SQL logic and schedule it to run at a set interval. In this lab we also learned how to define our own function, known as a User Defined Function (UDF), and use it inside DataWorks. This lab also shows how to monitor each job using the Operation Center feature in DataWorks.
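The merge, data mart, and UDF steps can be sketched in a few lines. This is a stand-in, not MaxCompute SQL and not the lab's code: SQLite plays the role of the warehouse, Python's `sqlite3` `create_function` plays the role of UDF registration, and the table names and the `spend_tier` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MaxCompute project
conn.executescript(
    """
    CREATE TABLE ods_customers (customer_id INTEGER, name TEXT);
    CREATE TABLE ods_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO ods_customers VALUES (1, 'Ayu'), (2, 'Budi');
    INSERT INTO ods_orders VALUES (100, 1, 25.5), (101, 2, 40.0), (102, 1, 100.0);
    """
)


def spend_tier(total):
    """Hypothetical user-defined function: label a customer by total spend."""
    return "high" if total >= 100 else "standard"


# Register the Python function so SQL statements can call it by name.
conn.create_function("spend_tier", 1, spend_tier)

# Step 1: merge the synchronized tables into one consolidated table.
conn.execute(
    """
    CREATE TABLE dw_customer_orders AS
    SELECT c.customer_id, c.name, o.order_id, o.amount
    FROM ods_orders AS o
    JOIN ods_customers AS c ON c.customer_id = o.customer_id
    """
)

# Step 2: aggregate the consolidated table into a reporting-ready data mart,
# calling the user-defined function inside the SQL statement.
conn.execute(
    """
    CREATE TABLE dm_customer_spend AS
    SELECT customer_id, name, SUM(amount) AS total_amount,
           spend_tier(SUM(amount)) AS tier
    FROM dw_customer_orders
    GROUP BY customer_id, name
    """
)

mart = conn.execute(
    "SELECT name, total_amount, tier FROM dm_customer_spend ORDER BY customer_id"
).fetchall()
print(mart)
```

In DataWorks, each of the two steps would typically live in its own SQL node so that the scheduler can run them in order and each one can be monitored separately.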

Lab 3: Use DataService Studio of DataWorks to Publish APIs

This lab shows participants how to go beyond synchronizing data internally and create an API that exposes data publicly. DataWorks has a feature called DataService Studio that makes it easy to create an API. The objective of this lab is to publish data from MaxCompute as an API using two methods, wizard mode and script mode.

The architecture of Lab 3 can be seen below:

image

Similar to Labs 1 and 2, we have existing data in MySQL on ECS, so we need to define a MySQL data source in DataWorks. After it is configured, we can go straight to DataService Studio in DataWorks and choose the database and table that we would like to access through the API. We can define the parameters that will be used in the API call to filter for exactly the data that is needed.
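The idea of an API whose request parameters filter a table can be illustrated with a small handler function. This sketch does not use DataService Studio: SQLite stands in for the data source, and the table, the `min_total` parameter, and the `get_customer_spend` function are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the configured data source
conn.execute(
    "CREATE TABLE dm_customer_spend (customer_id INTEGER, name TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO dm_customer_spend VALUES (?, ?, ?)",
    [(1, "Ayu", 125.5), (2, "Budi", 40.0)],
)


def get_customer_spend(db, request_params):
    """Hypothetical API handler: request parameters become query filters."""
    min_total = float(request_params.get("min_total", 0))
    # The value is bound as a query parameter, never pasted into the SQL text.
    cursor = db.execute(
        "SELECT customer_id, name, total_amount FROM dm_customer_spend "
        "WHERE total_amount >= ? ORDER BY customer_id",
        (min_total,),
    )
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return json.dumps({"data": rows})


response = get_customer_spend(conn, {"min_total": "100"})
print(response)
```

Choosing a table and ticking request and response fields, as in wizard mode, corresponds to the fixed query above; writing the query by hand is closer in spirit to script mode.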

From the Cloud Fighters Big Data hands-on lab, we got to know more about Alibaba Cloud Big Data products and gained first-hand experience in operating them.
