Performing Daily Incremental Upload from OSS to MaxCompute Using Data Integration

This tutorial describes how we can easily import data from OSS into MaxCompute on a daily basis with Data Integration.

By Jonathan Peng, Staff Solutions Architect

Global businesses are facing increasing complexity and market volatility amid today's fierce competition. In response to this, all business functions are turning to data-driven strategies as a means to manage this increasing uncertainty. A data-driven approach also helps organizations better understand their customer bases and allows them to grow their businesses. Growth in digital technologies has given organizations the ability to analyze more data, even in real time. This in turn has generated more and more data to help fuel enterprises' needs.

However, with this increase, there needs to be an effective way of storing large amounts of data. Nowadays, most organizations use cloud solutions, such as Alibaba Cloud's Object Storage Service (OSS), for data storage, as a data lake, and for data backups. In some cases, an organization may write all of its Internet of Things (IoT) data to files and store them in the cloud, both as a backup and for historical data analysis. So, how can we devise an easy way to import data from OSS into MaxCompute on a daily basis?

Incremental Synchronization of OSS Data

Because the data does not change after it is generated, this scenario lets you partition it easily according to the data generation pattern. Typically, you partition by date, for example by creating one partition per day.

Generate one file per date, named "IOTDataSet" + date + ".csv", and upload it to an OSS bucket. Here we have created a sample file named "IOTDataSet20180824.csv" and uploaded it to OSS. The format of the date for your data should be in yyyymmddhhmmss, which specifies the scheduled time (year, month, day, hour, minute, second) of the routinely scheduled instance in Data Integration.
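The naming convention can be sketched in Python as follows. This is a minimal illustration rather than part of the Data Integration setup: the helper names daily_file_name and date_from_file_name are hypothetical, and the date portion is assumed to follow the eight-digit form used by the sample file IOTDataSet20180824.csv.

```python
from datetime import date, datetime

PREFIX = "IOTDataSet"  # file name prefix used in this tutorial
SUFFIX = ".csv"


def daily_file_name(day: date) -> str:
    """Build the file name for one day's data, e.g. IOTDataSet20180824.csv."""
    return f"{PREFIX}{day:%Y%m%d}{SUFFIX}"


def date_from_file_name(name: str) -> date:
    """Recover the date from a name produced by daily_file_name."""
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        raise ValueError(f"unexpected file name: {name}")
    stamp = name[len(PREFIX):-len(SUFFIX)]
    return datetime.strptime(stamp, "%Y%m%d").date()


print(daily_file_name(date(2018, 8, 24)))  # IOTDataSet20180824.csv
```

Keeping the date in the file name means each file maps to exactly one day, which lines up with one partition per day on the MaxCompute side.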

Walkthrough

Upload IOTDataSet20180824.csv to your OSS bucket.
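If you prefer to script the upload instead of using the console, the step might look like the sketch below. It assumes the Alibaba Cloud OSS Python SDK (oss2), whose Bucket object provides put_object_from_file(key, filename). The function name upload_daily_file, the local directory, and all credential and endpoint placeholders are hypothetical.

```python
import os
from datetime import date


def upload_daily_file(bucket, local_dir: str, day: date) -> str:
    """Upload one day's CSV file and return the object key that was used.

    `bucket` is any object with a put_object_from_file(key, filename)
    method, such as an oss2.Bucket instance.
    """
    key = f"IOTDataSet{day:%Y%m%d}.csv"      # same name locally and in OSS
    local_path = os.path.join(local_dir, key)
    bucket.put_object_from_file(key, local_path)
    return key


# Example wiring with oss2 (all values below are placeholders):
# import oss2
# auth = oss2.Auth("<access-key-id>", "<access-key-secret>")
# bucket = oss2.Bucket(auth, "<endpoint>", "<bucket-name>")
# upload_daily_file(bucket, "/data/iot", date(2018, 8, 24))
```

Passing the bucket in as a parameter keeps credentials out of the function and makes it easy to try the logic with a stand-in object before touching a real bucket.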

Then, open the DataWorks console and navigate to Data Source. Detailed steps are described here: https://partners-intl.aliyun.com/help/doc-detail/98133.htm

Add a data source in Data Integration.