How to Sync Up Data from MaxCompute to Greenplum with DataWorks

We will show you how to use the data sync feature of DataWorks to synchronize data from MaxCompute to Greenplum, one of the most popular MPP databases.

By Jeffrey Gao, Solutions Architect

Alibaba Cloud DataWorks is the Big Data platform product from Alibaba Cloud. It offers one-stop Big Data development, data permission management, offline job scheduling, data integration (including data sync), and other features.

Today, we will demonstrate how to use the data sync feature of DataWorks to synchronize data from MaxCompute, the most advanced big data platform of Alibaba Cloud, to Greenplum, one of the most popular MPP databases.

DataWorks can synchronize data between many types of data sources. For more information, please refer to https://www.alibabacloud.com/help/doc-detail/73015.html

About Greenplum

Greenplum Database is an open-source massively parallel data platform. It is based on PostgreSQL and equipped with the analytical tools necessary to draw additional insights from your data. Greenplum's massively parallel processing architecture provides automatic parallelization of all data and queries in a scale-out, shared-nothing architecture.
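
Because Greenplum is based on PostgreSQL, PostgreSQL-style connection settings generally work for it. As a minimal sketch, the following Python helper assembles a JDBC URL in the standard PostgreSQL driver format (`jdbc:postgresql://host:port/database`). The function name, host, and database name are placeholders invented for illustration, not values from this demo, and the exact fields that DataWorks asks for may differ.

```python
from urllib.parse import quote


def build_postgres_jdbc_url(host: str, database: str, port: int = 5432) -> str:
    """Return a JDBC URL in the PostgreSQL driver format.

    Greenplum is PostgreSQL-based, so the same URL shape is assumed to
    apply. 5432 is the PostgreSQL default port; your Greenplum instance
    may listen on a different one.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # Percent-encode the database name in case it contains special characters.
    return f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"


# Hypothetical values for illustration only.
print(build_postgres_jdbc_url("gp-master.example.com", "demo_db"))
```

Keeping the URL construction in one small function makes it easy to validate the host and port before pasting the result into a data source form.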

Syncing MaxCompute to Greenplum with DataWorks

  1. Once the Greenplum instance is ready, we can log in with the pgAdmin tool to manage the data. Before data synchronization, the target table is empty.

  2. We need to configure the data sources for both the source and the destination. Since Greenplum is based on PostgreSQL, we can register it as a PostgreSQL data source.

  3. Then we set up a data sync task.

  4. In the data sync configuration, we select the data source and the destination, including the corresponding tables.

  5. Then we configure the field and type mappings between the source and the destination.

  6. When the configuration is done, we can run the task and check the data synchronization status in the Runtime Log.

  7. We can also log in to the Greenplum instance to check whether the data has been synchronized.

    Furthermore, if we need this task to run automatically on a regular basis, we can configure the scheduling mode in the Schedule tab.

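
The source, destination, and field mappings configured in steps 4 and 5 can also be pictured as a single job description. The sketch below builds such a description as a Python dictionary and rejects an empty mapping or a destination column that is mapped twice. The layout is loosely modeled on a reader/writer job definition; every key name, data source name, table name, and column here is an illustrative assumption rather than the actual DataWorks schema, so check the DataWorks documentation before reusing any of it.

```python
import json


def build_sync_job(source_table, dest_table, column_pairs,
                   source_name="odps_source", dest_name="greenplum_pg"):
    """Build an illustrative sync job description.

    column_pairs is a list of (source_column, destination_column) tuples.
    Every key and default name here is hypothetical.
    """
    if not column_pairs:
        raise ValueError("at least one column mapping is required")
    src_cols = [src for src, _ in column_pairs]
    dst_cols = [dst for _, dst in column_pairs]
    # A destination column mapped twice would make the write ambiguous.
    if len(set(dst_cols)) != len(dst_cols):
        raise ValueError("destination columns must be unique")
    return {
        "reader": {"datasource": source_name, "table": source_table,
                   "column": src_cols},
        "writer": {"datasource": dest_name, "table": dest_table,
                   "column": dst_cols},
    }


# Hypothetical tables and columns for illustration only.
job = build_sync_job(
    "mc_orders",
    "public.gp_orders",
    [("order_id", "id"), ("amount", "amount")],
)
print(json.dumps(job, indent=2))
```

Pairing each source column with its destination column in one tuple keeps the two column lists the same length and in the same order, which is what a positional field mapping relies on.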