
How to Ingest Data from Object Storage Service

DataWorks makes data ingestion quick and simple, allowing you to focus on big data computation.

Data ingestion from OSS with DataWorks is user friendly and easy. The whole process can be completed end to end through a web-based console, which lets customers, especially business users, finish it quickly and simply and focus their time and effort on more important tasks, such as big data computation.

In this article, we will show you how to perform data ingestion from Alibaba Cloud's Object Storage Service (OSS) with DataWorks.

After you have prepared the OSS bucket, follow the procedure below to ingest data from OSS.

  1. Log on to DataWorks and go to Data Integration.
  2. On the Data Integration main page, click New Source to create a data source for the sync from OSS.
  3. Select OSS as the data source type.
  4. Configure the OSS data source information.
  5. Click Test Connectivity to check whether the OSS bucket can be reached from DataWorks. If the test succeeds, a green message box appears in the upper-right corner indicating that the connectivity test was successful. (An independent way to pre-check bucket access from code is sketched after these steps.)
  6. In DataWorks Data Integration, click Data Sources in the left navigation panel; the newly created OSS data source is listed there.
  7. Go to Sync Tasks in the left panel of Data Integration.
  8. Click Wizard Mode to set up data ingestion from OSS.
  9. Configure the data ingestion source, the data ingestion target, the source-to-target column mapping, and channel control, then preview the settings.
  10. Name the data ingestion task and save it. After saving, click the Operation button to start data ingestion from OSS.
  11. Monitor the log in the bottom panel to check the status of the data synchronization task. If the synchronization ends with return code [0], it completed successfully.
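
Before or after configuring the data source, you can also verify bucket access from outside DataWorks. Below is a minimal sketch using the Alibaba Cloud OSS SDK for Python (oss2); the endpoint, bucket name, credentials, and prefix are placeholders that you would replace with your own values.

```python
import itertools
import oss2  # Alibaba Cloud OSS SDK for Python: pip install oss2

# Placeholder credentials, endpoint, and bucket name.
auth = oss2.Auth("<access-key-id>", "<access-key-secret>")
bucket = oss2.Bucket(auth, "https://oss-ap-southeast-1.aliyuncs.com", "<bucket-name>")

# List the first few objects under the prefix that the sync task will read,
# to confirm the bucket is reachable and the source files exist.
for obj in itertools.islice(oss2.ObjectIterator(bucket, prefix="data/"), 10):
    print(obj.key, obj.size)
```

If this listing succeeds but the connectivity test in DataWorks fails, a likely cause is a mismatch in the endpoint or access key configured in the data source.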

For more detailed information about how to prepare the OSS bucket for data ingestion and about the configuration in DataWorks Data Integration, see MaxCompute Data Ingestion from OSS.

Related Blog Posts

Drilling into Big Data – Data Interpretation (3)

In this blog series, we will walk you through the entire cycle of Big Data analytics. Now that we are familiar with the basics and with cluster creation, it is time to understand the data acquired from various sources and the most suitable data format for ingesting it into the Big Data environment. For the batch scenario discussed in the post, the pipeline looks like this (a minimal PySpark sketch of the storage-to-querying steps follows the list):

  1. Source: The source sheet is based on various events that are registered over time and can be processed in batches.
  2. Sort of data: The data is either extracted into Excel sheets or captured in databases; either way, it is structured.
  3. Tool to ingest: Since this falls under batch processing, the tool used is Sqoop.
  4. Storage: With the help of Sqoop, the data is moved into HDFS.
  5. Tool to process: Spark
  6. Querying: Hive
  7. Analysis: Zeppelin/Quick BI
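
Here is a minimal PySpark sketch of the storage, processing, and querying steps above. It assumes Sqoop has already landed the structured data in HDFS as CSV files; the HDFS path, column names, and table name are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("batch-event-processing")
    .enableHiveSupport()          # so the result can be queried from Hive
    .getOrCreate()
)

# Read the Sqoop output from HDFS (structured, batch data).
events = spark.read.option("header", "true").csv("hdfs:///user/sqoop/events/")

# A simple processing step in Spark: count events per type.
summary = events.groupBy("event_type").count()

# Persist the result as a Hive table so it can be queried with Hive
# or visualized in Zeppelin / Quick BI downstream.
summary.write.mode("overwrite").saveAsTable("analytics.event_summary")
```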

Drilling into Big Data – Data Ingestion (4)

In this article, we will take a closer look into the concepts and usage of HDFS and Sqoop for data ingestion.

We will take a deep dive into HDFS, the storage layer of Hadoop and one of the world's most reliable storage systems. Distributed storage and data replication are the major features of HDFS that make it a fault-tolerant storage system. The features that make HDFS suitable for running large datasets on commodity hardware are fault tolerance, high availability, reliability, and scalability.
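
As a small illustration of working with HDFS from Python, the sketch below uses the third-party hdfs package (a WebHDFS client) to upload a file and read back its replication factor. The NameNode address, user, and paths are placeholders, and in the pipeline described above Sqoop would normally write the data for you.

```python
from hdfs import InsecureClient  # pip install hdfs

client = InsecureClient("http://namenode-host:9870", user="hadoop")

# Upload a local file into HDFS.
client.upload("/user/hadoop/events/events.csv", "events.csv", overwrite=True)

# Inspect the file status; the replication factor shows how many copies
# HDFS keeps of each block, which is what provides fault tolerance.
status = client.status("/user/hadoop/events/events.csv")
print("replication factor:", status["replication"])
```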

Related Documentation

Import or export data using Data Integration

Use the Data Integration function of DataWorks to create data synchronization tasks and to import and export MaxCompute data. (A minimal programmatic alternative for reading MaxCompute data is sketched after the notes below.)

Note:

  1. Only the project administrator can create a data source. Other roles can only view the data source.
  2. If the data source you want to add is the current MaxCompute project, skip this operation. After the project is created, it automatically appears in Data Integration as a MaxCompute data source named odps_first by default.
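
For reference, the sketch below shows one way to read a MaxCompute table programmatically with PyODPS, the MaxCompute SDK for Python. The credentials, endpoint, project, and table name are placeholders, and Data Integration remains the usual, UI-based route.

```python
from odps import ODPS  # pip install pyodps

o = ODPS(
    "<access-key-id>",
    "<access-key-secret>",
    project="<your-maxcompute-project>",   # the project exposed as odps_first
    endpoint="https://service.<region>.maxcompute.aliyun.com/api",
)

table = o.get_table("ods_oss_events")      # hypothetical table name
with table.open_reader() as reader:
    for record in reader[:5]:              # preview a few records
        print(record.values)
```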

Create a Log Service source table

This topic describes how to create a Log Service source table in Realtime Compute. It also describes the attribute fields, WITH parameters, and field type mapping involved in the table creation process.

Log Service is an all-in-one real-time data logging service that Alibaba Group has developed and tested in many big data scenarios. Based on Log Service, you can quickly finish tasks such as data ingestion, consumption, delivery, query, and analysis without any extra development work. This can help you improve O&M and operational efficiency, and build up the capability to process large amounts of logs in the data technology era.
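
As an illustration of consuming Log Service data outside Realtime Compute, the sketch below queries a logstore with the aliyun-log-python-sdk. The endpoint, project, logstore, and query string are placeholders, and the exact request parameters may differ between SDK versions.

```python
import time
from aliyun.log import LogClient, GetLogsRequest  # pip install aliyun-log-python-sdk

client = LogClient(
    "ap-southeast-1.log.aliyuncs.com",   # region endpoint (placeholder)
    "<access-key-id>",
    "<access-key-secret>",
)

# Query the last 15 minutes of logs from a logstore.
now = int(time.time())
request = GetLogsRequest(
    project="<your-project>",
    logstore="<your-logstore>",
    fromTime=now - 900,
    toTime=now,
    query="*",
)
response = client.get_logs(request)
for log in response.get_logs():
    print(log.get_time(), log.get_contents())
```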

Related Products

Alibaba Cloud Elasticsearch

Alibaba Cloud Elasticsearch is a cloud-based service that offers built-in integrations such as Kibana, commercial features, and Alibaba Cloud VPC, Cloud Monitor, and Resource Access Management. It can securely ingest data from any source and search, analyze, and visualize it in real time. With Pay-As-You-Go billing, Alibaba Cloud Elasticsearch costs 30% less than self-built solutions and saves you the hassle of maintaining and scaling your platform.
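
As a small, hypothetical example of that ingest-and-search workflow, the sketch below indexes one document and searches it back using the official Elasticsearch client for Python; the cluster URL, credentials, and index name are placeholders.

```python
from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch("https://<your-es-endpoint>:9200",
                   basic_auth=("elastic", "<password>"))

# Ingest one document.
es.index(index="oss-events", document={"event_type": "upload", "size": 1024})
es.indices.refresh(index="oss-events")   # make it visible to search immediately

# Search it back.
result = es.search(index="oss-events", query={"match": {"event_type": "upload"}})
print(result["hits"]["total"])
```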

DataWorks

DataWorks is a Big Data platform product launched by Alibaba Cloud. It provides one-stop Big Data development, data permission management, offline job scheduling, and other features. It supports data integration, MaxCompute SQL, MaxCompute MR, machine learning, and shell tasks.
