DataWorks is a Big Data platform product launched by Alibaba Cloud. It provides one-stop Big Data development, data permission management, offline job scheduling, and other features.
DataWorks works out of the box, so you do not need to build the underlying clusters or handle their operations and maintenance (O&M).
The Data Development module provides a rich set of visual components covering SQL (ODPS SQL), data synchronization, MR (ODPS MR), machine learning, shell, and other job types. Its drag-and-drop workflow design aims to be more convenient and flexible than that of open-source workflow tools.
Personalized Data Favorites
The Data Management module provides personalized data favorites and table management. You can add tables of interest to your favorites, manage a table's lifecycle, basic information, and owner, and view its storage, partition, output, and lineage information.
One-click Job Publishing
Jobs can be quickly migrated and published between different projects under the same primary account. A dual-environment model lets customers simulate separate ‘development’ and ‘production’ environments, as well as other offline and online production models.
Visual Job Monitoring
The O&M Center provides a visual job monitoring and management tool and supports displaying overall job running conditions in a DAG. Exception management is also more convenient. Operations such as “rerun”, “restore”, “suspend”, and “stop” are supported.
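To illustrate what viewing job runs as a DAG means, the sketch below models a few jobs and their upstream dependencies, derives a valid run order, and finds the downstream jobs affected when one job fails (the candidates for a rerun). It is only an illustration in plain Python: the job names and the helpers `run_order` and `downstream_of` are hypothetical and are not part of DataWorks or its APIs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs: each key maps to the set of upstream jobs it depends on.
dependencies = {
    "sync_orders": set(),
    "sync_users": set(),
    "clean_orders": {"sync_orders"},
    "join_orders_users": {"clean_orders", "sync_users"},
    "daily_report": {"join_orders_users"},
}


def run_order(deps):
    """Return one valid order in which every job runs after its upstream jobs."""
    return list(TopologicalSorter(deps).static_order())


def downstream_of(deps, failed):
    """Return all jobs that depend, directly or transitively, on `failed`."""
    affected = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for job, upstream in deps.items():
            if current in upstream and job not in affected:
                affected.add(job)
                frontier.append(job)
    return affected


print(run_order(dependencies))
print(sorted(downstream_of(dependencies, "clean_orders")))
```

In this model, a failure in `clean_orders` affects `join_orders_users` and `daily_report`, while the two synchronization jobs are unaffected.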
DataWorks includes a visualized development interface, offline job scheduling with O&M, rapid data integration, and collaborative work features. Together these provide an efficient and secure environment for offline data development. In addition, its Open APIs support further development by application developers.
DataWorks is built on MaxCompute (previously known as ODPS), the massive data computing engine independently developed by Alibaba Cloud, and provides features for multiple scenarios, including offline processing, analysis, cloud data warehouse building, and data mining.
This product provides a visual business process designer that supports programming and debugging for multiple types of code. It provides functions such as code auto-completion, code formatting, code version management, and collaborative development.
DataWorks provides stable offline scheduling through task scheduling across multiple time dimensions, online O&M, monitoring and alerting, and other functions. It supports the offline scheduling of millions of tasks.
This service covers the full range of data services, including data measurement and influence analysis. It supports the management of metadata, heterogeneous data, service metadata, data life cycles, data assets, and data permissions.
DataWorks is typically used in the following scenarios:
DataWorks conveniently migrates data produced by the business system to the cloud, constructs large-scale data warehouses and BI applications, and leverages the massive data storage and computation capabilities of MaxCompute.
With DataWorks's easy-to-use service and advanced data analysis capabilities, exported data can be applied directly to the business system to achieve smart, data-driven operations.
Data Presentation and Sharing
For complex job scheduling and O&M, DataWorks provides a unified and user-friendly scheduling system and visual O&M scheduling interface. This solves the problem of inconvenient O&M management.
Use Alibaba Cloud DataWorks Through the Management Console
The DataWorks Management Console lets you complete all work online, including project management, member management, data analysis, and workflow scheduling.
DataWorks Product Documentation
To get started and learn how to use this product, refer to the Quick Start in the Document Center.
1. How do I get started with DataWorks?
After you have registered an Alibaba Cloud account, you can log in to the console and create a project. For details, see Create Project.
2. Can a sub-user sign in to DataWorks?
Yes. For details, see Add and Authorize Member.
3. How do I create a MaxCompute table?
You can use the New Table function in the New Script File and Data Management modules in DataWorks to create a MaxCompute table. For details, see Create and Delete Table.
4. Do you support UDF?
Yes, MaxCompute UDFs are supported. For details, see Create UDF.
5. Do you support MapReduce?
MapReduce is not fully supported. OPEN MR is provided to help users use the ODPS MR feature in a safer and more convenient manner and implement more complicated computing logic. For details, see Create OPEN MR.
6. Which data sources are supported for data synchronization?
Data synchronization jobs currently support the following data source types: MaxCompute, RDS (MySQL, SQL Server, PostgreSQL), Oracle, FTP, ADS, OSS, OCS, and DRDS. To set up a data synchronization task, see Create Data Sync Job.
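As a supplement to question 3, the sketch below shows roughly what a table definition looks like if you write the statement yourself in a script file instead of using the New Table function. It assumes the usual MaxCompute DDL form with `PARTITIONED BY` and `LIFECYCLE` clauses. The helper `create_table_sql`, the table name, and the columns are hypothetical examples, not part of DataWorks.

```python
def create_table_sql(name, columns, partitions=None, lifecycle_days=None):
    """Render a CREATE TABLE statement in MaxCompute SQL style.

    `columns` and `partitions` are lists of (name, type) pairs.
    Identifiers are inserted as-is, so pass trusted values only.
    """
    def render(cols):
        return ", ".join(f"{col} {typ}" for col, typ in cols)

    sql = f"CREATE TABLE IF NOT EXISTS {name} ({render(columns)})"
    if partitions:
        # Partition columns are declared separately from regular columns.
        sql += f" PARTITIONED BY ({render(partitions)})"
    if lifecycle_days is not None:
        # Lifecycle is expressed in days.
        sql += f" LIFECYCLE {lifecycle_days}"
    return sql + ";"


ddl = create_table_sql(
    "sale_detail",
    [("shop_name", "STRING"), ("customer_id", "STRING"), ("total_price", "DOUBLE")],
    partitions=[("sale_date", "STRING")],
    lifecycle_days=30,
)
print(ddl)
```

The generated statement could then be pasted into an ODPS SQL script. For the exact steps supported in the console, the Create and Delete Table reference remains the authoritative source.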