DataWorks integrates with Hologres so you can build a real-time data warehouse or ad hoc analysis system without managing complex infrastructure. From a single console, configure Hologres tasks, set up periodic scheduling, and manage metadata.
This topic covers the end-to-end development process: prerequisites, billing, environment preparation, permission management, data integration, task development and monitoring, data governance, analysis, and open platform integration.
Tip: If you want to automate workflows or trigger DataWorks tasks from external systems, the Open Platform modules (OpenAPI, OpenEvent, Extensions) let you do this programmatically — you don't have to rely solely on manual operations in the console.
Prerequisites
Before you begin, make sure you have:
- An activated DataWorks instance. For details, see Activate DataWorks.
- A DataWorks workspace. For details, see Create and manage workspaces.
Usage notes
| Area | Key consideration |
|---|---|
| Billing | Developing Hologres tasks in DataWorks incurs charges for DataWorks resources and for other Alibaba Cloud services used during development and execution. |
| Environment preparation | Purchase a DataWorks edition and a resource group, add a Hologres data source, and associate the data source with your workspace. |
| Permission management | Grant RAM users permissions on the Hologres compute engine, the data source, and related tables. Assign workspace-level roles before development begins. |
| Data integration | Read from and write to Hologres using batch, real-time, full, or incremental synchronization. |
| Data modeling and development | Use Data Modeling for end-to-end data governance, DataStudio for task development and scheduling, and Operation Center for monitoring and O&M. |
| Data governance | Manage Hologres metadata and govern Hologres data through Data Map, Security Center, and Data Governance Center. |
| Data analysis and services | Run SQL-based analysis and expose data through APIs using DataAnalysis and DataService Studio. |
| Open Platform | Integrate external application systems with DataWorks using OpenAPI, OpenEvent, and Extensions. |
Billing
Developing Hologres tasks in DataWorks generates two categories of charges: fees billed through DataWorks, and fees billed separately by other Alibaba Cloud services.
Fees for DataWorks resources
For a full list of billable items, see Overview.
| Fee | Description |
|---|---|
| DataWorks edition | Activating Standard Edition, Professional Edition, or Enterprise Edition is charged at the time of purchase. |
| Scheduling resources | Periodic task scheduling requires a serverless resource group (recommended) or an old-version exclusive resource group. A single serverless resource group covers both task scheduling and data synchronization. |
| Data synchronization resources | Synchronization tasks consume both scheduling resources and synchronization resources. Purchase a serverless resource group or an old-version exclusive resource group for Data Integration. |
Running tasks manually — by clicking Run or Run with Parameters in the DataStudio toolbar — does not incur scheduling fees. Failed tasks and dry-run tasks are also not charged.
For details on how scheduling fees are calculated, see Issuing logic of scheduling tasks in DataWorks.
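The charging rule stated above (manual runs, failed tasks, and dry-run tasks do not incur scheduling fees) can be sketched as a small predicate. This is only an illustration of the rule as worded in this topic: the `TaskRun` type and its field names are hypothetical and are not part of any DataWorks API, and actual billing is determined by DataWorks, not by client code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRun:
    """Hypothetical record describing one task run (illustration only)."""
    triggered_manually: bool  # started with Run or Run with Parameters in DataStudio
    dry_run: bool             # the task was a dry-run task
    succeeded: bool           # the task finished successfully


def incurs_scheduling_fee(run: TaskRun) -> bool:
    """Return True only for runs that the rule above does not exempt."""
    if run.triggered_manually:
        return False  # manual runs do not incur scheduling fees
    if run.dry_run:
        return False  # dry-run tasks are not charged
    if not run.succeeded:
        return False  # failed tasks are not charged
    return True
```

Under these assumptions, only a scheduled, successful, non-dry-run task is counted as billable.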
Fees for other Alibaba Cloud services
These fees are billed by the respective Alibaba Cloud services, not through your DataWorks bill. Check the billing documentation for each service you use.
| Fee | Description |
|---|---|
| Database fees | Running data synchronization tasks that read from or write to databases may generate database fees. |
| Computing and storage fees | Running tasks on a compute engine — for example, a Hologres SQL task — may generate compute and storage charges. For Hologres pricing, see Billable items of Hologres. |
| Network service fees | Establishing network connections between DataWorks and other services using Express Connect, Elastic IP Address (EIP), or Internet Shared Bandwidth may generate network fees. |
Environment preparation
Before you develop Hologres tasks, purchase the right DataWorks edition and resource group, then connect your Hologres instance to a workspace.
Resource preparation
DataWorks Basic Edition supports core Hologres development workflows including data migration, task development, scheduling, and data governance. Upgrade to Standard Edition, Professional Edition, or Enterprise Edition for advanced governance and security capabilities.
| Item | Options | Reference |
|---|---|---|
| DataWorks edition | Basic Edition covers core Hologres development workflows. Upgrade to Standard, Professional, or Enterprise Edition for advanced governance and security. | Comparison between DataWorks Basic Edition and DataWorks advanced editions and edition upgrade description |
| Resource group | Serverless resource group (recommended): One group covers data synchronization, scheduling, and DataService Studio. Allocate resources based on workload. Old-version resource group: Meets basic scheduling needs but will be discontinued. | Overview of DataWorks resource groups |
Development environment preparation
Add a Hologres instance to your workspace as a data source, then associate it with DataStudio. Add team members to the workspace to enable collaborative development.
| Item | Description | Reference |
|---|---|---|
| Data synchronization environment | Add a Hologres instance to a DataWorks workspace as a data source before running synchronization tasks. | Associate a Hologres computing resource |
| Data development and analysis environment | Associate the data source with DataStudio to enable data development, analysis, and periodic scheduling. | Associate a Hologres computing resource; Preparations before data development: Associate a computing resource or a cluster with DataStudio |
| Collaborative development environment | Add RAM users to the workspace with the Development role. Grant each user permissions on the Hologres compute engine instance, the Hologres data source, and any databases they need to access in the production environment. | Add members to a workspace; Configure permissions on the Hologres compute engine for a workspace member |
Permission management
DataWorks uses two complementary permission systems that control access at different levels.
Data access permissions
To develop Hologres tasks as a RAM user in a DataWorks workspace, grant the RAM user:
- Permissions on the Hologres compute engine instance
- Permissions on the Hologres data source associated with the workspace
- Permissions on related tables
For details, see Permission management for Hologres.
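As a pre-development checklist, the three grants listed above could be tracked with a helper like the following. This is a minimal sketch: the grant labels are made-up identifiers for this example, not DataWorks or Hologres permission names, and the function does not query any real service.

```python
# Hypothetical labels for the three grants named in this section.
REQUIRED_GRANTS = frozenset({
    "compute_engine_instance",  # Hologres compute engine instance
    "workspace_data_source",    # Hologres data source associated with the workspace
    "related_tables",           # tables the RAM user needs to work with
})


def missing_grants(granted):
    """Return the required grants that a RAM user still lacks, sorted by name."""
    return sorted(REQUIRED_GRANTS - set(granted))


def ready_for_development(granted):
    """A user is ready only when none of the required grants is missing."""
    return not missing_grants(granted)
```

For example, a user who has only the compute engine grant would still be reported as missing the data source and table grants.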
Service and feature permissions
Assign a workspace-level role to each RAM user before they begin development. Two authorization mechanisms are available:
- RAM policy-based authorization: Controls permissions on DataWorks service modules (for example, restricting access to Data Map) and console-level operations (for example, allowing workspace deletion). See RAM policy-based authorization.
- Role-based access control (RBAC): Controls permissions on workspace-level modules (for example, granting DataStudio access) and global-level modules (for example, restricting Data Security Guard access). See Role-based access control (RBAC).
For a step-by-step guide, see Best practices: Grant permissions to RAM users.
Getting started
DataWorks covers the full development lifecycle: integrate data, build and schedule tasks in DataStudio, monitor in Operation Center, govern data, analyze results, and expose data through APIs. Each stage is described below.
Data integration
DataWorks Data Integration reads from and writes to Hologres, supporting synchronization between a Hologres data source and other data source types. Choose the scenario that fits your workload:
- Batch synchronization: for scheduled, large-volume data loads
- Real-time synchronization: for continuous, low-latency data pipelines
- Full synchronization: for initial complete data loads
- Incremental synchronization: for ongoing delta updates
For an overview of all supported scenarios, see Data Integration overview.
Data modeling and development
| Module | What it does | References |
|---|---|---|
| Data Modeling | The first step for end-to-end data governance. Data Modeling structures enterprise data using the Alibaba data mid-end methodology — covering data warehouse planning, data standards, dimensional modeling, and data metrics — so teams can interpret and share business data consistently. | Data Modeling overview |
| DataStudio | Develop and schedule Hologres tasks without complex command lines. Supports Hologres SQL nodes, schema synchronization nodes, and general-purpose node types (Shell, Assignment, Branch, Do-while, For-each, and more) for handling complex logic. Also supports data synchronization between Hologres and other data sources — though only a subset of batch and real-time synchronization scenarios are available in DataStudio. See Data Integration for the full list. | Hologres development standards; Create a Hologres SQL node; Create a node to synchronize schemas of MaxCompute tables; Create a node to synchronize MaxCompute data |
| Operation Center | An end-to-end O&M and monitoring platform. View task status, run intelligent diagnostics, rerun failed tasks, and use the intelligent baseline feature to guarantee output timeliness for critical tasks. | Perform basic O&M operations on scheduled tasks |
| Data Quality | Monitors data quality throughout the R&D process using configurable rules that integrate with task scheduling. Detects and escalates data quality issues before they affect downstream consumers. | Data Quality overview |
After you develop tasks in DataStudio, work through the following steps to deploy and manage them:
- Configure scheduling properties: set scheduling dependencies and parameters to enable periodic runs.
- Debug nodes: test tasks before deploying to avoid wasting compute resources in production.
- Deploy to production: deployed tasks appear on the Auto Triggered Nodes page in Operation Center.
- Manage nodes: deploy, undeploy, or modify scheduling properties across multiple tasks.
- Apply process control: use code review, smoke testing, and custom review logic to standardize development and protect production data.
For details, see Overview, Task debugging process, Publish tasks, Batch operations, and Development process control.
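To make the role of scheduling parameters more concrete, the following sketch mimics locally how a placeholder such as `${bizdate}` in node SQL might be replaced with a concrete value for each periodic run. It is an illustration under stated assumptions: the table names `ods_orders` and `dws_orders_daily` and the parameter name `bizdate` are hypothetical, and the real substitution is performed by DataWorks at run time, not by this code. Python's `string.Template` is used only because it happens to share the `${name}` placeholder syntax.

```python
from string import Template

# Hypothetical SQL for a Hologres SQL node. ${bizdate} stands for a scheduling
# parameter that would be resolved for each scheduled run.
NODE_SQL = Template(
    "INSERT INTO dws_orders_daily (ds, order_count)\n"
    "SELECT '${bizdate}', COUNT(*)\n"
    "FROM ods_orders\n"
    "WHERE ds = '${bizdate}';"
)


def render_node_sql(params):
    """Substitute parameter values into the node SQL.

    Raises KeyError if a placeholder has no value, which is similar in spirit
    to catching a missing parameter while debugging a node before deployment.
    """
    return NODE_SQL.substitute(params)


if __name__ == "__main__":
    # Print the SQL as it would look for one run with bizdate=20240101.
    print(render_node_sql({"bizdate": "20240101"}))
```

Rendering the template with a sample value before deployment is one way to check that every placeholder in the SQL has a matching parameter.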
Data governance
After you associate a Hologres data source with a DataWorks workspace, DataWorks automatically collects its metadata. View metadata in Data Map and governance issues in Data Governance Center.
| Module | What it does | Reference |
|---|---|---|
| Data Map | An enterprise-grade metadata management platform. Search, categorize, and understand data objects across the organization based on unified metadata services. | Data Map overview |
| Security Center, Data Security Guard, Approval Center | An end-to-end data security governance platform covering data asset classification, sensitive data identification, authorization management, data masking, access auditing, and risk response. | Security Center overview; Data Security Guard overview; Approval Center overview |
| Data Governance Center | Automatically identifies governance issues across multiple fields using rules derived from real-world data engineering experience. Provides pre-event issue prevention and post-event remediation. | Data Governance Center overview |
Data analysis and services
| Module | What it does | Reference |
|---|---|---|
| DataAnalysis | Run SQL-based analysis online, explore business data, edit and share datasets, and generate visualized reports from query results. | DataAnalysis overview |
| DataService Studio | Expose data through centrally managed APIs for both internal and external consumers. | DataService Studio overview |
Open Platform
Connect external application systems to DataWorks to manage data pipelines, govern data, run O&M operations, and respond to business changes programmatically.
| Module | What it does | Reference |
|---|---|---|
| OpenAPI | Call DataWorks API operations to integrate applications with DataWorks. Reduces manual operations and O&M effort, and minimizes data risk. | OpenAPI |
| OpenEvent | Subscribe to DataWorks change events so your applications can detect and respond to changes immediately. | OpenEvent overview |
| Extensions | Register local programs as extensions to manage event-driven processes triggered by DataWorks workspace events. | Extensions overview |
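The event-driven pattern behind OpenEvent and Extensions can be sketched as a small dispatcher: a program receives an event, looks up a handler for its type, and returns a decision. Everything specific here is an assumption made for illustration: the event type `node.deploy.requested`, the payload fields `type` and `sql`, and the `PASS`/`BLOCK`/`IGNORED` results are hypothetical and do not reflect the actual DataWorks event schema or extension protocol.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]
_handlers: Dict[str, Handler] = {}


def on(event_type: str):
    """Register a handler function for one (hypothetical) event type."""
    def register(fn: Handler) -> Handler:
        _handlers[event_type] = fn
        return fn
    return register


@on("node.deploy.requested")  # hypothetical event type
def review_deploy(event: dict) -> str:
    """Example custom review logic: block deployments whose SQL drops a table."""
    sql = event.get("sql", "").upper()
    return "BLOCK" if "DROP TABLE" in sql else "PASS"


def dispatch(event: dict) -> str:
    """Route an event to its handler; events without a handler are ignored."""
    handler = _handlers.get(event.get("type"))
    return handler(event) if handler else "IGNORED"
```

A real extension would additionally need to receive events from DataWorks and report its result back through the mechanisms described in the OpenEvent and Extensions references.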
Appendix: Relationship between DataWorks and Hologres
The information in this appendix applies to workspaces in standard mode. If you use a workspace in basic mode, only the production environment is available, and only one Hologres database can be associated with the workspace.
DataWorks provides scheduling, metadata management, data governance, and security control for Hologres — but all data computation and storage happen in Hologres. In a workspace in standard mode, you can associate different Hologres instances with the workspace in the development and production environments, keeping storage and compute resources isolated between environments.
- To learn how to add a Hologres data source and view instances in each environment, see Associate a Hologres computing resource.
- To understand how DataWorks issues scheduled tasks to Hologres, see Issuing logic of scheduling nodes in DataWorks.
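The environment isolation described in this appendix can be pictured as a lookup from environment to Hologres instance. The sketch below is a simplified model, not DataWorks behavior: the mode and environment strings and the instance names in the usage example are hypothetical.

```python
from typing import Dict

ENVIRONMENTS = ("development", "production")


def resolve_instance(workspace_mode: str, environment: str,
                     instances: Dict[str, str]) -> str:
    """Return the Hologres instance that a task would run against.

    In standard mode, each environment can map to a different instance.
    In basic mode, only the production environment exists.
    """
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if workspace_mode == "basic" and environment != "production":
        raise ValueError("a workspace in basic mode has only the production environment")
    return instances[environment]
```

For instance, with a standard-mode mapping such as `{"development": "holo-dev", "production": "holo-prod"}`, a task debugged in the development environment resolves to `holo-dev`, while the deployed task resolves to `holo-prod`.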