Before creating a project space for data development, you must configure the compute engine for your Dataphin instance. Once the compute engine is configured, you can add the corresponding compute resources to the project space, which gives it the computing and storage resources it needs. This topic outlines the compute engine options available in Dataphin.
Permission description
Only super administrators and system administrators can configure the compute engine.
Billing description
To set up the real-time computing engine, you must purchase and enable the real-time module prior to configuration.
The agile development version does not support the configuration of the real-time computing engine.
Limits
- Changing the compute engine type for the metadata warehouse after initial setup may result in incorrect metadata operations for the business tenant. Consult with the Dataphin operations team before making changes to the metadata warehouse compute engine type.
- When altering offline computing engine settings, the system will synchronize the compute source configuration. The system does not check the connectivity of the compute source during this process. Ensure the accuracy of the configuration to prevent node failure. After changes, it is advisable to manually verify the compute source connectivity.
- Configuration changes will take effect within 30 seconds. During synchronization, viewing the compute source configuration may display inconsistencies, and SQL execution may still use the previous settings.
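The manual connectivity check recommended above can be approximated at the network level. The following is a minimal Python sketch, not a Dataphin feature or API: it only tests whether a TCP connection to an endpoint can be opened. The function name, host, and port are hypothetical placeholders, and the check does not validate credentials or engine-specific settings.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This is a network-level reachability test only. It says nothing about
    whether the credentials or engine settings stored in Dataphin are valid.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint: replace with the host and port of your own cluster.
    endpoint = ("compute-endpoint.example.com", 10000)
    print("reachable" if can_connect(*endpoint) else "not reachable")
```

Because configuration changes can take up to 30 seconds to take effect, it is reasonable to run a check like this only after that window has passed.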
Supported compute engines
Before utilizing Dataphin, complete the compute engine settings for your instance by configuring the compute cluster endpoint. Afterward, you can establish compute sources based on this cluster. Dataphin supports the following compute engines:
If no offline compute source exists, you may change the compute engine type and configuration in the calculation settings. If an offline compute source is present, only the calculation settings can be modified, not the compute engine type.
Once the metadata warehouse compute engine is initialized, only the supported compute engines for the current metadata warehouse tenant are selectable.
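The two rules above can be read as a small decision procedure. The sketch below restates them in Python purely for illustration. The function and field names are hypothetical and do not correspond to any Dataphin API, and the behavior before the metadata warehouse is initialized (all engines selectable) is an assumption, since the text does not state it.

```python
from __future__ import annotations


def editable_fields(has_offline_compute_source: bool) -> set[str]:
    """Return which parts of the compute settings may be changed.

    The configuration is always editable. The engine type is editable
    only while no offline compute source exists.
    """
    fields = {"configuration"}
    if not has_offline_compute_source:
        fields.add("engine_type")
    return fields


def selectable_engines(
    all_engines: list[str],
    warehouse_initialized: bool,
    tenant_supported: set[str],
) -> list[str]:
    """Return the engines that can be selected.

    Assumption: before the metadata warehouse compute engine is initialized,
    every engine is selectable. Afterwards, only engines supported by the
    current metadata warehouse tenant remain selectable.
    """
    if not warehouse_initialized:
        return list(all_engines)
    return [engine for engine in all_engines if engine in tenant_supported]
```

For example, `editable_fields(True)` contains only `"configuration"`, which mirrors the rule that the engine type is locked once an offline compute source is present.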
Compute Engine | Description | References |
Offline Computing Engine | ||
MaxCompute | Alibaba's native big data computing platform, offering massive data storage and processing capabilities with high efficiency and stability. | Configure the Dataphin instance compute engine to MaxCompute |
AnalyticDB for PostgreSQL | An OLAP-focused analytic database, offering a petabyte-scale, high-concurrency, real-time data warehouse with seamless scalability for extensive data computation. | Configure the Dataphin instance compute engine to AnalyticDB for PostgreSQL |
E-MapReduce3.x Hadoop and E-MapReduce5.x Hadoop | An open-source Hadoop cluster on Alibaba Cloud ECS, leveraging Alibaba Cloud E-MapReduce (EMR) for robust data processing. | |
CDH5.x Hadoop and CDH6.x Hadoop | A globally recognized distributed system infrastructure, featuring HDFS and MapReduce for extensive data storage and processing. | |
Cloudera Data Platform 7.x | Cloudera Data Platform (CDP) is the platform that combines Cloudera's CDH and Hortonworks' HDP following the merger of the two companies. | |
Huawei FusionInsight 8.x Hadoop | A big data platform by Huawei, enhancing Apache open-source software for comprehensive data storage, query, and analysis. | |
AsiaInfo DP5.3 Hadoop | An integrated big data support platform, built on an open-source ecosystem and leveraging carrier-grade capabilities. | |
Transwarp ArgoDB | Transwarp ArgoDB is a distributed analytic database from Transwarp Technology. Note: Transwarp ArgoDB is not supported by the intelligent development version. | Configure the Dataphin instance compute engine to TDH or ArgoDB |
Transwarp TDH 6.x | Transwarp Data Hub (TDH) is Transwarp's comprehensive big data platform. | |
StarRocks | StarRocks is a high-performance analytic data warehouse, utilizing vectorization, MPP architecture, CBO, intelligent materialized views, and a real-time updatable columnar storage engine for multidimensional, real-time, high-concurrency data analysis. | Use StarRocks as the metadata warehouse compute engine for initialization |
Lindorm (Compute Engine) | Lindorm is Alibaba Cloud's cloud-native multi-model database product, with a compute engine mode that supports offline big data applications. | |
GaussDB (DWS) | GaussDB (DWS) is a distributed relational database from Huawei. It is based on PostgreSQL and is compatible with Oracle, MySQL, and TeraData syntax. | |
Databricks | Databricks is a unified data analytics platform based on Apache Spark. It provides managed Spark clusters, an interactive notebook environment, and seamless integration with cloud storage to support high-volume data processing and large-scale analytics. | |
Amazon EMR | Amazon EMR is a managed Hadoop big data cluster platform that provides big data computing capabilities such as Hive and Spark. | |
SelectDB | SelectDB Enterprise is the commercial version of Apache Doris from SelectDB. | Set the Dataphin instance compute engine to SelectDB or Doris |
Doris | Apache Doris is a high-performance, real-time analytic database based on a massively parallel processing (MPP) architecture. | |
Real-time Computing Engine | ||
Alibaba Cloud Realtime Compute for Apache Flink | Alibaba Cloud's next-generation compute engine, Flink, supports real-time computing with high throughput and low latency, and is also capable of offline computing and scheduling. | Once the tenant enables the real-time development module, the system will suggest settings based on the offline computing engine selection, which can be customized. For instructions on enabling real-time development, see tenant settings. |
Apache Flink | Apache Flink is a distributed processing engine for stateful computations on both unbounded and bounded data streams. | |
FusionInsight Flink | FusionInsight Flink is a real-time computation and analysis engine for high-speed data streams, based on Apache Flink. | |
Blink Exclusive | Blink is Alibaba Cloud's exclusive real-time computing engine. Important: This version is no longer available on the public cloud. Selection should be made with caution. | |