Configure Hadoop as Dataphin Compute Engine via EMR or CDH

Before you start data development in a project space, you must configure the compute engine for the Dataphin instance. After the compute engine is configured, you can add compute sources of the matching type to the project space, which gives the project its computing and storage capabilities. This topic describes how to set Hadoop as the compute engine in Dataphin.

Prerequisites

System metadata initialization must be complete. For more information, see the topic on initializing the metadata warehouse with Hadoop as the metadata warehouse compute engine.

Procedure

Log on to the Dataphin console with a super administrator account.
Navigate to Management Center > System Settings from the top menu bar on the Dataphin home page.

In the Compute Settings section, select a Hadoop-type compute engine and configure its parameters.

Hadoop-based compute engines include Alibaba Cloud E-MapReduce 3.x Hadoop, Alibaba Cloud E-MapReduce 5.x Hadoop, CDH 5.x Hadoop, CDH 6.x Hadoop, Cloudera Data Platform 7.x, Huawei FusionInsight 8.x Hadoop, and AsiaInfo DP5.3 Hadoop.

Note

If you set the compute engine to Alibaba Cloud E-MapReduce 3.x Hadoop, Alibaba Cloud E-MapReduce 5.x Hadoop, CDH 5.x Hadoop, CDH 6.x Hadoop, Cloudera Data Platform 7.x, AsiaInfo DP5.3 Hadoop, or Huawei FusionInsight 8.x Hadoop, you only need to select the offline compute engine type. You do not need to configure the other parameters. After you save the compute engine type, click Configure Compute Cluster to add or configure a Hadoop cluster on the Planning > Compute Source > Manage Hadoop Cluster page.

Parameter	Description
NameNode	You can add multiple HDFS NameNode connection entries for the same cluster. Metadata can be retrieved as long as at least one entry passes verification. For example, `host=192.xxx.xx.xxx,webUiPort=500xxx,ipcPort=80xx`.
Execution Engine	Choose the compute execution engine that best fits your business needs.
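
The NameNode entry format shown in the table can be checked before you paste it into the console. The following is a minimal sketch, not part of Dataphin: the function names `parse_namenode_entry` and `any_namenode_verified`, the `verify` callback, and the sample host and port values are all hypothetical and exist only for illustration. It assumes that each entry is a comma-separated list of `key=value` pairs with the keys `host`, `webUiPort`, and `ipcPort`, and it mirrors the rule that one verified entry is enough.

```python
from typing import Callable, Dict, Iterable

# Keys taken from the example entry in the parameter table.
REQUIRED_KEYS = ("host", "webUiPort", "ipcPort")


def parse_namenode_entry(entry: str) -> Dict[str, str]:
    """Split 'host=...,webUiPort=...,ipcPort=...' into a dict of strings."""
    fields: Dict[str, str] = {}
    for part in entry.split(","):
        key, sep, value = part.strip().partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed field: {part!r}")
        fields[key] = value
    missing = [k for k in REQUIRED_KEYS if k not in fields]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return fields


def any_namenode_verified(
    entries: Iterable[str],
    verify: Callable[[Dict[str, str]], bool],
) -> bool:
    """Return True as soon as one parsed entry passes the verify callback.

    Entries are parsed lazily, so entries after the first verified one
    are not inspected.
    """
    return any(verify(parse_namenode_entry(e)) for e in entries)


if __name__ == "__main__":
    # Hypothetical values; replace with your cluster's real addresses.
    entries = [
        "host=192.0.2.10,webUiPort=50070,ipcPort=8020",
        "host=192.0.2.11,webUiPort=50070,ipcPort=8020",
    ]
    # Stand-in check (assumption): treat only the second host as reachable.
    print(any_namenode_verified(entries, lambda f: f["host"] == "192.0.2.11"))
```

In practice the `verify` callback would be replaced by whatever connectivity check suits your environment. The sketch only validates the entry shape and does not contact the cluster.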

Click Save.

What to do next

After the compute engine for the Dataphin instance is configured, you can add compute sources of the matching type to a newly created project space to provide its computing and storage capabilities. For information about how to create a project space and add compute sources, see the topic on creating a general project.