
DataWorks:Prerequisites: Cluster info and network config

Last Updated:Mar 26, 2026

Before you register a CDH (Cloudera's Distribution Including Apache Hadoop) or CDP (Cloudera Data Platform) cluster in DataWorks, collect the required configuration from Cloudera Manager and establish network connectivity between the cluster and your DataWorks resource group.

Background information

  • CDH is an open source platform distribution from Cloudera. It provides out-of-the-box features such as cluster management, monitoring, and diagnostics. It also supports various components to help you run end-to-end big data workflows.

  • CDP (Cloudera Data Platform) is Cloudera's successor to CDH. It combines technologies from CDH and Hortonworks Data Platform (HDP) into a single platform for data management and analytics.

Prerequisites

Before you begin, ensure that you have:

  • A CDH cluster deployed in an environment that can connect to an Alibaba Cloud virtual private cloud (VPC). DataWorks also supports CDH clusters that are not hosted on Alibaba Cloud Elastic Compute Service (ECS) instances. In that case, use Express Connect or a VPN to bridge your on-premises or third-party network to an Alibaba Cloud VPC.

  • A DataWorks resource group. Two options are available:

    • Serverless resource group (recommended): supports both data synchronization and task scheduling. New users (those who have not activated DataWorks in the current region) can purchase only Serverless resource groups. For setup instructions, see Use a Serverless resource group.

    • Exclusive resource group for scheduling: an older resource group type that also supports CDH and CDP tasks. For setup instructions, see Use an exclusive resource group for scheduling.

A newly purchased resource group has no network access to external services by default. The steps in this topic connect it to your CDH cluster.

Collect CDH cluster configuration

Follow these steps to gather the information needed to register the CDH cluster in DataWorks.

Step 1: Get the CDH version

Log on to Cloudera Manager. On the main page, the CDH version is displayed to the right of the cluster name.

CDH version info

Step 2: Get host and component addresses

Log on to Cloudera Manager. From the Hosts drop-down menu, select Roles. Identify each service by its keyword and icon, then record the corresponding host address.

Component roles in Cloudera Manager

The following components require host addresses during cluster registration:

Abbreviation    Component
HS2             HiveServer2
HMS             Hive Metastore
ID              Impala Daemon
RM              YARN ResourceManager
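
Once the addresses are recorded, it can be useful to confirm that each one accepts TCP connections from a machine inside the same network. The following is a minimal sketch and is not part of DataWorks or Cloudera Manager. The host names are placeholders, and the ports are the common upstream defaults (10000 for HiveServer2, 9083 for Hive Metastore, 21050 for Impala Daemon, 8032 for YARN ResourceManager), which your cluster may override.

```python
import socket

# Hypothetical inventory: replace the hosts with the addresses recorded from
# Cloudera Manager. The ports are common defaults and are an assumption here.
COMPONENTS = {
    "HS2": ("cdh-header-1-cn-shanghai", 10000),  # HiveServer2
    "HMS": ("cdh-header-1-cn-shanghai", 9083),   # Hive Metastore
    "ID": ("cdh-header-1-cn-shanghai", 21050),   # Impala Daemon
    "RM": ("cdh-header-1-cn-shanghai", 8032),    # YARN ResourceManager
}


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_components(components, probe=is_reachable):
    """Map each component abbreviation to its reachability (True or False)."""
    return {name: probe(host, port) for name, (host, port) in components.items()}
```

Calling `check_components(COMPONENTS)` from a host in the cluster's VPC returns a dictionary keyed by abbreviation. A `False` entry only indicates that the TCP connection failed; it does not distinguish a wrong address from a closed port or a security group rule.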

Step 3: Download the configuration file

  1. Log on to Cloudera Manager.

  2. On the Status page, click the cluster's drop-down menu and select View Client Configuration URL.

    View Client Configuration URL option in Cloudera Manager

  3. In the dialog box, download the configuration package. This example uses YARN.

    Configuration package download dialog

You will upload this file when you register the CDH cluster.
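
Before uploading, you may want to check what the package contains. The sketch below assumes that the download is a ZIP archive containing a Hadoop-style `yarn-site.xml` (a `<configuration>` element with `<property>` children holding `<name>` and `<value>`). The archive name `yarn-clientconfig.zip` and its internal folder layout are assumptions, so the code matches on the file name suffix only.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_hadoop_properties(zip_source, file_suffix="yarn-site.xml"):
    """Return {name: value} from the first archive member ending in file_suffix.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        matches = [n for n in archive.namelist() if n.endswith(file_suffix)]
        if not matches:
            raise FileNotFoundError(f"{file_suffix} not found in archive")
        root = ET.fromstring(archive.read(matches[0]))

    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            # A missing or empty <value> is stored as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props
```

For example, `read_hadoop_properties("yarn-clientconfig.zip").get("yarn.resourcemanager.address")` would return the ResourceManager address if that property is set in the file, which lets you cross-check it against the host recorded in Step 2.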

Step 4: Get network information

  1. Log on to the ECS console with the Alibaba Cloud account that hosts the CDH cluster.

  2. In the instance list, click an ECS instance on which the CDH cluster is deployed.

  3. On the Instance Details page, record the Security Group, VPC, and Virtual Switch values.

You will use this information to attach the resource group to the same VPC in the next section.

Configure network connectivity

Select the section that matches your resource group type.

Serverless resource group

Attach a VPC

  1. Log on to the DataWorks console.

  2. In the left navigation pane, click Resource Group. The Exclusive Resource Groups tab is selected by default.

  3. Click Network Settings next to your Serverless resource group.

  4. On the VPC Binding tab, in the Data Scheduling & Data Integration section, click Add Binding.

  5. Select the VPC, zone, and vSwitch that match the values you recorded in Step 4.

Configure DNS resolution

The resource group uses hostname-based addressing to reach CDH cluster nodes. Set up DNS resolution in Alibaba Cloud DNS PrivateZone so that hostnames resolve to the private IP addresses of your ECS instances.

  1. Activate internal DNS resolution.

    Skip this step if internal DNS resolution is already active.

  2. Add a built-in authoritative domain name for each host address you recorded in Step 2. For example, add an authoritative zone for cdh-header-1-cn-shanghai and set the resolved IP to the private IP address of the corresponding ECS instance.

  3. Set the scope of the domain name.

    Select the VPC to which both the CDH cluster and the resource group are attached.
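
After the records are in place, a quick check from a machine in the same VPC can confirm that each hostname resolves to the intended private address. The following sketch is illustrative only: the hostname-to-IP table is a placeholder that you would fill in yourself, and the check relies on the system resolver rather than any PrivateZone API.

```python
import ipaddress
import socket


def check_resolution(expected, resolve=socket.gethostbyname):
    """Compare resolved addresses with the expected private IPs.

    expected maps hostname -> expected IPv4 address (placeholder data).
    Returns {hostname: problem}; an empty dict means every host matched.
    """
    problems = {}
    for hostname, expected_ip in expected.items():
        if not ipaddress.ip_address(expected_ip).is_private:
            problems[hostname] = f"expected address {expected_ip} is not private"
            continue
        try:
            actual = resolve(hostname)
        except OSError as exc:
            problems[hostname] = f"resolution failed: {exc}"
            continue
        if actual != expected_ip:
            problems[hostname] = f"resolved to {actual}, expected {expected_ip}"
    return problems
```

The `resolve` parameter exists so that the lookup can be replaced in tests. In normal use, the default `socket.gethostbyname` queries whatever DNS the machine is configured with, so the result is meaningful only when the script runs inside the VPC selected as the zone's scope.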

Exclusive resource group for scheduling

Attach a VPC

  1. Log on to the DataWorks console.

  2. In the left navigation pane, click Resource Group. The Exclusive Resource Groups tab is selected by default.

  3. Click Network Settings next to your exclusive resource group for scheduling.

  4. On the VPC Binding tab, click Add Binding.

  5. Select the VPC, zone, vSwitch, and security group that match the values you recorded in Step 4.

Configure hosts

  1. On the Host Configuration tab, click Batch Modify.

  2. Enter the host addresses you recorded in Step 2.

    Host configuration
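
If you keep the recorded addresses in a script, the entries for batch editing can be generated rather than typed. This sketch assumes that the batch editor accepts hosts-file style lines (`IP hostname`, one per line), which this topic does not confirm. Check the format shown in the console before pasting. The addresses are placeholders.

```python
import ipaddress


def format_host_entries(hosts):
    """Render {hostname: ip} as hosts-file style lines, sorted by hostname.

    The "IP hostname" line format is an assumption about the input field.
    Raises ValueError if an address is not a valid IP address.
    """
    lines = []
    for hostname in sorted(hosts):
        ip = ipaddress.ip_address(hosts[hostname])  # validates the address
        lines.append(f"{ip} {hostname}")
    return "\n".join(lines)
```

For example, `format_host_entries({"cdh-header-1-cn-shanghai": "192.168.0.10"})` produces the single line `192.168.0.10 cdh-header-1-cn-shanghai`.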

What's next

After completing the preparation steps, register the CDH cluster in DataWorks and start data development. For instructions, see Data Development (Legacy): Attach a CDH computing resource.