To develop and manage E-MapReduce (EMR) or CDH (Cloudera's Distribution Including Apache Hadoop) jobs in DataWorks, you must first attach the corresponding EMR or CDH cluster to DataWorks as a computing resource through cluster management. After the cluster is attached, you can use the computing resource in DataWorks for data synchronization, development, and other operations.
Manage clusters
Attach an EMR cluster
Supported EMR cluster types:
DataLake cluster (new data lake): EMR on ECS
Custom cluster: EMR on ECS
Hadoop cluster (old data lake): EMR on ECS
Spark cluster: EMR on ACK
EMR Serverless Spark cluster
Important: You can use EMR Hadoop clusters of the following versions in DataWorks:
EMR-3.38.2, EMR-3.38.3, EMR-4.9.0, EMR-5.6.0, EMR-3.26.3, EMR-3.27.2, EMR-3.29.0, EMR-3.32.0, EMR-3.35.0, EMR-4.3.0, EMR-4.4.1, EMR-4.5.0, EMR-4.5.1, EMR-4.6.0, EMR-4.8.0, EMR-5.2.1, EMR-5.4.3
Hadoop clusters (old data lake) are no longer recommended. We recommend that you migrate to DataLake clusters as soon as possible. For more information, see Migrate Hadoop clusters to DataLake clusters.
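Before attaching an EMR Hadoop cluster, it can help to check its version against the list above. The following is a minimal local sketch, not a DataWorks or EMR API: the function name `is_supported_emr_hadoop_version` is hypothetical, and the version set is copied from the list in this topic.

```python
# Versions copied from the supported-version list in this topic.
# This is a local pre-check only; it does not call any DataWorks or EMR API.
SUPPORTED_EMR_HADOOP_VERSIONS = frozenset({
    "EMR-3.38.2", "EMR-3.38.3", "EMR-4.9.0", "EMR-5.6.0",
    "EMR-3.26.3", "EMR-3.27.2", "EMR-3.29.0", "EMR-3.32.0",
    "EMR-3.35.0", "EMR-4.3.0", "EMR-4.4.1", "EMR-4.5.0",
    "EMR-4.5.1", "EMR-4.6.0", "EMR-4.8.0", "EMR-5.2.1",
    "EMR-5.4.3",
})


def is_supported_emr_hadoop_version(version: str) -> bool:
    """Return True if the given EMR Hadoop version is in the documented list.

    Hypothetical helper: surrounding whitespace and letter case are ignored.
    """
    return version.strip().upper() in SUPPORTED_EMR_HADOOP_VERSIONS


if __name__ == "__main__":
    for candidate in ("EMR-3.38.2", "emr-5.4.3", "EMR-3.30.0"):
        print(candidate, is_supported_emr_hadoop_version(candidate))
```

A version that fails this check is not in the documented list for Hadoop clusters, which is one more reason to consider the DataLake migration mentioned above.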
Configuration and attachment:
Old version of Data Development: Configure EMR computing resources. For more information, see Old version of Data Development: Attach EMR computing resources.
New version of Data Development: Configure EMR computing resources. For more information, see New version of Data Development: Attach EMR computing resources, Attach EMR Serverless Spark computing resources, and Attach EMR Serverless StarRocks computing resources.
Attach a CDH/CDP cluster
Supported cluster versions: DataWorks lets you directly select CDH5.16.2, CDH6.1.1, CDH6.2.1, CDH6.3.2, and CDP7.1.7. The component versions bundled with these cluster versions (the version of each component in the cluster connection information) are fixed. If none of these cluster versions meets your business requirements, select Custom Version.
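The choice between a preset version and Custom Version can be sketched as a simple lookup. This is an illustrative local helper under stated assumptions, not part of DataWorks: the function name `choose_cluster_version_option` is hypothetical, and the preset names are taken from the list above.

```python
# Preset cluster versions named in this topic; their component versions are fixed.
PRESET_CLUSTER_VERSIONS = ("CDH5.16.2", "CDH6.1.1", "CDH6.2.1", "CDH6.3.2", "CDP7.1.7")

# Option to select when no preset matches the cluster.
CUSTOM_VERSION = "Custom Version"


def choose_cluster_version_option(cluster_version: str) -> str:
    """Return the preset that matches the cluster, or "Custom Version".

    Hypothetical helper: spaces and letter case are ignored, so "cdh 6.3.2"
    matches the preset "CDH6.3.2".
    """
    normalized = cluster_version.replace(" ", "").upper()
    if normalized in PRESET_CLUSTER_VERSIONS:
        return normalized
    return CUSTOM_VERSION


if __name__ == "__main__":
    for candidate in ("CDH 6.3.2", "CDP7.1.7", "CDH6.0.0"):
        print(candidate, "->", choose_cluster_version_option(candidate))
```

With a preset, the component versions in the cluster connection information are fixed; falling back to Custom Version is the path for clusters whose versions differ.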
Configuration and attachment:
Old version of Data Development: Configure CDH computing resources. For more information, see Old version of Data Development: Attach CDH computing resources.
New version of Data Development: Configure CDH computing resources. For more information, see New version of Data Development: Attach CDH computing resources.