This topic describes how to add an HDFS data source in DataWorks.

Procedure

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces. Find the workspace to which you want to add a data source and click Data Integration in the Actions column.
  3. In the left-side navigation pane of the Data Integration console, choose Data Source > Data Sources.
  4. In the upper-right corner of the page, click Add data source.
  5. In the Add data source dialog box, click HDFS.
  6. In the Add HDFS data source dialog box, configure the parameters described in the following table.
    • Data source type: Select Connection string mode or Built-in Mode of CDH Cluster based on your requirements.
    • Data Source Name: The name of the data source. The name can contain letters, digits, and underscores (_), and cannot start with a digit or an underscore (_).
    • Data source description: The description of the data source. The description can be up to 80 characters in length.
    • DefaultFS: The address of the HDFS NameNode, in the format hdfs://ServerIP:Port.
  7. Click Test connectivity in the Actions column corresponding to the data source.
  8. After the data source passes the connectivity test, click Complete.

    Notes on connectivity testing

    • If the data source that you want to add is a self-managed data source hosted on an Elastic Compute Service (ECS) instance in the classic network, network connectivity cannot be ensured when the default resource group is used. In this case, we recommend that you use a custom resource group for Data Integration.
    • Connectivity testing is not supported for data sources in virtual private clouds (VPCs). You can click Complete without testing the connectivity.
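A common cause of a failed connectivity test is a malformed DefaultFS value, which must follow the hdfs://ServerIP:Port format described in the parameter table above. As a quick sanity check before you fill in the dialog box, the value can be validated locally; this is a minimal sketch using Python's standard library, and the helper name is hypothetical, not part of DataWorks:

```python
from urllib.parse import urlparse

def is_valid_default_fs(value: str) -> bool:
    """Check that a DefaultFS value matches the hdfs://ServerIP:Port format.

    Hypothetical helper for local validation only; it does not contact
    the NameNode or replace the console's connectivity test.
    """
    parsed = urlparse(value)
    return (
        parsed.scheme == "hdfs"       # must use the hdfs:// scheme
        and bool(parsed.hostname)     # ServerIP (or hostname) must be present
        and parsed.port is not None   # an explicit port is required
    )

print(is_valid_default_fs("hdfs://192.168.0.10:8020"))  # True
print(is_valid_default_fs("192.168.0.10:8020"))         # False: scheme is missing
print(is_valid_default_fs("hdfs://192.168.0.10"))       # False: port is missing
```

Note that this only checks the URL shape; whether the NameNode is actually reachable from the resource group is what the Test connectivity step verifies.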