Dataphin: Create a Greenplum data source

Last Updated: May 28, 2025

By creating a Greenplum data source, you can enable Dataphin to read business data from Greenplum or write data to Greenplum. This topic describes how to create a Greenplum data source.

Background information

Greenplum is a big data analytics engine for analytics, machine learning, and AI scenarios. Its architecture is designed for managing large-scale analytical data warehouses and business intelligence workloads.

Permission description

Only custom global roles with the Create Data Source permission and system roles such as super administrator, data source administrator, domain architect, and project administrator can create data sources.

Procedure

  1. On the Dataphin homepage, click Management Center > Datasource Management in the top navigation bar.

  2. On the Datasource page, click +Create Data Source.

  3. On the Create Data Source page, select Greenplum in the Big Data section.

    If you have recently used Greenplum, you can select it in the Recently Used section. You can also enter a keyword in the search box to quickly find Greenplum.

  4. On the Create Greenplum Datasource page, configure the connection parameters.

    1. Configure the basic information of the data source.

      Parameter

      Description

      Datasource Name

      Enter a name for the data source. The name must meet the following requirements:

      • It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

      • It cannot exceed 64 characters in length.

      Datasource Code

      After you configure the data source code, you can reference tables of the data source in Flink SQL tasks in the format datasource_code.table_name or datasource_code.schema.table_name. If you need the task to automatically access the data source of the corresponding environment based on the current environment, use the variable format ${datasource_code}.table or ${datasource_code}.schema.table. For more information, see Development method for Dataphin data source tables. An illustrative example of these reference formats follows the parameter descriptions below.

      Important
      • The data source code cannot be modified after it is configured successfully.

      • After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.

      • In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.

      Data Source Description

      Enter a brief description of the data source. The description cannot exceed 128 characters.

      Data Source Configuration

      Select the data source to configure:

      • If your business data source distinguishes between production and development data sources, select Production + Development Data Source.

      • If your business data source does not distinguish between production and development data sources, select Production Data Source.

      Tag

      You can categorize data sources by adding tags. For information about how to create tags, see Manage data source tags.
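
      The following is a minimal sketch of the two reference formats described for Datasource Code, assuming a hypothetical data source code my_gp, schema public, and table sales_orders. As noted above, Flink SQL currently supports only the listed data source types, so treat this purely as an illustration of the naming format.

        -- Static form: datasource_code.schema.table_name
        SELECT order_id, amount FROM my_gp.public.sales_orders;

        -- Variable form: resolved to the data source of the current environment
        SELECT order_id, amount FROM ${my_gp}.public.sales_orders;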

    2. Configure the connection parameters between the data source and Dataphin.

      If you select Production + Development Data Source for Data Source Configuration, you must configure the connection information for both the production and development data sources. If you select Production Data Source, you only need to configure the connection information for the production data source.

      Note

      Typically, production and development data sources should be configured as separate data sources to achieve environment isolation and reduce the impact of development activities on production. However, Dataphin also supports configuring them as the same data source with identical parameter values.

      Parameter

      Description

      JDBC URL

      Enter the JDBC connection address of the target database, in the format jdbc:postgresql://host:port/dbname. An example connection address is shown after these parameters.

      Username, Password

      Enter the username and password for logging on to Greenplum.
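
      For reference, a hypothetical connection address for a database named analytics_db on a Greenplum coordinator host gp-master.example.com listening on the typical default port 5432 might look like the following (host, port, and database name are placeholders):

        jdbc:postgresql://gp-master.example.com:5432/analytics_db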

    3. Configure advanced settings for the data source.

      Parameter

      Description

      connectTimeout

      The connection timeout period (connectTimeout) for the database, in seconds. The default value is 900 seconds (15 minutes). An example URL that sets both connectTimeout and socketTimeout is shown after the advanced parameters below.

      Note
      • If you include a connectTimeout configuration in the JDBC URL, the connectTimeout value in the JDBC URL takes precedence.

      • For data sources created before Dataphin V3.11, the default connectTimeout value is -1, which indicates no timeout limit.

      socketTimeout

      The socket timeout period (socketTimeout) for the database, in seconds. The default value is 1800 seconds (30 minutes).

      Note
      • If you include a socketTimeout configuration in the JDBC URL, the socketTimeout value in the JDBC URL takes precedence.

      • For data sources created before Dataphin V3.11, the default socketTimeout value is -1, which indicates no timeout limit.

      Connection Retries

      If the database connection times out, the system automatically retries until the specified number of retries is reached. If the connection still fails after the last retry, the connection attempt is reported as failed.

      Note
      • The default number of retries is 1. You can set a value between 0 and 10.

      • The connection retry count is applied by default to offline integration tasks and global quality (requires the Asset Quality feature to be enabled). You can configure task-level retry counts separately in offline integration tasks.
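
      If you prefer to control the timeouts through the connection address instead of the fields above, the jdbc:postgresql:// URL syntax shown earlier also accepts connectTimeout and socketTimeout as URL parameters (both in seconds); as noted above, values in the URL take precedence over the values configured here. A hypothetical example:

        jdbc:postgresql://gp-master.example.com:5432/analytics_db?connectTimeout=60&socketTimeout=600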

  5. Select a Default Resource Group that will be used to run tasks related to the current data source, including database SQL, offline database migration, data preview, and more.

  6. Click Test Connection or directly click OK to save and complete the creation of the Greenplum data source.

    When you click Test Connection, the system tests whether the data source can connect to Dataphin properly. If you directly click OK, the system automatically tests the connection for all selected clusters. However, the data source can still be created even if all selected clusters fail the connection test.

    Test Connection tests the connection for the Default Cluster and for Registered Scheduling Clusters that are registered in Dataphin and in normal use. The Default Cluster is selected by default and cannot be deselected. If a Registered Scheduling Cluster has no resource groups, connection testing is not supported for it; you must create a resource group before testing the connection.

    • The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.

    • The test connection usually takes less than 2 minutes. If it times out, you can click the icon to view the specific reason and retry.

    • Regardless of whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system records the time when the final result was generated.

      Note

      Only the test results for the Default Cluster include three connection statuses: Succeeded With Warning, Connection Successful, and Connection Failed. The test results for Registered Scheduling Clusters in Dataphin only include two connection statuses: Connection Successful and Connection Failed.

    • When the test result is Connection Failed, you can click the icon to view the specific failure reason.

    • When the test result is Succeeded With Warning, the application cluster connection succeeded but the scheduling cluster connection failed. The current data source cannot be used for data development and integration. You can click the icon to view the log information.