Kafka is a distributed messaging service that is widely used in big data fields such as log collection, monitoring data aggregation, streaming data processing, and online and offline analytics. You can configure synchronization nodes to read data from or write data to Kafka data sources. This topic describes how to add a Kafka data source.

Background information

Workspaces in standard mode support the data source isolation feature. You can add data sources separately for the development and production environments to isolate the data sources. This helps keep your data secure. For more information, see Isolate connections between the development and production environments.

Add a Kafka data source

  1. Go to the Data Source page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. After you select the region where the required workspace resides, find the workspace and click Data Integration in the Actions column.
    4. In the left-side navigation pane of the Data Integration page, choose Data Source > Data Sources to go to the Data Source page.
  2. On the Data Source page, click Add data source in the upper-right corner.
  3. In the Message Queue section of the Add data source dialog box, click Kafka.
  4. In the Add Kafka data source dialog box, configure the parameters.
    1. Configure basic information for the Kafka data source.
      You can use one of the following modes to add a Kafka data source: Alibaba Cloud instance mode and Connection string mode.
      • The following list describes the parameters of a Kafka data source in Alibaba Cloud instance mode.
        • Data source type: The type of the data source. Set this parameter to Alibaba Cloud instance mode.
        • Data Source Name: The name of the data source. The name can contain letters, digits, and underscores (_) and must start with a letter.
        • Data source description: The description of the data source. The description can be a maximum of 80 characters in length.
        • Environment: The environment in which the data source is used. Valid values: Development and Production.
          Note: This parameter is displayed only when the workspace is in standard mode.
        • Region: The region where the data source resides.
        • Instance ID: The ID of the Kafka instance. To obtain the ID, log on to the Message Queue for Apache Kafka console and go to the instance details page.
      • The following list describes the parameters of a Kafka data source in connection string mode.
        • Data source type: The type of the data source. Set this parameter to Connection string mode.
        • Data Source Name: The name of the data source. The name can contain letters, digits, and underscores (_) and must start with a letter.
        • Data source description: The description of the data source. The description can be a maximum of 80 characters in length.
        • Environment: The environment in which the data source is used. Valid values: Development and Production.
          Note: This parameter is displayed only when the workspace is in standard mode.
        • Kafka cluster address: The address of the Kafka cluster. Each address consists of the IP address and port number of a broker. Separate multiple addresses with commas (,), such as 10.0.0.1:9092,10.0.0.2:9092. To obtain the addresses, log on to the Message Queue for Apache Kafka console, find the instance on the Instances page, and then click the instance name to go to the instance details page.
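      The cluster address format above can be illustrated with a small sketch. The helper name below is hypothetical and not part of DataWorks; it only shows how a comma-separated broker list breaks down into host and port pairs:

```python
def parse_bootstrap_servers(addresses: str) -> list[tuple[str, int]]:
    """Split a comma-separated broker list into (host, port) pairs."""
    brokers = []
    for entry in addresses.split(","):
        # rpartition tolerates hosts that themselves contain colons
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"malformed broker address: {entry!r}")
        brokers.append((host, int(port)))
    return brokers

print(parse_bootstrap_servers("10.0.0.1:9092,10.0.0.2:9092"))
# [('10.0.0.1', 9092), ('10.0.0.2', 9092)]
```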

    2. Configure authentication information for the Kafka data source.
      Third-party authentication mechanisms are used to perform strict identity authentication on users and services. These mechanisms prevent untrusted applications or services from accessing data and improve the stability of data access during data synchronization. DataWorks provides third-party authentication mechanisms to ensure the data security of Kafka data sources. When you add a Kafka data source, you can set the Authentication parameter to one of the following mechanisms: SASL_PLAINTEXT, SASL_SSL, and SSL. This way, only trusted applications and services can access data in the Kafka data source.
      Note
      • Before you use a third-party authentication mechanism to perform identity authentication, you must upload the required authentication files on the Authentication File Management page of the DataWorks console. For more information, see Upload and reference an authentication file.
      • If you do not need to perform identity authentication on applications or services, set the Authentication parameter to None when you add the Kafka data source.
      • You can configure third-party authentication for Kafka data sources that reside only in the China (Chengdu) region.
      • The parameter configurations of third-party authentication for a Kafka data source in Alibaba Cloud instance mode are the same as those of third-party authentication for a Kafka data source in connection string mode.
      The following descriptions provide the configurations of the preceding authentication mechanisms:
      • SASL_PLAINTEXT is a simple authentication mechanism that is implemented based on a username and a password. The following list describes the parameters of the SASL_PLAINTEXT mechanism.
        • Sasl Mechanism: The authentication method. GSSAPI(Kerberos) and PLAIN are supported. Both methods use the Simple Authentication and Security Layer (SASL) framework. PLAIN is a simple method that is implemented based on a username and a password.
        • Keytab File: The keytab file that stores the key information of applications and services. The Jaas Config File parameter references the keytab file specified by this parameter. You can reference a keytab file that is uploaded on the Authentication File Management page, or click Add Authentication File to upload a new keytab file.
          Note: This parameter is required only when the Sasl Mechanism parameter is set to GSSAPI(Kerberos).
        • Kerberos Config File: The configuration file that stores the address information of the key distribution center (KDC). This parameter specifies the system property java.security.krb5.conf for secure authentication. You can reference a configuration file that is uploaded on the Authentication File Management page, or click Add Authentication File to upload a new file.
          Note: This parameter is required only when the Sasl Mechanism parameter is set to GSSAPI(Kerberos).
        • Jaas Config File: The configuration file that stores authentication and authorization information. This parameter specifies the system property java.security.auth.login.config for secure authentication. You can reference a configuration file that is uploaded on the Authentication File Management page, or click Add Authentication File to upload a new file.
          Note: If you set the Sasl Mechanism parameter to GSSAPI(Kerberos), the Jaas Config File parameter references the keytab file specified by the Keytab File parameter for authentication.
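      For reference, a JAAS configuration file for the PLAIN mechanism typically contains a single login entry like the following sketch. The username and password are placeholders:

```
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="your-username"
  password="your-password";
};
```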
      • SASL_SSL combines SASL authentication between clients and servers with SSL encryption of the connection. The following list describes the parameters of the SASL_SSL mechanism.
        • Sasl Mechanism: The authentication method. GSSAPI(Kerberos) and PLAIN are supported. Both methods use the Simple Authentication and Security Layer (SASL) framework. PLAIN is a simple method that is implemented based on a username and a password.
        • Truststore File: The truststore file that stores the digital certificates issued by a certificate authority (CA) for the Kafka cluster. These certificates are verified when an application or service accesses the Secure Sockets Layer (SSL) server to ensure that the application or service is trusted. You can reference a truststore file that is uploaded on the Authentication File Management page, or click Add Authentication File to upload a new file.
          Note: CA digital certificates are used to check whether access sources are trusted.
        • Truststore Password: The password that is used to access the CA digital certificates of the Kafka cluster.
        • Keystore File: The keystore file that stores the trusted CA digital certificates and key information of the Kafka cluster. You can reference a keystore file that is uploaded on the Authentication File Management page, or click Add Authentication File to upload a new file.
        • Keystore Password: The password that is used to access the keystore file.
        • Key Password: The password of the specified key pair in the keystore file.
        • Keytab File: The keytab file that stores the key information of applications and services. The Jaas Config File parameter references the keytab file specified by this parameter. You can reference a keytab file that is uploaded on the Authentication File Management page, or click Add Authentication File to upload a new keytab file.
          Note: This parameter is required only when the Sasl Mechanism parameter is set to GSSAPI(Kerberos).
        • Kerberos Config File: The configuration file that stores the address information of the key distribution center (KDC). This parameter specifies the system property java.security.krb5.conf for secure authentication. You can reference a configuration file that is uploaded on the Authentication File Management page, or click Add Authentication File to upload a new file.
          Note: This parameter is required only when the Sasl Mechanism parameter is set to GSSAPI(Kerberos).
        • Jaas Config File: The configuration file that stores authentication and authorization information. This parameter specifies the system property java.security.auth.login.config for secure authentication. You can reference a configuration file that is uploaded on the Authentication File Management page, or click Add Authentication File to upload a new file.
          Note: If you set the Sasl Mechanism parameter to GSSAPI(Kerberos), the Jaas Config File parameter references the keytab file specified by the Keytab File parameter for authentication.
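      Outside of DataWorks, the same SASL_SSL setup with the PLAIN method corresponds roughly to the following standalone Kafka client properties. All paths, usernames, and passwords here are placeholders:

```
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="your-username" password="your-password";
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=your-truststore-password
```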
      • SSL is a mechanism that uses certificates to perform authentication between clients and servers. The following list describes the parameters of the SSL mechanism.
        • Truststore File: The truststore file that stores the digital certificates issued by a certificate authority (CA) for the Kafka cluster. These certificates are verified when an application or service accesses the Secure Sockets Layer (SSL) server to ensure that the application or service is trusted. You can reference a truststore file that is uploaded on the Authentication File Management page, or click Add Authentication File to upload a new file.
          Note: CA digital certificates are used to check whether access sources are trusted.
        • Truststore Password: The password that is used to access the CA digital certificates of the Kafka cluster.
        • Keystore File: The keystore file that stores the trusted CA digital certificates and key information of the Kafka cluster. You can reference a keystore file that is uploaded on the Authentication File Management page, or click Add Authentication File to upload a new file.
        • Keystore Password: The password that is used to access the keystore file.
        • Key Password: The password of the specified key pair in the keystore file.
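      For comparison, mutual SSL authentication in a standalone Kafka client maps to properties like the following sketch; every path and password is a placeholder:

```
security.protocol=SSL
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=your-truststore-password
ssl.keystore.location=/path/to/client.keystore.jks
ssl.keystore.password=your-keystore-password
ssl.key.password=your-key-password
```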

  5. Optional: Configure extended parameters for the Kafka data source.
    You can configure extended parameters for the Kafka data source based on your business requirements. Extended parameters are Kafka producer and consumer parameters. Specify them in the JSON format.
    You can configure the following parameters:
    • batch.size: specifies the buffer size, in bytes, of the messages that are sent to each partition. In this example, this parameter is set to 16342.
    • linger.ms: specifies the maximum storage duration of each message in the buffer. Unit: milliseconds. In this example, this parameter is set to 10.
    {
      "batch.size": "16342",
      "linger.ms": "10"
    }
    Note: If you configure producer- or consumer-related parameters for a batch synchronization node in the code editor or for a real-time single-table synchronization node, and the values differ from those in the extended parameters, the values that you configure for the synchronization node take effect.
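    The precedence described in the note above can be sketched as a dictionary merge; the node-level value used here is purely illustrative:

```python
import json

# Extended parameters as configured on the data source (the JSON above).
datasource_params = json.loads('{"batch.size": "16342", "linger.ms": "10"}')

# Hypothetical producer parameters configured on a synchronization node.
node_params = {"linger.ms": "5"}

# Node-level values override the data source's extended parameters.
effective = {**datasource_params, **node_params}
print(effective)  # {'batch.size': '16342', 'linger.ms': '5'}
```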
  6. Test the network connectivity of the Kafka data source.
    1. Select Data Integration for Resource Group connectivity.
    2. In the resource group list, find the resource group that you want to use and click Test connectivity in the Actions column to test the network connectivity between the Kafka data source and resource group.
      A synchronization node can use only one type of resource group. To ensure that your synchronization nodes can run normally, you must test the connectivity of all the resource groups for Data Integration on which your synchronization nodes will run. If you want to test the connectivity of multiple resource groups for Data Integration at a time, select the resource groups and click Batch test connectivity. For more information, see Select a network connectivity solution.
      Note
      • By default, the resource group list displays only exclusive resource groups for Data Integration. To ensure the stability and performance of data synchronization, we recommend that you use exclusive resource groups for Data Integration.
      • If you want to test the network connectivity between the shared resource group or a custom resource group and the data source, click Advanced below the resource group list. In the Warning message, click Confirm. Then, all available shared and custom resource groups appear in the resource group list.
  7. After the data source passes the connectivity test, click Complete.

What to do next

You can use the added Kafka data source in your data synchronization node. For more information, see Overview.