The Data Integration service of DataWorks supports only Kerberos authentication. After you configure Kerberos authentication, only the applications and services that pass authentication can access data. This topic describes how Kerberos authentication works.

Background information

Kerberos is a network security protocol for authentication. It supports single sign-on (SSO): a user provides identity authentication information only once and obtains tickets that can be used to access multiple services. When a client accesses a service, Kerberos establishes a key that is shared only by that client and that service, and the client uses this key to communicate with the service. This way, untrusted services or applications cannot access data.

Limits

  • Only CDH V6.X clusters support Kerberos authentication. Kerberos authentication has not been tested on CDH clusters of other versions or on self-managed clusters, so authentication may fail for these clusters.
  • Only Alibaba Cloud HBase and Hive data sources support Kerberos authentication. Self-managed data sources do not support Kerberos authentication.
  • Only the data sources that are connected to exclusive resource groups for Data Integration support Kerberos authentication.
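The three limits above can be expressed as a pre-flight checklist that you run before you add a data source. The following is a minimal sketch only: `DataSourcePlan` and `kerberos_limit_violations` are hypothetical names, not part of any DataWorks API, and the checks simply restate the limits in this topic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataSourcePlan:
    """Hypothetical description of a data source you plan to add; not a DataWorks API."""
    source_type: str                # for example "HBase" or "Hive"
    alibaba_cloud_managed: bool     # False for self-managed data sources
    cdh_version: Optional[str]      # for example "6.3.2"; None if not a CDH cluster
    exclusive_resource_group: bool  # connected to an exclusive resource group for Data Integration


def kerberos_limit_violations(plan: DataSourcePlan) -> List[str]:
    """Return one message for each limit in this topic that the plan does not meet."""
    problems = []
    if plan.cdh_version is None or not plan.cdh_version.startswith("6."):
        problems.append("cluster is not a CDH V6.X cluster; authentication may fail")
    if plan.source_type not in ("HBase", "Hive"):
        problems.append(f"{plan.source_type} data sources do not support Kerberos authentication")
    if not plan.alibaba_cloud_managed:
        problems.append("self-managed data sources do not support Kerberos authentication")
    if not plan.exclusive_resource_group:
        problems.append("an exclusive resource group for Data Integration is required")
    return problems


ok = DataSourcePlan("Hive", True, "6.3.2", True)
bad = DataSourcePlan("MySQL", False, None, False)
print(kerberos_limit_violations(ok))        # []
print(len(kerberos_limit_violations(bad)))  # 4
```

An empty list means that the plan meets every limit listed here; it does not guarantee that authentication succeeds.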

How Kerberos authentication works

Kerberos is a third-party authentication protocol that is based on symmetric keys. Clients and services use the Key Distribution Center (KDC) as the trusted third party for identity authentication. KDC is the server program of Kerberos and issues Ticket Granting Tickets (TGTs). For more information about Kerberos, see Introduction to Kerberos.

Figure: Architecture

The preceding figure shows the four stages of Kerberos authentication in DataWorks.

  1. A client requests tickets: When a client (principal) accesses a data source for which Kerberos authentication is enabled, the client requests a TGT from the Authentication Server (AS) in KDC. Then, the client presents the obtained TGT to the Ticket Granting Server (TGS) in KDC to request a service ticket for a specific service.
  2. KDC issues tickets: After KDC receives a request from the client, KDC authenticates the identity of the client. If the client passes the authentication, KDC issues the client an encrypted ticket that is valid for a specific period.
  3. The client requests access to a specific service: After the client obtains the service ticket, the client uses the service name to request access to the service resources from the service server and presents the ticket.
  4. The service server authenticates the identity of the client: After the service server receives the request, the server verifies the ticket to authenticate the client. If the client passes the authentication, the client can access the service resources.
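The four stages above can be traced in a small, self-contained toy model. This is an illustrative sketch under strong assumptions, not Kerberos itself and not how DataWorks implements it: tickets are "sealed" with an HMAC instead of being encrypted, the client proves its identity with a simple HMAC over its principal name, and all principals and keys are hypothetical. `as_exchange` covers the TGT request and issuance, `tgs_exchange` covers the request for the ticket that the TGS issues for a specific service (called a service ticket in Kerberos), and `service_accepts` covers the last two stages.

```python
import hashlib
import hmac
import json
import time


def _mac(key: bytes, payload: dict) -> str:
    # Canonical JSON so that the same payload always produces the same tag.
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def seal(key: bytes, payload: dict) -> dict:
    """Toy stand-in for ticket encryption: tags the payload with an HMAC.
    Real Kerberos encrypts tickets; an HMAC provides integrity, not secrecy."""
    return {"body": payload, "tag": _mac(key, payload)}


def unseal(key: bytes, ticket: dict) -> dict:
    """Verify a ticket with `key` and reject it if it is forged or expired."""
    if not hmac.compare_digest(_mac(key, ticket["body"]), ticket["tag"]):
        raise PermissionError("ticket was not sealed with this key")
    if ticket["body"]["expires"] < time.time():
        raise PermissionError("ticket has expired")
    return ticket["body"]


def identity_proof(principal: str, key: bytes) -> str:
    # Toy proof that the client holds its long-term key (for example, from a keytab).
    return hmac.new(key, principal.encode(), hashlib.sha256).hexdigest()


class ToyKDC:
    def __init__(self, principal_keys: dict, tgs_key: bytes, lifetime: float = 3600):
        self.principal_keys = principal_keys  # long-term keys of clients and services
        self.tgs_key = tgs_key                # known only to the KDC
        self.lifetime = lifetime              # ticket validity period in seconds

    def as_exchange(self, principal: str, proof: str) -> dict:
        """AS: authenticate the client and issue a TGT sealed with the TGS key."""
        key = self.principal_keys.get(principal)
        if key is None or not hmac.compare_digest(proof, identity_proof(principal, key)):
            raise PermissionError("client authentication failed")
        return seal(self.tgs_key, {"client": principal, "service": "krbtgt",
                                   "expires": time.time() + self.lifetime})

    def tgs_exchange(self, tgt: dict, service: str) -> dict:
        """TGS: validate the TGT and issue a service ticket sealed with the service key."""
        claims = unseal(self.tgs_key, tgt)
        return seal(self.principal_keys[service],
                    {"client": claims["client"], "service": service,
                     "expires": claims["expires"]})


def service_accepts(service: str, service_key: bytes, ticket: dict) -> str:
    """Service server: verify the ticket with its own key and return the client name."""
    claims = unseal(service_key, ticket)
    if claims["service"] != service:
        raise PermissionError("ticket was issued for another service")
    return claims["client"]


# Hypothetical principals and keys, for illustration only.
keys = {"alice@EXAMPLE.COM": b"alice-key", "hive/master-1@EXAMPLE.COM": b"hive-key"}
kdc = ToyKDC(keys, tgs_key=b"krbtgt-key")
tgt = kdc.as_exchange("alice@EXAMPLE.COM",
                      identity_proof("alice@EXAMPLE.COM", keys["alice@EXAMPLE.COM"]))
ticket = kdc.tgs_exchange(tgt, "hive/master-1@EXAMPLE.COM")
print(service_accepts("hive/master-1@EXAMPLE.COM", keys["hive/master-1@EXAMPLE.COM"], ticket))
```

Running the script prints `alice@EXAMPLE.COM`. The design point that the model preserves is that the service server never contacts the KDC: it trusts the ticket because only the KDC and the service know the key that sealed it.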

Kerberos authentication requires a keytab authentication file and a krb5.conf configuration file. The krb5.conf file stores the configurations of KDC servers. The keytab file stores principals and their encrypted keys, which are used to authenticate the principals. Before you use Kerberos authentication, upload the keytab file and the krb5.conf file on the Authentication File Management page of the DataWorks console. Then, when you add a data source, reference the uploaded files and configure a principal. For more information about how to upload a keytab authentication file and about the Kerberos authentication parameters for different types of data sources, see Upload and reference an authentication file and Data sources that support Kerberos authentication.
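A Kerberos principal has the form primary[/instance]@REALM, and the krb5.conf file lists the KDC servers for each realm. The following sketch parses a principal and renders a minimal krb5.conf so that you can check that the principal's realm is defined in the file. It is an assumption-laden illustration: the helper names, host names, and realm are hypothetical, the rendered file contains only the `[libdefaults]` and `[realms]` sections, and in practice you would typically upload the krb5.conf file that your cluster already uses.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# primary[/instance]@REALM, for example hive/master-1@EXAMPLE.COM
_PRINCIPAL_RE = re.compile(r"^(?P<primary>[^/@]+)(?:/(?P<instance>[^/@]+))?@(?P<realm>[^/@]+)$")


@dataclass
class Principal:
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(text: str) -> Principal:
    """Split a fully qualified principal into its components."""
    m = _PRINCIPAL_RE.match(text)
    if m is None:
        raise ValueError(f"not a fully qualified principal: {text!r}")
    return Principal(m["primary"], m["instance"], m["realm"])


def render_krb5_conf(realm: str, kdcs: List[str], admin_server: Optional[str] = None) -> str:
    """Render a minimal krb5.conf with one realm and its KDC servers."""
    lines = ["[libdefaults]", f"    default_realm = {realm}", "", "[realms]", f"    {realm} = {{"]
    lines += [f"        kdc = {host}" for host in kdcs]
    if admin_server:
        lines.append(f"        admin_server = {admin_server}")
    lines.append("    }")
    return "\n".join(lines) + "\n"


def realms_in_krb5_conf(text: str) -> List[str]:
    """Collect the realm names declared as "REALM = {" in the [realms] section."""
    realms, section = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "realms" and line.endswith("{"):
            realms.append(line.split("=", 1)[0].strip())
    return realms


principal = parse_principal("hive/master-1@EXAMPLE.COM")
conf = render_krb5_conf("EXAMPLE.COM", ["kdc1.example.com:88"])
print(conf)
print(principal.realm in realms_in_krb5_conf(conf))  # True
```

A mismatch between the realm in the configured principal and the realms in krb5.conf is a common reason why a client cannot locate a KDC, so this kind of check can be worth running before you upload the files.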

Data sources that support Kerberos authentication

The following table lists the data source types that support Kerberos authentication and the topics that describe how to configure Kerberos authentication for each type.
| Data source type | References |
| --- | --- |
| HBase | Add an HBase data source |
| Hive | Add a Hive data source |