You can enable LDAP authentication to enhance the security of a Spark Thrift Server. When LDAP authentication is enabled for a Spark Thrift Server, a client can connect to the Spark Thrift Server and execute SQL queries only after providing correct username and password credentials. This effectively prevents unauthorized access to sensitive data and features.
Limits
The engine version of Serverless Spark must meet the following requirements:
esr-4.x: esr-4.2.0 and later versions.
esr-3.x: esr-3.0.1 and later versions.
esr-2.x: esr-2.4.1 and later versions.
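The version requirements above can be checked mechanically. The following is a minimal sketch, not an official tool: it assumes that the engine version string starts with the form esr-X.Y.Z, and the function name supports_ldap is hypothetical. Release lines that are not listed above are treated as unsupported, which is an assumption of this sketch.

```python
import re

# Minimum engine versions taken from the list above, keyed by major release line.
MIN_VERSIONS = {4: (4, 2, 0), 3: (3, 0, 1), 2: (2, 4, 1)}


def supports_ldap(engine_version: str) -> bool:
    """Return True if an 'esr-X.Y.Z' engine version meets the listed minimum."""
    match = re.match(r"esr-(\d+)\.(\d+)\.(\d+)", engine_version.strip())
    if match is None:
        raise ValueError(f"unrecognized engine version: {engine_version!r}")
    version = tuple(int(part) for part in match.groups())
    minimum = MIN_VERSIONS.get(version[0])
    # Assumption: release lines that are not listed above count as unsupported.
    return minimum is not None and version >= minimum
```

For example, supports_ldap("esr-3.0.1") is true, and supports_ldap("esr-2.4.0") is false because it is older than esr-2.4.1.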
Prerequisites
A Spark Thrift Server session is created. For more information, see Manage Spark Thrift Server sessions.
Optional: If you need to use the OpenLDAP service of an Alibaba Cloud EMR on ECS cluster, you must create a cluster with the OpenLDAP service selected and add users. For more information, see Create a cluster and OpenLDAP user management.
Procedure
Step 1: Prepare the network
You must configure network connectivity between Serverless Spark and your virtual private cloud (VPC) to allow the specified Spark Thrift Server to connect to the LDAP server. For more information, see Network connectivity between EMR Serverless Spark and other VPCs.
Step 2: Configure startup parameters for the Spark Thrift Server
Stop the Spark Thrift Server before you enable LDAP authentication. Then select the name of the network connection that you created from the Network Connectivity drop-down list, and add the following configuration items in Spark Configuration. After you add the configuration items, restart the Spark Thrift Server for the configuration to take effect.
spark.hive.server2.authentication LDAP
spark.hive.server2.authentication.ldap.url ldap://<ldap_url>:<ldap_port>
spark.hive.server2.authentication.ldap.baseDN <ldap_base_dn>
Configure the following parameters based on your business requirements:
<ldap_url> and <ldap_port>: the address (hostname or IP address) and port of the LDAP server. If you connect to the OpenLDAP service of an Alibaba Cloud EMR on ECS cluster, you can specify the internal IP address of the master node for <ldap_url> and 10389 for <ldap_port>.
Note: If LDAP is a high availability (HA) service, separate multiple LDAP connection addresses with spaces, such as ldap://<ldap_url_1>:<ldap_port> ldap://<ldap_url_2>:<ldap_port>.
<ldap_base_dn>: the base DN used for LDAP authentication. If you connect to the OpenLDAP service of an Alibaba Cloud EMR on ECS cluster, you can specify ou=people,o=emr.
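As an illustration of how the three configuration items fit together, including the space-separated HA form, the following sketch renders them from a list of servers. The helper name ldap_spark_conf and the IP addresses are hypothetical. The port 10389 and the base DN ou=people,o=emr are the EMR on ECS OpenLDAP values mentioned above.

```python
def ldap_spark_conf(servers, base_dn):
    """Render the three Spark Configuration lines for LDAP authentication.

    servers: list of (host, port) pairs; more than one pair describes an HA setup.
    """
    if not servers:
        raise ValueError("at least one LDAP server is required")
    # For an HA service, the connection addresses are separated by spaces.
    urls = " ".join(f"ldap://{host}:{port}" for host, port in servers)
    return "\n".join([
        "spark.hive.server2.authentication LDAP",
        f"spark.hive.server2.authentication.ldap.url {urls}",
        f"spark.hive.server2.authentication.ldap.baseDN {base_dn}",
    ])


# Hypothetical internal IP addresses of two master nodes.
print(ldap_spark_conf([("192.168.0.10", 10389), ("192.168.0.11", 10389)],
                      "ou=people,o=emr"))
```

The printed text can be pasted into Spark Configuration. With a single server, the URL line contains only one ldap:// address.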
Step 3: Connect to the Spark Thrift Server for which LDAP authentication is enabled
This section provides two methods for you to connect to the Spark Thrift Server for which LDAP authentication is enabled. When you connect to the Spark Thrift Server, replace the following information based on your business requirements:
<endpoint>: the Endpoint (Public) or Endpoint (Internal) information that you obtain on the Overview tab. If you use an internal same-region endpoint, the Spark Thrift Server can be accessed only by resources within the same VPC.
<token>: the token information on the Token Management tab.
<port>: the port number. The port number is 443 when you access the server by using a public endpoint, and 80 when you access the server by using an internal same-region endpoint.
<username> and <password>: the username and password used to log on to the LDAP server. If you connect to the OpenLDAP service of an Alibaba Cloud EMR on ECS cluster, specify the username and password that you added on the User Management page of EMR on ECS.
Method 1: Use the Beeline command line interface
beeline -u 'jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>' -n <username> -p <password>
Method 2: Use a JDBC URL
To connect from another application, such as a Java application, build a complete Java Database Connectivity (JDBC) URL that includes the LDAP username and password. The URL must be in the following format:
jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>;user=<username>;password=<password>
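Both connection methods share the same base URL and differ only in how the credentials are supplied. The following sketch assembles the two forms from the placeholders described above. All function names are hypothetical, and the sketch performs no escaping, so values that contain characters such as a semicolon would need handling that is not covered here.

```python
def thrift_port(public: bool) -> int:
    """443 for a public endpoint, 80 for an internal same-region endpoint."""
    return 443 if public else 80


def base_jdbc_url(endpoint: str, token: str, public: bool = True) -> str:
    """URL used with Beeline, where -n and -p carry the credentials."""
    port = thrift_port(public)
    return (f"jdbc:hive2://{endpoint}:{port}/;transportMode=http;"
            f"httpPath=cliservice/token/{token}")


def full_jdbc_url(endpoint, token, username, password, public=True):
    """Complete JDBC URL with the LDAP credentials appended (Method 2)."""
    base = base_jdbc_url(endpoint, token, public)
    return f"{base};user={username};password={password}"


def beeline_args(endpoint, token, username, password, public=True):
    """Beeline argument list (Method 1), usable with subprocess.run."""
    return ["beeline", "-u", base_jdbc_url(endpoint, token, public),
            "-n", username, "-p", password]
```

Passing the Beeline arguments as a list, rather than as one shell string, avoids having to quote the semicolons in the URL.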