If the Ranger Kafka plug-in is not installed in the Dataflow cluster of E-MapReduce (EMR), or the installed Ranger Kafka plug-in is not compatible with the version of the Ranger service that you use, you must manually integrate the plug-in. This topic describes how to manually install and configure the Ranger Kafka plug-in.
Prerequisites
A Dataflow cluster is created. For more information, see Create a cluster.
The SASL logon authentication feature is enabled for the cluster. For more information about how to enable the SASL logon authentication feature, see Log on to a Kafka cluster by using SASL.
Important: When you configure settings to enable SASL logon authentication, make sure that the user used for internal Kafka communications has permissions on all resources, and that the users of the Kafka clients used by internal components have the required permissions. We recommend that you use the same user for the internal component clients and the Kafka service.
An external Ranger service is created.
A Kafka management user is created in Ranger and has permissions on all Kafka resources.
Note: We recommend that you set the name of the management user to kafka.
Limits
This topic applies only to clusters of EMR V3.45.0 or an earlier minor version.
This topic does not apply to EMR V5.X.X clusters, because Kafka deployed in an EMR V5.X.X cluster does not support Ranger authentication.
Procedure
In this example, the code package of Ranger V2.1.0 is used. In practice, select the Ranger code package version that matches the Ranger service and Kafka service that you use.
We recommend that you use a Ranger Kafka plug-in of the same version as the Ranger service.
Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to download the Ranger plug-in:
wget https://dlcdn.apache.org/ranger/2.1.0/apache-ranger-2.1.0.tar.gz
You can download the Ranger code package of the required version from the Ranger official website.
Run the following command to build an installation package for the Ranger Kafka plug-in:
tar xvf apache-ranger-2.1.0.tar.gz
cd apache-ranger-2.1.0
mvn clean compile package assembly:assembly install -DskipTests -Drat.skip=true
cd target
ls -lrt ranger-2.1.0-kafka-plugin.tar.gz
Upload the installation package of the Ranger Kafka plug-in to the same installation directory on all Kafka broker nodes. In this example, the /opt/apps/ranger-plugin directory is used.
Prepare the configuration file named install.properties and place the file in the /opt/apps/ranger-plugin/ranger-2.1.0-kafka-plugin directory.
Configure the install.properties file of the Ranger plug-in based on your business requirements.
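Before you run the enable script in a later step, you can sanity-check that the required parameters are present in install.properties. The following sketch writes a sample file with placeholder values (not real endpoints) to /tmp and verifies that each required key has a non-empty value; on a broker node you would point the check at your real file instead.

```shell
# Write a sample install.properties with placeholder values for demonstration.
cat > /tmp/install.properties <<'EOF'
COMPONENT_INSTALL_DIR_NAME=/opt/apps/KAFKA/kafka-current
POLICY_MGR_URL=http://ranger-admin.example.com:6080
REPOSITORY_NAME=kafkadev
XAAUDIT.SUMMARY.ENABLE=true
XAAUDIT.SOLR.ENABLE=true
XAAUDIT.SOLR.URL=http://solr.example.com:6083/solr/ranger_audits
XAAUDIT.SOLR.USER=NONE
XAAUDIT.SOLR.PASSWORD=NONE
XAAUDIT.SOLR.ZOOKEEPER=NONE
XAAUDIT.SOLR.FILE_SPOOL_DIR=/var/log/taihao-apps/kafka/audit/spool
EOF

# Required keys, as described in the parameter table of this topic.
required="COMPONENT_INSTALL_DIR_NAME POLICY_MGR_URL REPOSITORY_NAME \
XAAUDIT.SUMMARY.ENABLE XAAUDIT.SOLR.ENABLE XAAUDIT.SOLR.URL \
XAAUDIT.SOLR.USER XAAUDIT.SOLR.PASSWORD XAAUDIT.SOLR.ZOOKEEPER \
XAAUDIT.SOLR.FILE_SPOOL_DIR"

missing=0
for key in $required; do
  # Require "key=value" with a non-empty value.
  grep -q "^${key}=." /tmp/install.properties || { echo "MISSING: $key"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "install.properties: all required parameters set"
```

Replace /tmp/install.properties with the path of your actual file to run the same check on a broker node.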
Configuration parameters
Parameter
Description
Required
COMPONENT_INSTALL_DIR_NAME
The installation path of Kafka.
EMR Kafka is installed in the /opt/apps/KAFKA/kafka-current directory. The installation directory is fixed.
Yes
POLICY_MGR_URL
The URL of the Ranger policy library.
Set this parameter based on the address of the Ranger service. For information about the value format, see the install.properties configuration file.
Yes
REPOSITORY_NAME
The name of the policy library used.
Set this parameter based on the policy repository that is created in the Ranger service. For information about the value format, see the install.properties configuration file.
Yes
XAAUDIT.SUMMARY.ENABLE
Specifies whether to enable auditing. Default value: true.
Yes
XAAUDIT.SOLR.ENABLE
Specifies whether to enable Solr auditing. Default value: true.
Yes
XAAUDIT.SOLR.URL
The address of the Solr server.
Set this parameter based on your business requirements. For more information about the value format, see the install.properties configuration file.
Yes
XAAUDIT.SOLR.USER
The user used to access the Solr server.
Yes
XAAUDIT.SOLR.PASSWORD
The password used to access the Solr server.
Yes
XAAUDIT.SOLR.ZOOKEEPER
The address to access SolrCloud ZooKeeper.
Yes
XAAUDIT.SOLR.FILE_SPOOL_DIR
The directory in which audit logs are stored.
Set this parameter based on your business requirements. For more information about the value format, see the install.properties configuration file.
Yes
Sample code
In the following sample code, EMR V3.43.1 is used.
# Location of component folder
COMPONENT_INSTALL_DIR_NAME=/opt/apps/KAFKA/kafka-current

#
# Location of Policy Manager URL
#
# Example:
# POLICY_MGR_URL=http://policymanager.xasecure.net:6080
# You can replace the parameter value based on your business scenario.
POLICY_MGR_URL=http://master-1-1.c-590b6062db9d****.cn-hangzhou.emr.aliyuncs.com:6080

#
# This is the repository name created within policy manager
#
# Example:
# REPOSITORY_NAME=kafkadev
# You can replace the parameter value based on your business scenario.
REPOSITORY_NAME=kafkadev

# AUDIT configuration with V3 properties

# Should audit be summarized at source
XAAUDIT.SUMMARY.ENABLE=true

# Enable audit logs to Solr
# Example
# XAAUDIT.SOLR.ENABLE=true
# XAAUDIT.SOLR.URL=http://localhost:6083/solr/ranger_audits
# XAAUDIT.SOLR.ZOOKEEPER=
# XAAUDIT.SOLR.FILE_SPOOL_DIR=/var/log/kafka/audit/solr/spool
# You can replace the parameter value based on your business scenario.
XAAUDIT.SOLR.ENABLE=true
XAAUDIT.SOLR.URL=http://master-1-1.c-590b6062db9d****.cn-hangzhou.emr.aliyuncs.com:6083/solr/ranger_audits
XAAUDIT.SOLR.USER=NONE
XAAUDIT.SOLR.PASSWORD=NONE
XAAUDIT.SOLR.ZOOKEEPER=NONE
XAAUDIT.SOLR.FILE_SPOOL_DIR=/var/log/taihao-apps/kafka/audit/spool

# Enable audit logs to ElasticSearch
# Example
# XAAUDIT.ELASTICSEARCH.ENABLE=true
# XAAUDIT.ELASTICSEARCH.URL=localhost
# XAAUDIT.ELASTICSEARCH.INDEX=audit
XAAUDIT.ELASTICSEARCH.ENABLE=false
XAAUDIT.ELASTICSEARCH.URL=NONE
XAAUDIT.ELASTICSEARCH.USER=NONE
XAAUDIT.ELASTICSEARCH.PASSWORD=NONE
XAAUDIT.ELASTICSEARCH.INDEX=NONE
XAAUDIT.ELASTICSEARCH.PORT=NONE
XAAUDIT.ELASTICSEARCH.PROTOCOL=NONE

# Enable audit logs to HDFS
# Example
# XAAUDIT.HDFS.ENABLE=true
# XAAUDIT.HDFS.HDFS_DIR=hdfs://node-1.example.com:8020/ranger/audit
# If using Azure Blob Storage
# XAAUDIT.HDFS.HDFS_DIR=wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
# XAAUDIT.HDFS.HDFS_DIR=wasb://ranger_audit_cont****@my-azure-account.blob.core.windows.net/ranger/audit
# XAAUDIT.HDFS.FILE_SPOOL_DIR=/var/log/kafka/audit/hdfs/spool
XAAUDIT.HDFS.ENABLE=false
XAAUDIT.HDFS.HDFS_DIR=hdfs://__REPLACE__NAME_NODE_HOST:8020/ranger/audit
XAAUDIT.HDFS.FILE_SPOOL_DIR=/var/log/kafka/audit/hdfs/spool

# Following additional properties are needed when auditing to Azure Blob Storage via HDFS
# Get these values from your /etc/hadoop/conf/core-site.xml
# XAAUDIT.HDFS.HDFS_DIR=wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
XAAUDIT.HDFS.AZURE_ACCOUNTNAME=__REPLACE_AZURE_ACCOUNT_NAME
XAAUDIT.HDFS.AZURE_ACCOUNTKEY=__REPLACE_AZURE_ACCOUNT_KEY
XAAUDIT.HDFS.AZURE_SHELL_KEY_PROVIDER=__REPLACE_AZURE_SHELL_KEY_PROVIDER
XAAUDIT.HDFS.AZURE_ACCOUNTKEY_PROVIDER=__REPLACE_AZURE_ACCOUNT_KEY_PROVIDER

# Log4j Audit Provider
XAAUDIT.LOG4J.ENABLE=false
XAAUDIT.LOG4J.IS_ASYNC=false
XAAUDIT.LOG4J.ASYNC.MAX.QUEUE.SIZE=10240
XAAUDIT.LOG4J.ASYNC.MAX.FLUSH.INTERVAL.MS=30000
XAAUDIT.LOG4J.DESTINATION.LOG4J=true
XAAUDIT.LOG4J.DESTINATION.LOG4J.LOGGER=xaaudit

# End of V3 properties

#
# Audit to HDFS Configuration
#
# If XAAUDIT.HDFS.IS_ENABLED is set to true, please replace tokens
# that start with __REPLACE__ with appropriate values
#  XAAUDIT.HDFS.IS_ENABLED=true
#  XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://__REPLACE__NAME_NODE_HOST:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
#  XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit
#  XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit/archive
#
# Example:
#  XAAUDIT.HDFS.IS_ENABLED=true
#  XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://namenode.example.com:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
#  XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=/var/log/kafka/audit
#  XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=/var/log/kafka/audit/archive
#
XAAUDIT.HDFS.IS_ENABLED=false
XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://__REPLACE__NAME_NODE_HOST:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit
XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit/archive
XAAUDIT.HDFS.DESTINTATION_FILE=%hostname%-audit.log
XAAUDIT.HDFS.DESTINTATION_FLUSH_INTERVAL_SECONDS=900
XAAUDIT.HDFS.DESTINTATION_ROLLOVER_INTERVAL_SECONDS=86400
XAAUDIT.HDFS.DESTINTATION_OPEN_RETRY_INTERVAL_SECONDS=60
XAAUDIT.HDFS.LOCAL_BUFFER_FILE=%time:yyyyMMdd-HHmm.ss%.log
XAAUDIT.HDFS.LOCAL_BUFFER_FLUSH_INTERVAL_SECONDS=60
XAAUDIT.HDFS.LOCAL_BUFFER_ROLLOVER_INTERVAL_SECONDS=600
XAAUDIT.HDFS.LOCAL_ARCHIVE_MAX_FILE_COUNT=10

# Solr Audit Provider
XAAUDIT.SOLR.IS_ENABLED=false
XAAUDIT.SOLR.MAX_QUEUE_SIZE=1
XAAUDIT.SOLR.MAX_FLUSH_INTERVAL_MS=1000
XAAUDIT.SOLR.SOLR_URL=http://localhost:6083/solr/ranger_audits

# End of V2 properties

#
# SSL Client Certificate Information
#
# Example:
# SSL_KEYSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-keystore.jks
# SSL_KEYSTORE_PASSWORD=none
# SSL_TRUSTSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-truststore.jks
# SSL_TRUSTSTORE_PASSWORD=none
#
# If you do not need SSL between the agent and the security admin tool, leave these sample values as they are.
#
SSL_KEYSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-keystore.jks
SSL_KEYSTORE_PASSWORD=myKeyFilePassword
SSL_TRUSTSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-truststore.jks
SSL_TRUSTSTORE_PASSWORD=changeit

#
# Custom component user
# CUSTOM_COMPONENT_USER=<custom-user>
# keep blank if component user is default
CUSTOM_USER=kafka

#
# Custom component group
# CUSTOM_COMPONENT_GROUP=<custom-group>
# keep blank if component group is default
CUSTOM_GROUP=hadoop
Run the following command to install the Ranger Kafka plug-in:
sudo su - root
cd /opt/apps/ranger-plugin/ranger-2.1.0-kafka-plugin
./enable-kafka-plugin.sh ./install.properties
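After the enable script finishes, the Ranger plug-in configuration files should be present in the Kafka configuration directory. The file names below follow the usual Ranger plug-in layout and are an assumption; this sketch creates a mock directory with those files so the check itself can be demonstrated end to end. On a real broker node, set KAFKA_HOME=/opt/apps/KAFKA/kafka-current and skip the mock setup.

```shell
# Mock setup for demonstration only; on a broker node, use the real path:
# KAFKA_HOME=/opt/apps/KAFKA/kafka-current
KAFKA_HOME="$(mktemp -d)"
mkdir -p "$KAFKA_HOME/config"
touch "$KAFKA_HOME/config/ranger-kafka-audit.xml" \
      "$KAFKA_HOME/config/ranger-kafka-security.xml" \
      "$KAFKA_HOME/config/ranger-policymgr-ssl.xml"

# Check that each expected Ranger plug-in configuration file exists.
ok=1
for f in ranger-kafka-audit.xml ranger-kafka-security.xml ranger-policymgr-ssl.xml; do
  if [ -f "$KAFKA_HOME/config/$f" ]; then
    echo "OK: $f"
  else
    echo "MISSING: $f"
    ok=0
  fi
done
if [ "$ok" -eq 1 ]; then echo "Ranger Kafka plug-in configuration files are in place"; fi
```

If any file is reported as missing on a real node, rerun the enable script and review its output for errors.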
Modify the server.properties configuration file of Kafka in the EMR console.
Go to the Configure tab of the Kafka service page in the EMR console.
Click the server.properties tab.
Modify the values of the following parameters:
kafka_server_start_cmd_addition_args: Append CLASSPATH=$CLASSPATH:/opt/apps/KAFKA/kafka-current/config to the parameter value.
Note: If you cannot configure the kafka_server_start_cmd_addition_args parameter on the Kafka service page in the EMR console, you cannot modify the CLASSPATH variable value by using this method. In this case, run the following command on the Kafka broker instance to add the configuration directory of the Kafka plug-in:
cd /opt/apps/KAFKA/kafka-current/libs
sudo ln -s /opt/apps/KAFKA/kafka-current/config kafka-conf
authorizer.class.name: Change the parameter value to org.apache.ranger.authorization.kafka.authorizer.RangerKafkaAuthorizer.
Save the configurations.
Click Save.
In the dialog box that appears, configure the Execution Reason parameter and click Save.
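After the configuration is saved and delivered, you can confirm on a broker node that server.properties points at the Ranger authorizer. The rendered file path on a broker is an assumption (typically under /opt/apps/KAFKA/kafka-current/config on EMR); this sketch writes a mock file to /tmp so the check can be demonstrated.

```shell
# Mock server.properties for demonstration; on a broker node, point
# SERVER_PROPS at the real rendered file instead (path is an assumption):
# SERVER_PROPS=/opt/apps/KAFKA/kafka-current/config/server.properties
SERVER_PROPS=/tmp/server.properties
cat > "$SERVER_PROPS" <<'EOF'
authorizer.class.name=org.apache.ranger.authorization.kafka.authorizer.RangerKafkaAuthorizer
EOF

# Extract the configured authorizer and compare it with the expected class.
expected=org.apache.ranger.authorization.kafka.authorizer.RangerKafkaAuthorizer
actual=$(grep '^authorizer.class.name=' "$SERVER_PROPS" | cut -d= -f2)
if [ "$actual" = "$expected" ]; then
  echo "authorizer.class.name is set to the Ranger authorizer"
else
  echo "unexpected authorizer: ${actual:-<unset>}"
fi
```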
Restart the Kafka broker in the EMR console.
Go to the Status tab of the Kafka service page in the EMR console, find the KafkaBroker component, and then click Restart in the Actions column.
In the dialog box that appears, configure the Execution Reason parameter and click OK.
In the Confirm message, click OK.
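After the restart, the broker log should show that the Ranger authorizer class was loaded. Both the log path and the exact message text vary by Kafka and EMR version, so they are assumptions here; this sketch writes a mock log line so the grep-based check can be demonstrated. On a broker node, point KAFKA_LOG at the actual server log (for example, under /var/log/taihao-apps/kafka/, path assumed).

```shell
# Mock log line for demonstration; on a broker node, set KAFKA_LOG to the
# real server log file instead of writing this sample entry.
KAFKA_LOG=/tmp/server.log
echo 'INFO Creating authorizer instance org.apache.ranger.authorization.kafka.authorizer.RangerKafkaAuthorizer' > "$KAFKA_LOG"

# A simple presence check for the Ranger authorizer in the broker log.
if grep -q 'RangerKafkaAuthorizer' "$KAFKA_LOG"; then
  echo "Ranger authorizer loaded"
else
  echo "Ranger authorizer not found in log; check the plug-in installation"
fi
```

You can also verify authorization behavior directly by accessing a topic with a user that has no Ranger policy and confirming that the request is denied.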
References
In addition to manually installing the Ranger Kafka plug-in, you can also add bootstrap actions to install and deploy the Ranger Kafka plug-in on multiple nodes based on your business requirements. For more information, see Manually run scripts.
If your cluster is of EMR V3.45.0 or a later minor version, you can enable Kafka in Ranger in the EMR console. For more information, see Enable Kafka in Ranger and configure related permissions.