Manually integrate the Ranger Kafka plug-in into a Dataflow cluster when the plug-in is missing or its version is incompatible with the Ranger service in use.
This procedure applies only to EMR V3.45.0 and earlier minor versions. Kafka in EMR V5.X.X clusters does not support Ranger authorization.
Prerequisites
Before you begin, ensure that you have:
-
A Dataflow cluster. For more information, see Create a cluster.
-
SASL authentication enabled for the cluster. For more information, see Log on to a Kafka cluster by using SASL.
Important: The user for Kafka internal communications must have permissions on all resources. Kafka client users that internal components use must also have the corresponding permissions. Use the same user for both the internal component clients and the Kafka service.
-
An external Ranger service.
-
A Kafka management user created in Ranger with permissions on all Kafka resources. We recommend that you name this user kafka.
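Before you continue, it can help to confirm that each Kafka broker node can reach the Ranger policy manager port. The following is a minimal Bash sketch that assumes Bash /dev/tcp redirection and the coreutils timeout command are available on the node. The function names and example URLs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that a Ranger policy manager URL (for example, the value you
# plan to use for POLICY_MGR_URL) is reachable from this node.
# Assumes Bash /dev/tcp support and the coreutils `timeout` command.

# Print "HOST PORT" for a URL such as http://ranger-admin.example.com:6080.
# Defaults to port 443 for https and 80 otherwise when no port is given.
url_host_port() {
  local url="$1" scheme rest hostport host port
  scheme="${url%%://*}"
  rest="${url#*://}"
  hostport="${rest%%/*}"
  host="${hostport%%:*}"
  if [ "$hostport" != "$host" ]; then
    port="${hostport##*:}"
  elif [ "$scheme" = "https" ]; then
    port=443
  else
    port=80
  fi
  printf '%s %s\n' "$host" "$port"
}

# Return 0 if a TCP connection to HOST:PORT succeeds within 5 seconds.
tcp_port_open() {
  timeout 5 bash -c ": > /dev/tcp/$1/$2" 2>/dev/null
}

# Report whether the service behind the given URL accepts TCP connections.
check_policy_manager() {
  local host port
  read -r host port <<< "$(url_host_port "$1")"
  if tcp_port_open "$host" "$port"; then
    echo "reachable: ${host}:${port}"
  else
    echo "NOT reachable: ${host}:${port}" >&2
    return 1
  fi
}
```

Run check_policy_manager with the POLICY_MGR_URL value you intend to use. This only proves that the TCP port is open. It does not validate the Ranger credentials or the repository name.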
Install the Ranger Kafka plug-in
The following steps use the Ranger V2.1.0 code package as an example. Select a code package whose version matches your Ranger service version and is compatible with your Kafka service version.
-
Log on to the cluster in SSH mode. For more information, see Log on to a cluster.
-
Download the Ranger plug-in:
    wget https://dlcdn.apache.org/ranger/2.1.0/apache-ranger-2.1.0.tar.gz
To download a different version, visit the Ranger official website.
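If you script the download, passing the version as a parameter keeps the file names in later steps consistent. The Bash sketch below falls back to the Apache archive because older releases are typically removed from the download CDN. It assumes that archive.apache.org/dist uses the same directory layout as dlcdn.apache.org, so verify both URLs for your version. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: download the Apache Ranger source package for a given version.
# Assumption: older releases are served from archive.apache.org/dist with the
# same layout as dlcdn.apache.org.

# Print the CDN URL and then the archive URL for a Ranger source package.
ranger_source_urls() {
  local version="$1"
  local path="ranger/${version}/apache-ranger-${version}.tar.gz"
  printf 'https://dlcdn.apache.org/%s\n' "$path"
  printf 'https://archive.apache.org/dist/%s\n' "$path"
}

# Download apache-ranger-<version>.tar.gz into the current directory, trying
# each URL in turn. Skips the download if a non-empty file already exists.
download_ranger_source() {
  local version="$1" file url
  file="apache-ranger-${version}.tar.gz"
  if [ -s "$file" ]; then
    echo "already present: $file"
    return 0
  fi
  for url in $(ranger_source_urls "$version"); do
    if wget -O "$file" "$url"; then
      return 0
    fi
    rm -f "$file"   # drop a partial or empty file before the next attempt
  done
  echo "could not download Ranger ${version}" >&2
  return 1
}
```

For example, download_ranger_source 2.1.0 fetches the package used in this topic.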
-
Build the installation package for the Ranger Kafka plug-in:
    tar xvf apache-ranger-2.1.0.tar.gz
    cd apache-ranger-2.1.0
    mvn clean compile package assembly:assembly install -DskipTests -Drat.skip=true
    cd target
    ls -lrt ranger-2.1.0-kafka-plugin.tar.gz
-
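Upload the installation package to the fixed installation directory /opt/apps/ranger-plugin on all Kafka broker nodes, and extract it there. The following steps use the extracted /opt/apps/ranger-plugin/ranger-2.1.0-kafka-plugin directory.
-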
Prepare the install.properties file and place it in the /opt/apps/ranger-plugin/ranger-2.1.0-kafka-plugin directory. Configure the following required parameters:
-
COMPONENT_INSTALL_DIR_NAME: Kafka installation path. EMR Kafka is fixed at /opt/apps/KAFKA/kafka-current. Example value: /opt/apps/KAFKA/kafka-current
-
POLICY_MGR_URL: URL of the Ranger policy manager. See the install.properties file for the format. Example value: http://master-1-1.c-590b6062db9d****.cn-hangzhou.emr.aliyuncs.com:6080
-
REPOSITORY_NAME: Name of the Ranger policy repository to use. Example value: kafkadev
-
XAAUDIT.SUMMARY.ENABLE: Enables audit summarization. Default: true. Example value: true
-
XAAUDIT.SOLR.ENABLE: Enables Solr auditing. Default: true. Example value: true
-
XAAUDIT.SOLR.URL: Solr server address. Example value: http://master-1-1.c-590b6062db9d****.cn-hangzhou.emr.aliyuncs.com:6083/solr/ranger_audits
-
XAAUDIT.SOLR.USER: Username for accessing the Solr server. Example value: NONE
-
XAAUDIT.SOLR.PASSWORD: Password for accessing the Solr server. Example value: NONE
-
XAAUDIT.SOLR.ZOOKEEPER: SolrCloud ZooKeeper address. Example value: NONE
-
XAAUDIT.SOLR.FILE_SPOOL_DIR: Directory for storing audit logs. Example value: /var/log/taihao-apps/kafka/audit/spool
The following is a complete sample install.properties file for EMR V3.43.1:
    # Location of component folder
    COMPONENT_INSTALL_DIR_NAME=/opt/apps/KAFKA/kafka-current

    #
    # Location of Policy Manager URL
    #
    # Example:
    # POLICY_MGR_URL=http://policymanager.xasecure.net:6080
    # You can replace the parameter value based on your business scenario.
    POLICY_MGR_URL=http://master-1-1.c-590b6062db9d****.cn-hangzhou.emr.aliyuncs.com:6080

    #
    # This is the repository name created within policy manager
    #
    # Example:
    # REPOSITORY_NAME=kafkadev
    # You can replace the parameter value based on your business scenario.
    REPOSITORY_NAME=kafkadev

    # AUDIT configuration with V3 properties

    #Should audit be summarized at source
    XAAUDIT.SUMMARY.ENABLE=true

    # Enable audit logs to Solr
    #Example
    #XAAUDIT.SOLR.ENABLE=true
    #XAAUDIT.SOLR.URL=http://localhost:6083/solr/ranger_audits
    #XAAUDIT.SOLR.ZOOKEEPER=
    #XAAUDIT.SOLR.FILE_SPOOL_DIR=/var/log/kafka/audit/solr/spool
    # You can replace the parameter value based on your business scenario.
    XAAUDIT.SOLR.ENABLE=true
    XAAUDIT.SOLR.URL=http://master-1-1.c-590b6062db9d****.cn-hangzhou.emr.aliyuncs.com:6083/solr/ranger_audits
    XAAUDIT.SOLR.USER=NONE
    XAAUDIT.SOLR.PASSWORD=NONE
    XAAUDIT.SOLR.ZOOKEEPER=NONE
    XAAUDIT.SOLR.FILE_SPOOL_DIR=/var/log/taihao-apps/kafka/audit/spool

    # Enable audit logs to Elasticsearch
    #Example
    #XAAUDIT.ELASTICSEARCH.ENABLE=true
    #XAAUDIT.ELASTICSEARCH.URL=localhost
    #XAAUDIT.ELASTICSEARCH.INDEX=audit
    XAAUDIT.ELASTICSEARCH.ENABLE=false
    XAAUDIT.ELASTICSEARCH.URL=NONE
    XAAUDIT.ELASTICSEARCH.USER=NONE
    XAAUDIT.ELASTICSEARCH.PASSWORD=NONE
    XAAUDIT.ELASTICSEARCH.INDEX=NONE
    XAAUDIT.ELASTICSEARCH.PORT=NONE
    XAAUDIT.ELASTICSEARCH.PROTOCOL=NONE

    # Enable audit logs to HDFS
    #Example
    #XAAUDIT.HDFS.ENABLE=true
    #XAAUDIT.HDFS.HDFS_DIR=hdfs://node-1.example.com:8020/ranger/audit
    # If using Azure Blob Storage
    #XAAUDIT.HDFS.HDFS_DIR=wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
    #XAAUDIT.HDFS.HDFS_DIR=wasb://ranger_audit_cont****@my-azure-account.blob.core.windows.net/ranger/audit
    #XAAUDIT.HDFS.FILE_SPOOL_DIR=/var/log/kafka/audit/hdfs/spool
    XAAUDIT.HDFS.ENABLE=false
    XAAUDIT.HDFS.HDFS_DIR=hdfs://__REPLACE__NAME_NODE_HOST:8020/ranger/audit
    XAAUDIT.HDFS.FILE_SPOOL_DIR=/var/log/kafka/audit/hdfs/spool

    # Following additional properties are needed when auditing to Azure Blob Storage via HDFS
    # Get these values from your /etc/hadoop/conf/core-site.xml
    #XAAUDIT.HDFS.HDFS_DIR=wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
    XAAUDIT.HDFS.AZURE_ACCOUNTNAME=__REPLACE_AZURE_ACCOUNT_NAME
    XAAUDIT.HDFS.AZURE_ACCOUNTKEY=__REPLACE_AZURE_ACCOUNT_KEY
    XAAUDIT.HDFS.AZURE_SHELL_KEY_PROVIDER=__REPLACE_AZURE_SHELL_KEY_PROVIDER
    XAAUDIT.HDFS.AZURE_ACCOUNTKEY_PROVIDER=__REPLACE_AZURE_ACCOUNT_KEY_PROVIDER

    #Log4j Audit Provider
    XAAUDIT.LOG4J.ENABLE=false
    XAAUDIT.LOG4J.IS_ASYNC=false
    XAAUDIT.LOG4J.ASYNC.MAX.QUEUE.SIZE=10240
    XAAUDIT.LOG4J.ASYNC.MAX.FLUSH.INTERVAL.MS=30000
    XAAUDIT.LOG4J.DESTINATION.LOG4J=true
    XAAUDIT.LOG4J.DESTINATION.LOG4J.LOGGER=xaaudit

    # End of V3 properties

    #
    # Audit to HDFS configuration
    #
    # If XAAUDIT.HDFS.IS_ENABLED is set to true, replace tokens
    # that start with __REPLACE__ with appropriate values
    # XAAUDIT.HDFS.IS_ENABLED=true
    # XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://__REPLACE__NAME_NODE_HOST:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
    # XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit
    # XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit/archive
    #
    # Example:
    # XAAUDIT.HDFS.IS_ENABLED=true
    # XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://namenode.example.com:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
    # XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=/var/log/kafka/audit
    # XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=/var/log/kafka/audit/archive
    #
    XAAUDIT.HDFS.IS_ENABLED=false
    XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://__REPLACE__NAME_NODE_HOST:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
    XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit
    XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit/archive
    XAAUDIT.HDFS.DESTINTATION_FILE=%hostname%-audit.log
    XAAUDIT.HDFS.DESTINTATION_FLUSH_INTERVAL_SECONDS=900
    XAAUDIT.HDFS.DESTINTATION_ROLLOVER_INTERVAL_SECONDS=86400
    XAAUDIT.HDFS.DESTINTATION_OPEN_RETRY_INTERVAL_SECONDS=60
    XAAUDIT.HDFS.LOCAL_BUFFER_FILE=%time:yyyyMMdd-HHmm.ss%.log
    XAAUDIT.HDFS.LOCAL_BUFFER_FLUSH_INTERVAL_SECONDS=60
    XAAUDIT.HDFS.LOCAL_BUFFER_ROLLOVER_INTERVAL_SECONDS=600
    XAAUDIT.HDFS.LOCAL_ARCHIVE_MAX_FILE_COUNT=10

    #Solr Audit Provider
    XAAUDIT.SOLR.IS_ENABLED=false
    XAAUDIT.SOLR.MAX_QUEUE_SIZE=1
    XAAUDIT.SOLR.MAX_FLUSH_INTERVAL_MS=1000
    XAAUDIT.SOLR.SOLR_URL=http://localhost:6083/solr/ranger_audits

    # End of V2 properties

    #
    # SSL client certificate information
    #
    # Example:
    # SSL_KEYSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-keystore.jks
    # SSL_KEYSTORE_PASSWORD=none
    # SSL_TRUSTSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-truststore.jks
    # SSL_TRUSTSTORE_PASSWORD=none
    #
    # You do not need SSL between the agent and the security admin tool.
    # Leave these sample values as-is.
    #
    SSL_KEYSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-keystore.jks
    SSL_KEYSTORE_PASSWORD=myKeyFilePassword
    SSL_TRUSTSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-truststore.jks
    SSL_TRUSTSTORE_PASSWORD=changeit

    #
    # Custom component user
    # CUSTOM_COMPONENT_USER=<custom-user>
    # Leave blank to use the default component user.
    CUSTOM_USER=kafka

    #
    # Custom component group
    # CUSTOM_COMPONENT_GROUP=<custom-group>
    # Leave blank to use the default component group.
    CUSTOM_GROUP=hadoop
-
On each Kafka broker node, install the Ranger Kafka plug-in:
    sudo su - root
    cd /opt/apps/ranger-plugin/ranger-2.1.0-kafka-plugin
    ./enable-kafka-plugin.sh ./install.properties
-
Update the Kafka server.properties configuration in the EMR console.
-
Go to the Configure tab of the Kafka service page in the EMR console.
-
Click the server.properties tab.
-
Update the following parameters:
-
kafka_server_start_cmd_addition_args: Append CLASSPATH=$CLASSPATH:/opt/apps/KAFKA/kafka-current/config to the parameter value. This lets the broker load the Ranger configuration files that the plug-in installer writes to this directory.
Note: If kafka_server_start_cmd_addition_args is not available in the EMR console, create a symbolic link on each Kafka broker node instead:
    cd /opt/apps/KAFKA/kafka-current/libs
    sudo ln -s /opt/apps/KAFKA/kafka-current/config kafka-conf
-
authorizer.class.name: Set to org.apache.ranger.authorization.kafka.authorizer.RangerKafkaAuthorizer.
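After you save and deploy the configuration, you can confirm on a broker node that the authorizer setting reached the deployed server.properties file. If you use the symbolic-link alternative from the note, you can also create the link idempotently. This is a Bash sketch that takes the server.properties path as an input, because this topic does not say where EMR writes the deployed file. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: broker-side helpers for the server.properties changes above.
# The server.properties path is passed in; where EMR writes the deployed file
# is an assumption you must confirm on your nodes.

RANGER_AUTHORIZER=org.apache.ranger.authorization.kafka.authorizer.RangerKafkaAuthorizer

# Print the value of authorizer.class.name from a properties file, ignoring
# commented lines and whitespace around "=".
authorizer_class() {
  awk -F'=' '
    { k = $1; gsub(/[ \t]/, "", k) }
    k == "authorizer.class.name" { v = substr($0, index($0, "=") + 1) }
    END { gsub(/^[ \t]+|[ \t\r]+$/, "", v); print v }
  ' "$1"
}

# Return 0 if the file selects the Ranger authorizer.
uses_ranger_authorizer() {
  [ "$(authorizer_class "$1")" = "$RANGER_AUTHORIZER" ]
}

# Create KAFKA_HOME/libs/kafka-conf -> KAFKA_HOME/config (the manual fallback
# described in the note). Safe to run more than once.
ensure_conf_link() {
  local kafka_home="$1"
  local link="${kafka_home}/libs/kafka-conf"
  if [ -L "$link" ]; then
    echo "link exists: $link -> $(readlink "$link")"
    return 0
  fi
  if [ -e "$link" ]; then
    echo "refusing: $link exists and is not a symbolic link" >&2
    return 1
  fi
  ln -s "${kafka_home}/config" "$link"
}
```

On EMR, ensure_conf_link /opt/apps/KAFKA/kafka-current corresponds to the manual ln -s command in the note and needs root privileges.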
-
-
Save the configuration:
-
Click Save.
-
In the dialog box that appears, set the Execution Reason parameter and click Save.
-
-
-
Restart the Kafka broker.
-
Go to the Status tab of the Kafka service page in the EMR console, find the KafkaBroker component, and click Restart in the Actions column.
-
Set the Execution Reason parameter and click OK.
-
In the Confirm dialog box, click OK.
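After the brokers restart, two quick checks on a broker node can indicate whether the plug-in is active. First, confirm that the Ranger configuration files written by the enable script exist in the Kafka config directory. Second, confirm that the plug-in has downloaded policies into its local cache. The file names and cache location below are assumptions based on Ranger 2.x plug-in defaults: ranger-kafka-security.xml, ranger-kafka-audit.xml, and /etc/ranger/<REPOSITORY_NAME>/policycache/kafka_<REPOSITORY_NAME>.json. Confirm them against your plug-in package and the ranger.plugin.kafka.policy.cache.dir property. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: post-restart checks on a Kafka broker node.
# Assumed defaults (verify against your plug-in version):
#   - enable-kafka-plugin.sh writes ranger-kafka-security.xml and
#     ranger-kafka-audit.xml into the Kafka config directory
#   - policies are cached in /etc/ranger/<repo>/policycache/kafka_<repo>.json

KAFKA_CONF_DIR="${KAFKA_CONF_DIR:-/opt/apps/KAFKA/kafka-current/config}"

# Return 1 and list any missing Ranger configuration file.
check_ranger_conf_files() {
  local conf_dir="${1:-$KAFKA_CONF_DIR}" f rc=0
  for f in ranger-kafka-security.xml ranger-kafka-audit.xml; do
    if [ ! -f "${conf_dir}/${f}" ]; then
      echo "missing: ${conf_dir}/${f}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Print the expected policy cache file for a repository name.
# An optional second argument overrides the cache root (default /etc/ranger).
policy_cache_file() {
  local repo="$1" root="${2:-/etc/ranger}"
  printf '%s/%s/policycache/kafka_%s.json\n' "$root" "$repo" "$repo"
}

# Return 0 if the policy cache file exists and is not empty.
check_policy_cache() {
  local cache
  cache="$(policy_cache_file "$@")"
  if [ -s "$cache" ]; then
    echo "policies cached: $cache"
  else
    echo "no cached policies yet: $cache" >&2
    return 1
  fi
}
```

For example, run check_ranger_conf_files and then check_policy_cache kafkadev, replacing kafkadev with your REPOSITORY_NAME value. If no cache file appears, check that the broker can reach POLICY_MGR_URL and that the Kafka service in Ranger uses the same repository name.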
-
What's next
-
To install the Ranger Kafka plug-in on multiple nodes at once, use bootstrap actions. For more information, see Manually run scripts.
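A bootstrap action typically runs the same script on every targeted node. The following Bash sketch combines the per-node steps of this topic. It unpacks the package into /opt/apps/ranger-plugin, checks the required install.properties parameters, copies the file next to the enable script, and runs enable-kafka-plugin.sh. It assumes that the package and a prepared install.properties file are already on the node (for example, copied from OSS by an earlier command that is not shown) and that the script runs as root. The script name, function names, and DRY_RUN switch are hypothetical. The sketch does not change the server.properties settings in the EMR console, which you still update as described in the installation steps.

```shell
#!/usr/bin/env bash
# Sketch of a per-node install script for a bootstrap action (run as root).
# Usage (hypothetical): install_ranger_kafka.sh PACKAGE_TGZ INSTALL_PROPERTIES
# Set DRY_RUN=1 to print the commands that change the node instead of running them.

PLUGIN_ROOT="${PLUGIN_ROOT:-/opt/apps/ranger-plugin}"
REQUIRED_KEYS="COMPONENT_INSTALL_DIR_NAME POLICY_MGR_URL REPOSITORY_NAME"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the value of KEY from an uncommented KEY=VALUE line (last one wins).
get_property() {
  awk -v k="$2" 'index($0, k "=") == 1 { v = substr($0, length(k) + 2) } END { print v }' "$1"
}

# Fail if any required parameter is missing or empty in install.properties.
check_install_properties() {
  local file="$1" key rc=0
  for key in $REQUIRED_KEYS; do
    if [ -z "$(get_property "$file" "$key")" ]; then
      echo "missing value for ${key} in ${file}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Unpack the package, copy install.properties next to the enable script,
# and run the script from the plug-in directory.
install_ranger_kafka() {
  if [ "$#" -ne 2 ] || [ ! -f "$1" ] || [ ! -f "$2" ]; then
    echo "usage: install_ranger_kafka PACKAGE_TGZ INSTALL_PROPERTIES" >&2
    return 2
  fi
  local pkg="$1" props="$2" plugin_dir
  check_install_properties "$props" || return 1
  # ranger-2.1.0-kafka-plugin.tar.gz -> /opt/apps/ranger-plugin/ranger-2.1.0-kafka-plugin
  plugin_dir="${PLUGIN_ROOT}/$(basename "$pkg" .tar.gz)"
  run mkdir -p "$PLUGIN_ROOT"
  run tar xzf "$pkg" -C "$PLUGIN_ROOT"
  run cp "$props" "${plugin_dir}/install.properties"
  run bash -c "cd '${plugin_dir}' && ./enable-kafka-plugin.sh ./install.properties"
}

# Run only when executed directly with arguments, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  install_ranger_kafka "$@"
fi
```

Running it once with DRY_RUN=1 on a single broker node shows the exact commands before you attach the script to a bootstrap action for all Kafka broker nodes.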
-
For EMR V3.45.0 or later minor versions, enable Kafka in Ranger directly from the EMR console. For more information, see Enable Kafka in Ranger and configure related permissions.