E-MapReduce: Manually integrate the Ranger Kafka plug-in

Last Updated: Feb 01, 2024

If the Ranger Kafka plug-in is not installed in the Dataflow cluster of E-MapReduce (EMR), or the installed version of the Ranger Kafka plug-in is incompatible with the Ranger service that you use, you must manually integrate the plug-in. This topic describes how to manually install and configure the Ranger Kafka plug-in.

Prerequisites

  • A Dataflow cluster is created. For more information, see Create a cluster.

  • The SASL logon authentication feature is enabled for the cluster. For more information, see Log on to a Kafka cluster by using SASL.

    Important

    When you configure settings to enable SASL logon authentication, make sure that the user that Kafka uses for internal communications has permissions on all resources, and that the users of the Kafka clients that are used by internal components have the required permissions. We recommend that you use the same user for the internal component clients and the Kafka service, as illustrated in the sketch after this list.

  • An external Ranger service is created.

  • A Kafka management user is created in Ranger and has permissions on all Kafka resources.

    Note

    We recommend that you set the name of the management user to kafka.
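
The following snippet is a minimal illustration of the recommendation in the SASL prerequisite above, assuming SASL/PLAIN authentication. The user name kafka and the password are placeholders and must match your actual SASL configuration; in EMR, SASL is enabled in the console as described in the linked topic.

    // kafka_server_jaas.conf: a minimal sketch, not the exact EMR configuration.
    // The broker logs in as the user kafka (username/password), and user_kafka
    // declares the same account so that internal component clients can reuse it.
    KafkaServer {
      org.apache.kafka.common.security.plain.PlainLoginModule required
      username="kafka"
      password="kafka-secret****"
      user_kafka="kafka-secret****";
    };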

Limits

This topic applies only to clusters of EMR V3.45.0 or an earlier minor version.

Note

This topic is not applicable to EMR V5.X.X clusters because Kafka that is deployed in an EMR V5.X.X cluster does not support Ranger authentication.

Procedure

In this example, the code package of Ranger 2.1.0 is used. Select the version of the Ranger code package that matches the Ranger service and the Kafka service that you use.

Note

We recommend that you use a Ranger Kafka plug-in of the same version as the Ranger service.

  1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.

  2. Run the following command to download the Ranger plug-in:

    wget https://dlcdn.apache.org/ranger/2.1.0/apache-ranger-2.1.0.tar.gz

    You can download the Ranger code package of the required version from the Ranger official website.
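
    If you want to verify the integrity of the download, Apache publishes checksum files alongside each release. The following is a sketch, assuming the usual Apache naming convention for checksum files:

    wget https://dlcdn.apache.org/ranger/2.1.0/apache-ranger-2.1.0.tar.gz.sha512
    sha512sum apache-ranger-2.1.0.tar.gz
    # Compare the printed digest with the digest in the .sha512 file.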

  3. Run the following command to build an installation package for the Ranger Kafka plug-in:

    tar xvf apache-ranger-2.1.0.tar.gz
    cd apache-ranger-2.1.0
    mvn clean compile package assembly:assembly install -DskipTests -Drat.skip=true
    cd target
    ls -lrt ranger-2.1.0-kafka-plugin.tar.gz
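
    The build requires a JDK and Apache Maven on the machine where you run it; Ranger 2.1.0 builds with JDK 8. A quick sanity check before the build, as a sketch:

    java -version   # Expect a 1.8.x JDK for Ranger 2.1.0
    mvn -v          # Expect Maven 3.x
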
  4. Upload the installation package of the Ranger Kafka plug-in to the fixed installation directory of all Kafka broker nodes. In this example, the installation directory /opt/apps/ranger-plugin is used.
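
    The following sketch shows one way to distribute and decompress the package on each broker node. The hostnames in the loop are placeholders; replace them with the broker nodes of your cluster.

    # Placeholder hostnames; replace them with your actual broker nodes.
    for host in core-1-1 core-1-2 core-1-3; do
      ssh ${host} "sudo mkdir -p /opt/apps/ranger-plugin"
      scp target/ranger-2.1.0-kafka-plugin.tar.gz ${host}:/tmp/
      ssh ${host} "sudo tar xvf /tmp/ranger-2.1.0-kafka-plugin.tar.gz -C /opt/apps/ranger-plugin"
    done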

  5. Prepare the configuration file named install.properties and place the file in the /opt/apps/ranger-plugin/ranger-2.1.0-kafka-plugin directory.

    Configure the install.properties file of the Ranger Kafka plug-in based on your business requirements.

    • Configuration parameters

      All of the following parameters are required.

      • COMPONENT_INSTALL_DIR_NAME: The installation path of Kafka. EMR Kafka is installed in the fixed directory /opt/apps/KAFKA/kafka-current.

      • POLICY_MGR_URL: The URL of the Ranger policy repository. Set this parameter based on the address of the Ranger service. For information about the value format, see the install.properties configuration file.

      • REPOSITORY_NAME: The name of the policy repository to use. Set this parameter based on the repository that is created in the Ranger service. For information about the value format, see the install.properties configuration file.

      • XAAUDIT.SUMMARY.ENABLE: Specifies whether to summarize audit records at the source. Default value: true.

      • XAAUDIT.SOLR.ENABLE: Specifies whether to enable Solr auditing. Default value: true.

      • XAAUDIT.SOLR.URL: The address of the Solr server. Set this parameter based on your business requirements. For more information about the value format, see the install.properties configuration file.

      • XAAUDIT.SOLR.USER: The user that is used to access the Solr server.

      • XAAUDIT.SOLR.PASSWORD: The password that is used to access the Solr server.

      • XAAUDIT.SOLR.ZOOKEEPER: The address that is used to access ZooKeeper in SolrCloud mode.

      • XAAUDIT.SOLR.FILE_SPOOL_DIR: The directory in which audit logs are spooled. Set this parameter based on your business requirements. For more information about the value format, see the install.properties configuration file.

    • Sample code

      In the following sample code, EMR V3.43.1 is used.

      # Location of component folder
      COMPONENT_INSTALL_DIR_NAME=/opt/apps/KAFKA/kafka-current
      
      #
      # Location of Policy Manager URL
      #
      # Example:
      # POLICY_MGR_URL=http://policymanager.xasecure.net:6080
      # You can replace the parameter value based on your business scenario.
      POLICY_MGR_URL=http://master-1-1.c-590b6062db9d****.cn-hangzhou.emr.aliyuncs.com:6080
      
      #
      # This is the repository name created within policy manager
      #
      # Example:
      # REPOSITORY_NAME=kafkadev
      # You can replace the parameter value based on your business scenario.
      REPOSITORY_NAME=kafkadev
      
      # AUDIT configuration with V3 properties
      
      #Should audit be summarized at source
      XAAUDIT.SUMMARY.ENABLE=true
      
      # Enable audit logs to Solr
      #Example
      #XAAUDIT.SOLR.ENABLE=true
      #XAAUDIT.SOLR.URL=http://localhost:6083/solr/ranger_audits
      #XAAUDIT.SOLR.ZOOKEEPER=
      #XAAUDIT.SOLR.FILE_SPOOL_DIR=/var/log/kafka/audit/solr/spool
      # You can replace the parameter value based on your business scenario.
      XAAUDIT.SOLR.ENABLE=true
      XAAUDIT.SOLR.URL=http://master-1-1.c-590b6062db9d****.cn-hangzhou.emr.aliyuncs.com:6083/solr/ranger_audits
      XAAUDIT.SOLR.USER=NONE
      XAAUDIT.SOLR.PASSWORD=NONE
      XAAUDIT.SOLR.ZOOKEEPER=NONE
      XAAUDIT.SOLR.FILE_SPOOL_DIR=/var/log/taihao-apps/kafka/audit/spool
      
      # Enable audit logs to ElasticSearch
      #Example
      #XAAUDIT.ELASTICSEARCH.ENABLE=true
      #XAAUDIT.ELASTICSEARCH.URL=localhost
      #XAAUDIT.ELASTICSEARCH.INDEX=audit
      
      XAAUDIT.ELASTICSEARCH.ENABLE=false
      XAAUDIT.ELASTICSEARCH.URL=NONE
      XAAUDIT.ELASTICSEARCH.USER=NONE
      XAAUDIT.ELASTICSEARCH.PASSWORD=NONE
      XAAUDIT.ELASTICSEARCH.INDEX=NONE
      XAAUDIT.ELASTICSEARCH.PORT=NONE
      XAAUDIT.ELASTICSEARCH.PROTOCOL=NONE
      
      # Enable audit logs to HDFS
      #Example
      #XAAUDIT.HDFS.ENABLE=true
      #XAAUDIT.HDFS.HDFS_DIR=hdfs://node-1.example.com:8020/ranger/audit
      #  If using Azure Blob Storage
      #XAAUDIT.HDFS.HDFS_DIR=wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
      #XAAUDIT.HDFS.HDFS_DIR=wasb://ranger_audit_cont****@my-azure-account.blob.core.windows.net/ranger/audit
      #XAAUDIT.HDFS.FILE_SPOOL_DIR=/var/log/kafka/audit/hdfs/spool
      
      XAAUDIT.HDFS.ENABLE=false
      XAAUDIT.HDFS.HDFS_DIR=hdfs://__REPLACE__NAME_NODE_HOST:8020/ranger/audit
      XAAUDIT.HDFS.FILE_SPOOL_DIR=/var/log/kafka/audit/hdfs/spool
      
      # The following additional properties are needed when auditing to Azure Blob Storage via HDFS
      # Get these values from your /etc/hadoop/conf/core-site.xml
      #XAAUDIT.HDFS.HDFS_DIR=wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
      XAAUDIT.HDFS.AZURE_ACCOUNTNAME=__REPLACE_AZURE_ACCOUNT_NAME
      XAAUDIT.HDFS.AZURE_ACCOUNTKEY=__REPLACE_AZURE_ACCOUNT_KEY
      XAAUDIT.HDFS.AZURE_SHELL_KEY_PROVIDER=__REPLACE_AZURE_SHELL_KEY_PROVIDER
      XAAUDIT.HDFS.AZURE_ACCOUNTKEY_PROVIDER=__REPLACE_AZURE_ACCOUNT_KEY_PROVIDER
      
      #Log4j Audit Provider
      XAAUDIT.LOG4J.ENABLE=false
      XAAUDIT.LOG4J.IS_ASYNC=false
      XAAUDIT.LOG4J.ASYNC.MAX.QUEUE.SIZE=10240
      XAAUDIT.LOG4J.ASYNC.MAX.FLUSH.INTERVAL.MS=30000
      XAAUDIT.LOG4J.DESTINATION.LOG4J=true
      XAAUDIT.LOG4J.DESTINATION.LOG4J.LOGGER=xaaudit
      
      # End of V3 properties
      
      #
      #  Audit to HDFS Configuration
      #
      # If XAAUDIT.HDFS.IS_ENABLED is set to true, please replace tokens
      # that start with __REPLACE__ with appropriate values
      #  XAAUDIT.HDFS.IS_ENABLED=true
      #  XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://__REPLACE__NAME_NODE_HOST:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
      #  XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit
      #  XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit/archive
      #
      # Example:
      #  XAAUDIT.HDFS.IS_ENABLED=true
      #  XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://namenode.example.com:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
      #  XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=/var/log/kafka/audit
      #  XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=/var/log/kafka/audit/archive
      #
      XAAUDIT.HDFS.IS_ENABLED=false
      XAAUDIT.HDFS.DESTINATION_DIRECTORY=hdfs://__REPLACE__NAME_NODE_HOST:8020/ranger/audit/%app-type%/%time:yyyyMMdd%
      XAAUDIT.HDFS.LOCAL_BUFFER_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit
      XAAUDIT.HDFS.LOCAL_ARCHIVE_DIRECTORY=__REPLACE__LOG_DIR/kafka/audit/archive
      
      XAAUDIT.HDFS.DESTINTATION_FILE=%hostname%-audit.log
      XAAUDIT.HDFS.DESTINTATION_FLUSH_INTERVAL_SECONDS=900
      XAAUDIT.HDFS.DESTINTATION_ROLLOVER_INTERVAL_SECONDS=86400
      XAAUDIT.HDFS.DESTINTATION_OPEN_RETRY_INTERVAL_SECONDS=60
      XAAUDIT.HDFS.LOCAL_BUFFER_FILE=%time:yyyyMMdd-HHmm.ss%.log
      XAAUDIT.HDFS.LOCAL_BUFFER_FLUSH_INTERVAL_SECONDS=60
      XAAUDIT.HDFS.LOCAL_BUFFER_ROLLOVER_INTERVAL_SECONDS=600
      XAAUDIT.HDFS.LOCAL_ARCHIVE_MAX_FILE_COUNT=10
      
      #Solr Audit Provider
      XAAUDIT.SOLR.IS_ENABLED=false
      XAAUDIT.SOLR.MAX_QUEUE_SIZE=1
      XAAUDIT.SOLR.MAX_FLUSH_INTERVAL_MS=1000
      XAAUDIT.SOLR.SOLR_URL=http://localhost:6083/solr/ranger_audits
      
      # End of V2 properties
      
      #
      # SSL Client Certificate Information
      #
      # Example:
      # SSL_KEYSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-keystore.jks
      # SSL_KEYSTORE_PASSWORD=none
      # SSL_TRUSTSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-truststore.jks
      # SSL_TRUSTSTORE_PASSWORD=none
      #
      # If you do not need to use SSL between the agent and the security admin tool, leave these sample values as they are.
      #
      SSL_KEYSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-keystore.jks
      SSL_KEYSTORE_PASSWORD=myKeyFilePassword
      SSL_TRUSTSTORE_FILE_PATH=/etc/hadoop/conf/ranger-plugin-truststore.jks
      SSL_TRUSTSTORE_PASSWORD=changeit
      
      
      #
      # Custom component user
      # CUSTOM_COMPONENT_USER=<custom-user>
      # keep blank if component user is default
      CUSTOM_USER=kafka
      
      
      #
      # Custom component group
      # CUSTOM_COMPONENT_GROUP=<custom-group>
      # keep blank if component group is default
      CUSTOM_GROUP=hadoop
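
    After you edit the file, you can quickly confirm that the values you must customize are set. A sketch:

    grep -E '^(POLICY_MGR_URL|REPOSITORY_NAME|XAAUDIT\.SOLR\.URL)=' install.properties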
  6. Run the following command to install the Ranger Kafka plug-in:

    sudo su - root
    cd /opt/apps/ranger-plugin/ranger-2.1.0-kafka-plugin
    ./enable-kafka-plugin.sh ./install.properties
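
    After the script finishes, you can check that the plug-in generated its configuration files in the Kafka configuration directory. The following listing is a sketch; the exact file names follow the common Ranger plug-in naming convention and can vary by version:

    ls /opt/apps/KAFKA/kafka-current/config/ | grep ranger
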
  7. Modify the server.properties configuration file of Kafka in the EMR console.

    1. Go to the Configure tab of the Kafka service page in the EMR console.

    2. Click the server.properties tab.

    3. Modify the values of the following parameters:

      • kafka_server_start_cmd_addition_args: Append CLASSPATH=$CLASSPATH:/opt/apps/KAFKA/kafka-current/config to the parameter value.

        Note

        If the kafka_server_start_cmd_addition_args parameter is not available on the Kafka service page in the EMR console, you cannot modify the CLASSPATH variable by using this method. In this case, run the following commands on each Kafka broker node to make the configuration files of the Ranger Kafka plug-in available to Kafka:

        cd /opt/apps/KAFKA/kafka-current/libs
        sudo ln -s /opt/apps/KAFKA/kafka-current/config kafka-conf
      • authorizer.class.name: Change the parameter value to org.apache.ranger.authorization.kafka.authorizer.RangerKafkaAuthorizer.

    4. Save the configurations.

      1. Click Save.

      2. In the dialog box that appears, configure the Execution Reason parameter and click Save.

  8. Restart the Kafka broker in the EMR console.

    1. Go to the Status tab of the Kafka service page in the EMR console, find the KafkaBroker component, and then click Restart in the Actions column.

    2. In the dialog box that appears, configure the Execution Reason parameter and click OK.

    3. In the Confirm message, click OK.
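
    After the brokers restart, you can confirm that the Ranger authorizer was loaded. A sketch; the broker log location below is an assumption and may differ in your cluster:

    # The log path is an assumption; adjust it to your cluster layout.
    grep -ri RangerKafkaAuthorizer /var/log/taihao-apps/kafka* | head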

References

  • In addition to manually installing the Ranger Kafka plug-in, you can use bootstrap actions to install and deploy the Ranger Kafka plug-in on multiple nodes based on your business requirements. For more information, see Manually run scripts.

  • If your cluster is of EMR V3.45.0 or a later minor version, you can enable Kafka in Ranger in the EMR console. For more information, see Enable Kafka in Ranger and configure related permissions.