
Spark + Kafka

Last Updated: Feb 02, 2018

This article describes how to run a Spark Streaming job in an E-MapReduce Hadoop cluster to process data from a Kafka cluster.

Reference

Because the Hadoop and Kafka clusters in E-MapReduce are both based on open source software, you can refer to the official documentation of the corresponding projects during development.

Access a Kerberos Kafka cluster

E-MapReduce supports creating Kafka clusters that use Kerberos authentication. A job running in a Hadoop cluster can access a Kerberos Kafka cluster in either of the following two ways:

  • Non-Kerberos Hadoop cluster: Provide the Kerberos authentication file kafka_client_jaas.conf of the Kafka cluster.
  • Kerberos Hadoop cluster: Based on the cross-domain trust between the Kerberos clusters, provide the Kerberos authentication file kafka_client_jaas.conf of the Kafka cluster.

In both methods, the kafka_client_jaas.conf file used for Kerberos authentication must be provided when you run the job.

The format of the kafka_client_jaas.conf file is as follows:

    KafkaClient {
        com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        storeKey=true
        serviceName="kafka"
        keyTab="/path/to/kafka.keytab"
        principal="kafka/emr-header-1.cluster-12345@EMR.12345.COM";
    };
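
As a minimal sketch, the file can also be generated from a shell script so that the keytab path and the principal are not edited by hand. The function name write_kafka_jaas is hypothetical and not part of E-MapReduce or Kafka. The keytab path and principal below are the placeholder values from this article and must be replaced with the values of your own cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of E-MapReduce or Kafka): print a KafkaClient
# JAAS section for the given keytab path ($1) and Kerberos principal ($2).
write_kafka_jaas() {
    keytab_path="$1"
    principal="$2"
    cat <<EOF
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    serviceName="kafka"
    keyTab="${keytab_path}"
    principal="${principal}";
};
EOF
}

# Example call with the placeholder values used in this article.
# The result is printed to standard output.
write_kafka_jaas "/path/to/kafka.keytab" \
    "kafka/emr-header-1.cluster-12345@EMR.12345.COM"
```

To create the file itself, redirect the output of the function to kafka_client_jaas.conf.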

Access a Kerberos Kafka cluster by using Spark Streaming

When you run a Spark Streaming job that accesses a Kerberos Kafka cluster, pass the kafka_client_jaas.conf and kafka.keytab files to the spark-submit command through its parameters:

    spark-submit \
        --conf spark.driver.extraJavaOptions=-Djava.security.auth.login.config={{PWD}}/kafka_client_jaas.conf \
        --conf spark.executor.extraJavaOptions=-Djava.security.auth.login.config={{PWD}}/kafka_client_jaas.conf \
        --files /local/path/to/kafka_client_jaas.conf,/local/path/to/kafka.keytab \
        --class xx.xx.xx.KafkaSample \
        --num-executors 2 --executor-cores 2 --executor-memory 1g \
        --master yarn-cluster \
        xxx.jar arg1 arg2 arg3
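
Before submitting the job, it can help to confirm that every local file passed to --files actually exists. The following is a minimal sketch of such a check. The function name check_files_list is hypothetical, and it only assumes that the argument is a comma-separated list of local paths in the same form as the --files value.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Spark): verify that every path
# in a comma-separated list, as passed to --files, exists and is readable.
# Returns 0 if all files are readable, 1 at the first one that is not.
check_files_list() {
    old_ifs="$IFS"
    IFS=','
    for f in $1; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            IFS="$old_ifs"
            return 1
        fi
    done
    IFS="$old_ifs"
    return 0
}
```

For example, calling check_files_list "/local/path/to/kafka_client_jaas.conf,/local/path/to/kafka.keytab" before spark-submit reports the first missing file on standard error and returns a non-zero status.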

Note that the path of the keytab file in the kafka_client_jaas.conf file must be relative, because the files listed in --files are copied to the working directory of each container. Configure keyTab in the following format:

    KafkaClient {
        com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        storeKey=true
        serviceName="kafka"
        keyTab="kafka.keytab"
        principal="kafka/emr-header-1.cluster-12345@EMR.12345.COM";
    };
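
If you already have a kafka_client_jaas.conf file with an absolute keytab path, the keyTab entry can be rewritten to the relative form with sed. The following is a minimal sketch. The function name relativize_keytab is hypothetical, and it assumes that the keyTab value is enclosed in double quotes on a single line, as in the examples in this article.

```shell
#!/bin/sh
# Hypothetical helper: read a JAAS file on standard input and strip the
# directory part of the keyTab value, leaving only the file name.
# Lines without a keyTab entry, and keyTab values that are already
# relative file names, pass through unchanged.
relativize_keytab() {
    sed -E 's#(keyTab=")[^"]*/([^"/]+")#\1\2#'
}

# Example with the absolute path used earlier in this article.
printf '%s\n' 'keyTab="/path/to/kafka.keytab"' | relativize_keytab
# prints: keyTab="kafka.keytab"
```

A complete file can be converted in the same way, for example relativize_keytab < kafka_client_jaas.conf > kafka_client_jaas.relative.conf, where the output file name is only an illustration.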