This topic provides answers to some frequently asked questions about Kudu.

Where do I view the log files of Kudu?

You can view the log files of Kudu in the /mnt/disk1/log/kudu directory.

What are the partitioning methods supported by Kudu?

Kudu supports range partitioning and hash partitioning. You can use the two partitioning methods together. For more information, see Apache Kudu Schema Design.

How do I access the web UI of Kudu?

Kudu is not integrated with Knox. You cannot use Knox to access the web UI of Kudu. You can create an SSH tunnel to access the web UI of Kudu. For more information, see Create an SSH tunnel to access web UIs of open source components.

What do I do if the error message "NonRecoverableException" appears on the Kudu client?

  • Problem description
    The following error information appears:
    org.apache.kudu.client.NonRecoverableException: Could not connect to a leader master. Client configured with 1 master(s) (192.168.0.10:7051) but cluster indicates it expects 3 master(s) (192.168.0.36:7051,192.168.0.11:7051,192.168.0.10:7051)
  • Cause

    This issue occurs because the cluster has three master nodes, but the Kudu client is configured with only one master address.

  • Solution

    Make sure that all the required master nodes are deployed, and configure the Kudu client with the addresses of all master nodes in the cluster.
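    The fix can be sketched with the kudu command-line tool. The addresses below are the example values from the error message; replace them with the addresses of your own master nodes:

    ```shell
    # All master addresses, comma-separated (example values from the error above).
    MASTERS="192.168.0.36:7051,192.168.0.11:7051,192.168.0.10:7051"
    echo "$MASTERS"
    # Pass the full list to the client or to the kudu CLI, for example:
    #   kudu table list "$MASTERS"
    ```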

How do I view the FAQ in the Kudu community?

For more information, see Apache Kudu Troubleshooting.

What do I do if the error message "Bad status: Network error: Could not obtain a remote proxy to the peer.: unable to resolve address for <hostname>: Name or service not known" appears?

  • Cause: The hostname cannot be resolved to an IP address. As a result, a Raft instance on the Kudu tablet server cannot identify its peer Raft servers, it is unknown whether those peers can provide services as expected, and the network connection is terminated.
  • Solution:
    1. Manually add the mapping between the hostname and its resolved IP address to the /etc/hosts file.
    2. If the host that the hostname represents has been released, add a mapping from the hostname to any IP address to the /etc/hosts file, even if the IP address is unreachable. The Kudu tablet server then replicates the data of the unavailable Raft server to a new Raft server that joins the Raft group, and the group can provide services as expected again.
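    The steps above can be sketched as follows. The hostname and IP address are hypothetical, and the sketch works on a temporary copy of /etc/hosts so that it is safe to run; on a real node you would edit /etc/hosts directly:

    ```shell
    # Hypothetical unresolvable hostname and a stand-in IP address.
    HOST="tserver-03.example.com"
    IP="192.168.0.99"

    # Work on a temporary copy of /etc/hosts for demonstration.
    HOSTS_FILE=$(mktemp)
    cp /etc/hosts "$HOSTS_FILE"

    # Append the mapping only if it is not already present.
    grep -q "$HOST" "$HOSTS_FILE" || echo "$IP $HOST" >> "$HOSTS_FILE"
    tail -n 1 "$HOSTS_FILE"
    ```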

What do I do if the error message "Bad status: I/O error: Failed to load Fs layout: could not verify integrity of files: <directory>, <number> data directories provided, but expected <number>" appears?

This issue is caused by an inconsistency between the directories specified by the -fs_data_dirs parameter and the directories recorded in the metadata that the -fs_metadata_dir parameter points to. To resolve the issue, change the directories specified by the -fs_data_dirs parameter so that they match the recorded metadata.
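For example, if the metadata was recorded when two data directories were configured, the startup flags must still list those two directories. A hypothetical sketch (the paths are illustrative):

```
# Hypothetical Kudu tablet server flags: -fs_data_dirs must list exactly
# the data directories recorded under -fs_metadata_dir.
--fs_metadata_dir=/mnt/disk1/kudu/meta
--fs_data_dirs=/mnt/disk1/kudu/data,/mnt/disk2/kudu/data
```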

What do I do if the error message "pthread_create failed: Resource temporarily unavailable (error 11)" appears?

This error occurs because a thread fails to be created due to unavailable resources. Troubleshoot the issue based on the following cases:

  • Resources are unavailable.

    Run the ulimit -a command to check the value of the max user processes parameter. If the value is too small, increase it in the /etc/security/limits.conf file. You can also create a /etc/security/limits.d/kudu.conf file and set the value of the max user processes parameter in that file.
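    The check can be sketched as follows (the limit values in the comments are illustrative):

    ```shell
    # Print the current per-user limit on processes/threads
    # (the "max user processes" line of ulimit -a).
    ulimit -u

    # To raise the limit, add lines like the following (illustrative values)
    # to /etc/security/limits.conf or /etc/security/limits.d/kudu.conf:
    #   kudu soft nproc 65536
    #   kudu hard nproc 65536
    ```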

  • Kudu client V0.8 is used in the hybrid deployment environment.

    Based on the description of the KUDU-1453 issue, Spark executors may leak threads if Kudu client V0.8 is used. To resolve the issue, upgrade the Kudu client to V0.9.

  • Threads are leaked.
    • Issues caused by Trino

      When Trino exits, its shutdown hook thread blocks in the take method of a BlockingQueue and cannot be interrupted. The E-MapReduce (EMR) control service keeps sending the SIGTERM signal, and each signal creates a new SIGTERM handler thread. As a result, threads are exhausted.

      You can fix the issue on the Trino side, or directly run the kill -9 command to terminate the process.

    • Issues caused by Jindo SDK

      When Spark executes write jobs, it uses the JindoOssCommitter class. This class creates a JindoOssMagicCommitter object, which in turn creates a thread pool named oss-committer-pool. The thread pool is neither static nor explicitly shut down. As JindoOssMagicCommitter objects are continuously created, new thread pools are continuously generated, and the previously created pools are not released. As a result, an excessive number of threads is used. If you use Spark Streaming or Structured Streaming, system resources may be exhausted.

      You can specify the following parameters to solve the issue:
      spark.sql.hive.outputCommitterClass=org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
      spark.sql.sources.outputCommitterClass=org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
    • Tool troubleshooting
      You can run the following threads_monitor.sh script to find the process that uses the most threads in the system, and then troubleshoot that process.
      #!/bin/bash
      # Sum the thread counts of all processes in /proc and report the
      # process that currently uses the most threads.
      
      total_threads=0
      max_pid=-1
      max_threads=-1
      
      for pid in $(ls /proc)
      do
        if [[ $pid != *self && -f /proc/$pid/status ]]; then
          num_threads=$(grep Threads /proc/$pid/status | awk '{print $NF}')
          ((total_threads+=num_threads))
          if [[ ${max_pid} -eq -1 || ${max_threads} -lt ${num_threads} ]]; then
            max_pid=${pid}
            max_threads=${num_threads}
          fi
      #    echo "Process ${pid}: ${num_threads}"
        fi
      done
      
      echo "Total threads: ${total_threads}"
      echo "Max threads: ${max_threads}, pid is ${max_pid}"
      ps -ef | grep ${max_pid} | grep -v grep
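      If you only need a quick look, a procps-based system can list thread counts directly with ps (this assumes a Linux ps that supports the -eo and --sort options):

      ```shell
      # List the five processes with the largest thread counts (NLWP column).
      ps -eo nlwp,pid,comm --sort=-nlwp | head -n 6
      ```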

What do I do if I fail to start Kudu?

When you use Kudu, the Bigboot monitor provided by Bigboot starts and runs Kudu and automatically restarts it on failures. Bigboot V3.5.0 has a defect: if Kudu crashes, the service information in the database cannot be deleted, and as a result Kudu cannot be restarted. In this case, you need to stop Kudu and then restart it.
Note You need to perform these operations on the machine itself. The console may be unable to perform the stop operation because the service has already terminated.
To resolve the issue, run the following commands on a core or task node. If you run the following commands on a master node, replace kudu-tserver in the commands with kudu-master.
/usr/lib/b2monitor-current/bin/monictrl -stop kudu-tserver
/usr/lib/b2monitor-current/bin/monictrl -start kudu-tserver

What do I do if the error message "Service unavailable: RunTabletServer() failed: Cannot initialize clock: timed out waiting for clock synchronisation: Error reading clock. Clock considered unsynchronized" appears?

  • Problem description
    The following error message may be recorded in logs:
    E1010 10:37:54.165313 29920 system_ntp.cc:104] /sbin/ntptime
    ------------------------------------------
    stdout:
    ntp_gettime() returns code 5 (ERROR)
      time e6ee0402.2a452c4c  Mon, Oct 10 2022 10:37:54.165, (.165118697),
      maximum error 16000000 us, estimated error 16000000 us, TAI offset 0
    ntp_adjtime() returns code 5 (ERROR)
      modes 0x0 (),
      offset 0.000 us, frequency 187.830 ppm, interval 1 s,
      maximum error 16000000 us, estimated error 16000000 us,
      status 0x2041 (PLL,UNSYNC,NANO),
      time constant 6, precision 0.001 us, tolerance 500 ppm,
  • Cause: The ntpd service on the machine cannot connect to the configured NTP server.
  • Solution: Restart the server and try again.

What do I do if the error message "Rejecting Write request: Soft memory limit exceeded" appears?

  • Cause: The amount of data to be written exceeds the soft memory limit.
  • Solution:
    You can perform the following operations:
    1. Configure the memory_limit_hard_bytes parameter to increase the memory limit. The default value is 0, which indicates that the maximum memory usage is automatically set by the system. A value of -1 indicates that no limit is imposed on memory usage.
    2. Configure the memory_limit_soft_percentage parameter to adjust the percentage of the hard memory limit that can be used before writes are rejected. The default value is 80.
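    For example, a tablet server could be started with flags like the following. The values are illustrative; tune them for your cluster:

    ```
    # Hypothetical Kudu tablet server memory flags.
    --memory_limit_hard_bytes=4294967296     # hard cap in bytes (0 = auto, -1 = unlimited)
    --memory_limit_soft_percentage=80        # percentage of the hard limit at which writes are rejected
    ```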