This topic describes the external client of JindoFS and its usage scenarios.

Overview

JindoFS provides an external client that lets you access JindoFS from outside an E-MapReduce (EMR) cluster. The external client works only when JindoFS is in block storage mode. It cannot access JindoFS in cache mode. To access JindoFS in cache mode from outside an EMR cluster, use a standard Object Storage Service (OSS) client, because cache mode is compatible with the original OSS semantics.

Scenarios

The external client of JindoFS is compatible with Hadoop Distributed File System (HDFS). To access data stored in JindoFS from the external client, make sure that your application can connect to Namespace Service of JindoFS. The external client cannot access data that is cached in the local cluster, so data access from the external client is slower than data access from inside an EMR cluster.

Configure the external client

Before you begin, make sure that a JindoFS namespace in block storage mode is configured. For more information, see Use the block storage mode.

  1. Obtain the Bigboot package.

    Access the /usr/lib/bigboot-current directory in the EMR cluster to obtain the Bigboot package.

    Note The Bigboot package contains native code, which may be incompatible with the operating system of your device.
  2. Set up the environment.

    Set the BIGBOOT_HOME environment variable to the root directory in which Bigboot is installed on your device. Add the ext and lib directories under that root directory to the classpath of your big data processing component, such as Hadoop or Spark.

  3. Copy the configuration file bigboot.cfg.external from the /usr/lib/bigboot-current/conf/ directory in the EMR cluster to the conf/ directory under the Bigboot installation directory on your device.
  4. Configure Namespace Service.

    • client.namespace.rpc.port: the port on which Namespace Service listens.
    • client.namespace.rpc.address: the address on which Namespace Service listens.
      Note By default, the preceding two parameters are configured in the Bigboot configuration file of the EMR cluster.
  5. Configure data access parameters.

    • client.namespaces.{YourNamespace}.oss.access.bucket: the OSS bucket to be accessed.
    • client.namespaces.{YourNamespace}.oss.access.endpoint: the endpoint of the OSS bucket.
    • client.namespaces.{YourNamespace}.oss.access.key: the AccessKey ID used to access the OSS bucket.
    • client.namespaces.{YourNamespace}.oss.access.secret: the AccessKey secret used to access the OSS bucket.
      Note In the preceding parameters, {YourNamespace} specifies the namespace that you want to access from the external client. In this topic, a namespace named test is used.

      Example:

      client.namespace.rpc.port = 8101
      client.namespace.rpc.address = {RPC_Address}
      client.namespaces.test.oss.access.bucket = {YourOssBucket}
      client.namespaces.test.oss.access.endpoint = {YourOssEndpoint}
      client.namespaces.test.oss.access.key = {YourOssAccessKeyID}
      client.namespaces.test.oss.access.secret = {YourOssAccessKeySecret}
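
The environment setup in step 2 and the parameters in steps 4 and 5 can be sketched in shell as follows. This is a minimal sketch and not part of the Bigboot package. The /opt/bigboot default, the helper names bigboot_classpath and missing_bigboot_keys, the choice of the HADOOP_CLASSPATH variable, and the /* JAR wildcard are assumptions made for illustration. Adjust them to match your device and your big data processing component.

```shell
# Hypothetical install location; replace it with the directory where you placed Bigboot.
export BIGBOOT_HOME="${BIGBOOT_HOME:-/opt/bigboot}"

# Step 2: print a classpath with the ext and lib directories of Bigboot appended.
# $1 = existing classpath (may be empty).
# The "/*" JAR wildcard is an assumption about how the two directories are laid out.
bigboot_classpath() {
  bb_extra="${BIGBOOT_HOME}/ext/*:${BIGBOOT_HOME}/lib/*"
  if [ -n "$1" ]; then
    printf '%s:%s\n' "$1" "$bb_extra"
  else
    printf '%s\n' "$bb_extra"
  fi
}

# Steps 4 and 5: print each required parameter that is missing from a configuration file.
# $1 = path of the configuration file, $2 = namespace name (for example, test).
missing_bigboot_keys() {
  for bb_key in \
    "client.namespace.rpc.port" \
    "client.namespace.rpc.address" \
    "client.namespaces.$2.oss.access.bucket" \
    "client.namespaces.$2.oss.access.endpoint" \
    "client.namespaces.$2.oss.access.key" \
    "client.namespaces.$2.oss.access.secret"
  do
    # -F matches a fixed string; a parameter counts as present if its name appears in the file.
    grep -qF "$bb_key" "$1" || printf '%s\n' "$bb_key"
  done
}

# Make the Bigboot directories visible to Hadoop commands started from this shell.
export HADOOP_CLASSPATH="$(bigboot_classpath "${HADOOP_CLASSPATH:-}")"
```

For example, running missing_bigboot_keys "$BIGBOOT_HOME/conf/bigboot.cfg.external" test prints nothing when all six parameters for the test namespace appear in the file. The check looks only for parameter names and does not validate their values.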

Verify the configuration

Perform the following operations:

  • Run the following command to check whether the test namespace is correctly configured:
    hdfs dfs -ls jfs://test/
  • Run the following commands to check whether data can be uploaded to or downloaded from the test namespace:
    hdfs dfs -put /etc/hosts jfs://test/
    
    hdfs dfs -get jfs://test/hosts
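
The checks above can be combined into a single round-trip test that uploads a file, downloads it again, and compares the two copies. The following is a minimal sketch. The function name jfs_roundtrip is hypothetical, and the sketch assumes that the hdfs command on your device is already configured as described in this topic.

```shell
# Round-trip check: list the namespace, upload a local file, download it, compare.
# $1 = namespace name (for example, test), $2 = local file to upload.
# Returns 0 only if every step succeeds and the downloaded copy is identical.
jfs_roundtrip() {
  rt_ns="$1"
  rt_src="$2"
  rt_name="$(basename "$rt_src")"
  rt_dir="$(mktemp -d)" || return 1

  # -put -f overwrites a copy left behind by an earlier run.
  # cmp -s is silent and succeeds only if the two files are byte-identical.
  if hdfs dfs -ls "jfs://${rt_ns}/" >/dev/null &&
     hdfs dfs -put -f "$rt_src" "jfs://${rt_ns}/" &&
     hdfs dfs -get "jfs://${rt_ns}/${rt_name}" "${rt_dir}/${rt_name}" &&
     cmp -s "$rt_src" "${rt_dir}/${rt_name}"
  then
    rt_status=0
  else
    rt_status=1
  fi

  # Remove the temporary download directory in both the success and failure cases.
  rm -rf "$rt_dir"
  return "$rt_status"
}
```

For example, jfs_roundtrip test /etc/hosts returns 0 if the test namespace accepts the upload and returns the same bytes on download. The uploaded file is left in the namespace so that you can inspect it afterward.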