This topic describes the external client of JindoFileSystem (JindoFS) and the scenarios in which you can use it.

Overview

JindoFS provides an external client that you can use to access JindoFS from outside an E-MapReduce cluster. The external client requires JindoFS to run in block storage mode. Currently, it cannot access JindoFS in cache mode. Cache mode is compatible with native Object Storage Service (OSS) semantics, so to access JindoFS in cache mode from outside an E-MapReduce cluster, use a standard OSS client.

Scenarios

The external client of JindoFS is compatible with Hadoop Distributed File System (HDFS). To access data stored in JindoFS through the external client, make sure that your application can connect to Namespace Service of JindoFS. The external client cannot read data cached in the local cluster, so data access from the external client is less efficient than data access from inside an E-MapReduce cluster.

Configure the external client

Before you begin, make sure that a JindoFS namespace in block storage mode is configured. For more information, see Use the block storage mode.

  1. Obtain the Bigboot package.

    Obtain the Bigboot package from the /usr/lib/bigboot-current directory in the E-MapReduce cluster and copy it to your device.

    Note The Bigboot package contains native code, which may be incompatible with the operating system of your device. If this is the case and the code needs to be recompiled, submit a ticket.
  2. Set up the environment.

    Set the BIGBOOT_HOME environment variable to the directory where Bigboot is installed on your device. Then add the ext and lib subdirectories of that directory to the classpath of your big data component, such as Hadoop or Spark.

  3. Copy the configuration file bigboot.cfg.external from the /usr/lib/bigboot-current/conf/ directory in the E-MapReduce cluster to the conf/ subdirectory of the Bigboot installation directory on your device.
  4. Configure Namespace Service.

    • client.namespace.rpc.port: the port on which Namespace Service listens.
    • client.namespace.rpc.address: the address on which Namespace Service listens.
      Note By default, E-MapReduce sets the preceding two parameters in the Bigboot configuration file.
  5. Set data access parameters.

    • client.namespaces.{YourNamespace}.oss.access.bucket: the OSS bucket to be accessed.
    • client.namespaces.{YourNamespace}.oss.access.endpoint: the endpoint for accessing the OSS bucket.
    • client.namespaces.{YourNamespace}.oss.access.key: the AccessKey ID used to access the OSS bucket.
    • client.namespaces.{YourNamespace}.oss.access.secret: the AccessKey secret used to access the OSS bucket.
      Note In the preceding parameters, {YourNamespace} specifies the namespace that you want to access from the external client. This topic uses a namespace named test.

      Configuration example:

      client.namespace.rpc.port = 8101
      client.namespace.rpc.address = {RPC_Address}
      client.namespaces.test.oss.access.bucket = {YourOssBucket}
      client.namespaces.test.oss.access.endpoint = {YourOssEndpoint}
      client.namespaces.test.oss.access.key = {YourOssKey}
      client.namespaces.test.oss.access.secret = {YourOssSecret}
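
Steps 2, 4, and 5 can be scripted. The following Bash code is a minimal, illustrative sketch rather than part of Bigboot: setup_bigboot_env and render_namespace_cfg are hypothetical helper names. It assumes that your component reads its classpath from the HADOOP_CLASSPATH environment variable (as the hadoop and hdfs commands do) and that the ext and lib directories contain JAR files. For Spark, use the classpath options of your Spark deployment instead.

```shell
#!/usr/bin/env bash
# Illustrative sketch, not part of Bigboot: setup_bigboot_env and
# render_namespace_cfg are hypothetical helper names.

# Step 2: export BIGBOOT_HOME and append its ext/ and lib/ directories to
# HADOOP_CLASSPATH, which the hadoop and hdfs launcher scripts read.
# Assumption: both directories contain JAR files. Java expands a trailing
# "/*" classpath entry to every JAR in that directory; quoting keeps the
# wildcard literal in the shell.
setup_bigboot_env() {
  export BIGBOOT_HOME="$1"
  local extra="${BIGBOOT_HOME}/ext/*:${BIGBOOT_HOME}/lib/*"
  # Prepend the existing value plus ":" only if HADOOP_CLASSPATH is non-empty.
  export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+${HADOOP_CLASSPATH}:}${extra}"
}

# Steps 4 and 5: print the parameters for one namespace in the
# "key = value" form of the configuration example.
# Arguments: namespace, Namespace Service address, OSS bucket, OSS endpoint,
# AccessKey ID, AccessKey secret, and an optional port (default 8101, the
# value used in the configuration example).
render_namespace_cfg() {
  local ns="$1" rpc_addr="$2" bucket="$3" endpoint="$4" key="$5" secret="$6"
  local port="${7:-8101}"
  printf 'client.namespace.rpc.port = %s\n' "$port"
  printf 'client.namespace.rpc.address = %s\n' "$rpc_addr"
  printf 'client.namespaces.%s.oss.access.bucket = %s\n' "$ns" "$bucket"
  printf 'client.namespaces.%s.oss.access.endpoint = %s\n' "$ns" "$endpoint"
  printf 'client.namespaces.%s.oss.access.key = %s\n' "$ns" "$key"
  printf 'client.namespaces.%s.oss.access.secret = %s\n' "$ns" "$secret"
}
```

To use the sketch, source the script in your shell, run setup_bigboot_env with your Bigboot installation directory, and run render_namespace_cfg test {RPC_Address} {YourOssBucket} {YourOssEndpoint} {YourOssKey} {YourOssSecret}. Because E-MapReduce may already set the Namespace Service parameters, review the output and add only the parameters that are missing from the configuration file that you copied in step 3.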

Verify the configuration

  • Run the following command to check whether the test namespace is configured correctly:
    hdfs dfs -ls jfs://test/
  • Run the following commands to check whether data can be uploaded to and downloaded from the test namespace. The -get command saves the file to the current local directory.
    hdfs dfs -put /etc/hosts jfs://test/
    
    hdfs dfs -get jfs://test/hosts
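
The checks above can be combined into one function that also confirms that the downloaded file matches the uploaded one. This is a minimal Bash sketch, assuming that the hdfs command is on your PATH and that the external client is configured as described in the preceding section; verify_jfs_namespace is a hypothetical helper name. It uses hdfs dfs -put -f so that a file left over from an earlier run is overwritten, and it downloads into a temporary directory instead of the current directory.

```shell
#!/usr/bin/env bash
# Illustrative sketch: verify_jfs_namespace is a hypothetical helper.
# Usage: verify_jfs_namespace <namespace> [local-file]   (default: /etc/hosts)
# Returns a non-zero status at the first check that fails.
verify_jfs_namespace() {
  local ns="$1" src="${2:-/etc/hosts}"
  local uri="jfs://${ns}/" name tmp rc=0
  name="$(basename "$src")"
  tmp="$(mktemp -d)" || return 1

  if ! hdfs dfs -ls "$uri" >/dev/null; then
    # Check 1: the namespace root can be listed.
    echo "FAIL: cannot list $uri" >&2; rc=1
  elif ! hdfs dfs -put -f "$src" "$uri"; then
    # Check 2: the local file can be uploaded (-f overwrites an old copy).
    echo "FAIL: cannot upload $src to $uri" >&2; rc=1
  elif ! hdfs dfs -get "${uri}${name}" "$tmp/"; then
    # Check 3: the uploaded file can be downloaded into a scratch directory.
    echo "FAIL: cannot download ${uri}${name}" >&2; rc=1
  elif ! cmp -s "$src" "$tmp/$name"; then
    # Check 4: the downloaded copy is byte-for-byte identical to the source.
    echo "FAIL: downloaded copy differs from $src" >&2; rc=1
  else
    echo "OK: $uri passed the list, upload, and download checks"
  fi

  rm -rf "$tmp"
  return "$rc"
}
```

For example, verify_jfs_namespace test runs the same checks as the preceding commands against the test namespace with /etc/hosts. The uploaded file is left in the namespace, as it is by the manual commands.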