Peer-to-peer (P2P) caching in the JindoFSx client can be regarded as a form of local caching. Unlike the original local caching feature, P2P caching also allows a client to read data from other clients. When a client requests a data block that is not in its local cache, the client pulls the block from other clients that hold it. If the client cannot pull the data from other clients, the data is read from remote servers or by using a Security Token Service (STS) token. This topic describes how to use the distributed P2P caching feature to download data.
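The read path described above can be pictured with a small model. This is a minimal sketch and not the JindoFSx implementation: the `Client` class, the `read_block` method, and the `read_remote` callback are hypothetical names, and serving local hits first and keeping a copy of every fetched block are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


class Client:
    """Toy model of a client with a local block cache (hypothetical, not the JindoFSx API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local_cache: Dict[str, bytes] = {}
        self.peers: List["Client"] = []

    def read_block(
        self, block_id: str, read_remote: Callable[[str], bytes]
    ) -> Tuple[bytes, str]:
        # 1. Serve the block from the local cache when it is already there (assumption).
        if block_id in self.local_cache:
            return self.local_cache[block_id], "local"
        # 2. Otherwise try to pull the block from a peer that holds it.
        for peer in self.peers:
            data = peer.local_cache.get(block_id)
            if data is not None:
                self.local_cache[block_id] = data  # keep a local copy (assumption)
                return data, "peer:" + peer.name
        # 3. Fall back to the remote store when no peer can serve the block.
        data = read_remote(block_id)
        self.local_cache[block_id] = data
        return data, "remote"
```

In this model, the second element of the returned tuple reports where the block came from, which makes the fallback order easy to observe.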
Prerequisites
A cluster of EMR V3.42.0 or a later minor version, or a cluster of EMR V5.6.0 or a later minor version, is created in the EMR console, and the JindoData service is selected from the optional services when you create the cluster. For more information, see Create a cluster.
Procedure
Step 1: Configure the server
- Go to the common tab of the JindoData service.
  - Log on to the EMR on ECS console.
  - In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  - On the EMR on ECS page, find the cluster that you want to manage and click Services in the Actions column.
  - Click Configure in the JindoData section.
  - Click the common tab.
- Add configuration items.
- Restart the JindoData service.
  - On the Services tab of the JindoData service, choose the restart option.
  - In the Restart JINDODATA Services dialog box, specify the execution reason and click OK.
  - In the Confirm message, click OK.
Step 2: Configure JindoSDK
- Go to the Configure tab.
  - Log on to the EMR on ECS console.
  - In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  - On the EMR on ECS page, find the cluster that you want to manage and click Services in the Actions column.
  - On the Services tab, click Configure in the HADOOP-COMMON section.
  - Click the core-site.xml tab.
- On the core-site.xml tab, modify the following configuration items.
For more information about how to add configuration items, see the "Add configuration items" section in the Add configuration items topic. For more information about how to modify configuration items, see the "Modify configuration items" section in the Modify configuration items topic.
| Item | Parameter | Description |
| --- | --- | --- |
| Configure the implementation class of Object Storage Service (OSS) | fs.AbstractFileSystem.oss.impl | Set the value to com.aliyun.jindodata.oss.OSS. |
| | fs.oss.impl | Set the value to com.aliyun.jindodata.oss.JindoOssFileSystem. |
| Specify the engine type | fs.xengine | Set the value to jindofsx. |
| Configure the endpoint of the JindoFSx Namespace service | fs.jindofsx.namespace.rpc.address | Specify the value in the ${headerhost}:8101 format. Example: master-1-1:8101. Note: For more information about how to configure and use the Namespace service in high availability mode, see Configure and use the JindoFSx Namespace service in high availability mode. |
| Configure data caching for query acceleration. Note: After you enable this feature, hot data blocks are cached on local disks. By default, this feature is disabled, and data is read from OSS/OSS-HDFS. | fs.jindofsx.data.cache.enable | Specifies whether to enable data caching for query acceleration. Valid values: false (default): disables the feature. true: enables the feature. |
- Save the modifications.
  - On the Configure tab, click Save.
  - In the Save dialog box, configure the Execution Reason parameter, turn on Automatically Update Configurations, and then click Save.
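For reference, the JindoSDK settings from this step correspond to ordinary Hadoop `<property>` entries in core-site.xml. The sketch below only renders those entries as text so that you can compare them with what the console shows. It is an illustration, not a replacement for the console procedure: the `render_core_site` helper is a hypothetical name, the host name is the example value from this step, and enabling data caching is optional.

```python
from xml.sax.saxutils import escape

# Parameter names and values come from the configuration items in this step.
PROPERTIES = {
    "fs.AbstractFileSystem.oss.impl": "com.aliyun.jindodata.oss.OSS",
    "fs.oss.impl": "com.aliyun.jindodata.oss.JindoOssFileSystem",
    "fs.xengine": "jindofsx",
    # Example address; replace it with your own ${headerhost}:8101 value.
    "fs.jindofsx.namespace.rpc.address": "master-1-1:8101",
    # Optional: the default value is false.
    "fs.jindofsx.data.cache.enable": "true",
}


def render_core_site(properties: dict) -> str:
    """Render the properties in the Hadoop <configuration> XML layout."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


print(render_core_site(PROPERTIES))
```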
Step 3: Use the distributed P2P caching feature
After you complete the preceding configurations, if the paths of files that you want to read match one of the prefixes that are specified by the jindofsx.p2p.file.prefix parameter, all read requests are processed by using the distributed P2P caching feature without the need to call other API operations. For example, you can run Hadoop shell commands to download files to your on-premises machine. If the paths of files match one of the specified prefixes, the distributed P2P caching feature is automatically enabled.
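The prefix rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the comma separator for multiple prefixes, the helper names, and the `oss://examplebucket/...` paths are hypothetical. Only `hadoop fs -get`, the standard Hadoop shell command for copying a file to the local file system, is taken as given.

```python
from typing import List


def parse_prefixes(raw: str) -> List[str]:
    # Assumption: multiple prefixes are separated by commas.
    return [p.strip() for p in raw.split(",") if p.strip()]


def uses_p2p(path: str, prefixes: List[str]) -> bool:
    # A read goes through P2P caching when the path starts with any configured prefix.
    return any(path.startswith(prefix) for prefix in prefixes)


def download_command(src: str, dst: str) -> List[str]:
    # Argument list for `hadoop fs -get <src> <localdst>`.
    return ["hadoop", "fs", "-get", src, dst]


# Hypothetical prefix value and file path, for illustration only.
prefixes = parse_prefixes("oss://examplebucket/models/, oss://examplebucket/datasets/")
src = "oss://examplebucket/models/a.bin"
if uses_p2p(src, prefixes):
    print(" ".join(download_command(src, "/tmp/")))
```

The command itself is the same whether or not the path matches a prefix. The match only determines whether the read is served through P2P caching.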
P2P record for path:
If the preceding information exists, the read request of the file is processed by using the distributed P2P caching feature.
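If you have captured the output in which this information appears, a short script can check for it. The sketch below is a hypothetical helper. It assumes only that matching lines contain the text `P2P record for path:` and returns whatever follows that marker, because the exact line layout is not specified here.

```python
from typing import Iterable, List

P2P_MARKER = "P2P record for path:"


def p2p_records(lines: Iterable[str]) -> List[str]:
    """Return the text that follows the P2P marker on each matching line."""
    found = []
    for line in lines:
        idx = line.find(P2P_MARKER)
        if idx != -1:
            found.append(line[idx + len(P2P_MARKER):].strip())
    return found
```

An empty result means that the marker was not found in the supplied lines.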