SmartData 3.0.X to 3.5.X has a known defect that can damage cached data. As a result, an error may be reported when you read cached data from SmartData. This topic describes the impact of the issue, the solution, and the procedure to fix the issue.

Impact of the issue

  • Affected components: all components for which the data caching feature of SmartData is enabled
    Notice: If the SmartData service is deployed in your cluster but you do not use the data caching feature, you can ignore this issue.

    SmartData allows you to use JindoFS in block storage mode or cache mode.

  • Affected versions:
    • E-MapReduce (EMR): V3.30.X, V4.5.X, V3.32.X, V4.6.X, V3.33.X, V4.7.X, V3.34.X, V4.8.X, V3.35.X, and V4.9.X
    • SmartData: 3.0.X, 3.1.X, 3.2.X, 3.3.X, 3.4.X, and 3.5.X
  • Severity level: critical. The issue occurs only occasionally, but it affects the accuracy of data. We recommend that you fix the issue.
  • Issue description: If JindoFS is used in block storage mode or cache mode, cached data may be damaged with low probability. As a result, an error may be reported when you read the data. For example, a data parsing error is reported when you read data from ORC or Parquet files, or an HFile format error is reported when you read HBase data. In block storage mode, the data caching feature is enabled by default. In cache mode, the data caching feature is enabled when the jfs.cache.data-cache.enable parameter is set to true.
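The version ranges listed above can be turned into a quick check. The following is a minimal sketch, not an official tool: the function names is_affected_smartdata and is_affected_emr are hypothetical, and the sketch assumes that versions are written in three parts, such as 3.4.2 or V3.34.0.

```shell
#!/bin/sh
# Hypothetical helpers that encode the affected versions listed in this topic.

# Succeeds (exit status 0) if the SmartData version is in the range 3.0.X to 3.5.X.
is_affected_smartdata() {
  case "$1" in
    3.[0-5].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds (exit status 0) if the EMR version is one of the affected versions.
is_affected_emr() {
  # Strip an optional leading "V" or "v" so that V3.30.1 and 3.30.1 both work.
  v=${1#[Vv]}
  case "$v" in
    3.30.*|3.32.*|3.33.*|3.34.*|3.35.*) return 0 ;;
    4.5.*|4.6.*|4.7.*|4.8.*|4.9.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_affected_smartdata 3.4.2 succeeds and is_affected_smartdata 3.6.0 fails, so the functions can be used directly in an if statement.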

Solution

Cached data in SmartData 3.0.X to 3.5.X is damaged because of a defect in the process that merges small files. To avoid the issue, modify the configuration to disable the merging of small files, and then restart the SmartData service. If the issue has already occurred, first disable the data caching feature so that cached data no longer affects reads and online business recovers as soon as possible. If you use only JindoFS in cache mode, you can then use a tool to format the cache system. Formatting clears all cached data in your cluster, including any data blocks that may be damaged. After the clearing operation is complete, re-enable data caching.

Fixing procedure

Common fixing procedure

If the issue has not occurred in your cluster, perform the following steps to avoid the issue:

  1. On the SmartData service page in the EMR console, add a custom configuration item.
    1. On the SmartData service page, click the Configure tab, click the storage tab in the Service Configuration section, and then click Custom Configuration in the upper-right corner.
    2. In the Add Configuration Item dialog box, add a configuration item named storage.compaction.enable with the value false.
    3. Click OK.
  2. Restart Jindo Storage Service.
    1. In the upper-right corner of the SmartData service page, choose Actions > Restart Jindo Storage Service.
    2. In the Cluster Activities dialog box, configure the Description parameter and click OK.
    3. In the Confirm message, click OK.
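After the restart, the setting could be spot-checked on a node with a sketch like the following. This is only an illustration under stated assumptions: this topic does not say where the setting is stored, so the sketch assumes a plain-text file that contains "key = value" lines, and the helper name has_setting and the file path passed to it are hypothetical. The EMR console remains the place to confirm the value.

```shell
#!/bin/sh
# has_setting FILE KEY VALUE
# Hypothetical helper: succeeds if FILE contains a line of the form "KEY = VALUE".
# The "key = value" layout is an assumption, not documented SmartData behavior.
has_setting() {
  file=$1
  # Escape dots in the key so that they match literally in the regular expression.
  key=$(printf '%s' "$2" | sed 's/\./\\./g')
  value=$3
  grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*${value}[[:space:]]*\$" "$file"
}
```

A call such as has_setting /path/to/config storage.compaction.enable false, where /path/to/config is a placeholder, succeeds only if the file contains the expected line.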

Emergency fixing procedure

If the issue has occurred, perform the following steps to recover business and fix the caching mechanism:

  1. On the SmartData service page, click the Configure tab, and then click the client tab in the Service Configuration section. Modify the configuration as follows to disable data caching. This eliminates the impact of cached data.
    • If you use JindoFS in block storage mode, add a configuration item named jfs.data-cache.enable with the value false.
    • If you use JindoFS in cache mode, change the value of the jfs.cache.data-cache.enable parameter to false.
  2. Rerun related jobs.

    After data caching is disabled, the rerun jobs are expected to work as expected. If the issue persists after you rerun the jobs, take other measures to identify the cause or submit a ticket.

    For Presto, Impala, and HBase, you must restart the component for the preceding configuration to take effect. For Hive on YARN and Spark on YARN, you can directly rerun related jobs for the preceding configuration to take effect.

  3. Fix the caching mechanism.
    • If you use JindoFS in block storage mode, submit a ticket to update components.
    • If you use JindoFS in cache mode, perform the following steps:
      Note: The following operations do not affect business because data caching is disabled.
      1. Stop the SmartData service in the EMR console.
      2. Upload the format_cache.sh script to the master node of your cluster and run the following command as the hadoop user:
        sh format_cache.sh
      3. On the SmartData service page, click the Configure tab, click the storage tab in the Service Configuration section, and then add a configuration item named storage.compaction.enable with the value false.
      4. Restart the SmartData service in the EMR console.
      5. Change the value of the jfs.cache.data-cache.enable parameter to true to re-enable data caching.
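The mode-specific and component-specific choices in the emergency procedure can be summarized in a small lookup sketch. The parameter names and component names come from the steps above. The function names cache_disable_param and apply_action, and the short keys such as block, cache, and presto, are hypothetical labels chosen for illustration.

```shell
#!/bin/sh
# Prints the client parameter that must be set to false to disable data
# caching for the given JindoFS mode ("block" or "cache").
cache_disable_param() {
  case "$1" in
    block) echo "jfs.data-cache.enable" ;;        # block storage mode: add the item
    cache) echo "jfs.cache.data-cache.enable" ;;  # cache mode: change the existing item
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Prints what is needed for the client configuration change to take effect:
# "restart" (restart the component) or "rerun" (rerun the related jobs).
apply_action() {
  case "$1" in
    presto|impala|hbase) echo "restart" ;;
    hive-on-yarn|spark-on-yarn) echo "rerun" ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}
```

For example, cache_disable_param block prints jfs.data-cache.enable, and apply_action hbase prints restart.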