
E-MapReduce:Enable native query acceleration

Last Updated: Mar 26, 2026

JindoTable's native engine accelerates queries on ORC and Parquet files stored in JindoFS or OSS — no changes to your existing SQL or DataFrame code required. This topic describes how to enable query acceleration for Spark, Hive, and Presto in E-MapReduce (EMR).

Prerequisites

Before you begin, make sure that:

  • An EMR cluster is created.

  • Your ORC or Parquet files are stored in JindoFS or Object Storage Service (OSS).

For information on creating a cluster, see Create a cluster.

Limitations

The following limitations apply to all engines:

  • Binary files are not supported.

  • Partitioned tables in which the values of partition key columns are stored in the data files themselves are not supported.

  • EMR clusters of V5.X.X or a later version are not supported.

  • spark.read.schema (userDefinedSchema) is not supported.

  • DATE values must use the YYYY-MM-DD format and fall within the range 1400-01-01 to 9999-12-31.

  • If a table has two columns with the same name but different letter cases (for example, NAME and name), queries on those columns cannot be accelerated.

Engine and file format support:

Engine   ORC             Parquet
Spark2   Supported       Supported
Spark3   Supported       Supported
Presto   Supported       Supported
Hive2    Not supported   Supported
Hive3    Not supported   Supported

Engine and file system support:

Engine   OSS         JindoFS     HDFS (Hadoop Distributed File System)
Spark2   Supported   Supported   Supported
Presto   Supported   Supported   Supported
Hive2    Supported   Supported   Not supported
Hive3    Supported   Supported   Not supported

Enable query acceleration for Spark

Note

Query acceleration uses off-heap memory. Add --conf spark.executor.memoryOverhead=4g to your Spark job to allocate additional memory for acceleration.
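As an illustration, a job submission with the extra off-heap memory reserved might look like the following. The class name and JAR file are placeholders; only the memoryOverhead setting comes from the note above, and 4g is a starting point you may need to tune for your workload.

```shell
# Hypothetical example: reserve extra off-heap memory for the native
# engine when submitting a Spark job (class and JAR are placeholders).
spark-submit \
  --conf spark.executor.memoryOverhead=4g \
  --class com.example.MyQueryJob \
  my-query-job.jar
```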

To read data with the native engine, use the DataFrame API or Spark SQL — no other APIs are supported.

You can enable query acceleration globally for all Spark jobs or at the individual job level.

Option 1: Global configuration (console)

  1. Log on to the Alibaba Cloud EMR console.

  2. In the top navigation bar, select the region of your cluster.

  3. Click the Cluster Management tab.

  4. Find your cluster and click Details in the Actions column.

  5. In the left-side navigation pane, choose Cluster Service > Spark.

  6. Click the Configure tab.

  7. Search for the spark.sql.extensions parameter and set its value to:

    io.delta.sql.DeltaSparkSessionExtension,com.aliyun.emr.sql.JindoTableExtension
  8. Click Save in the upper-right corner. In the Confirm Changes dialog box, enter a description and click OK.

  9. Choose Actions > Restart ThriftServer in the upper-right corner. In the Cluster Activities dialog box, enter a description and click OK. Then click OK in the confirmation message.

Option 2: Job-level configuration

Add the following flag when submitting a Spark Shell or Spark SQL job:

--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension,com.aliyun.emr.sql.JindoTableExtension

For details, see Configure a Spark Shell job or Configure a Spark SQL job.
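Putting the job-level flag together with the memory note above, a one-off interactive session could be started as follows. The table name in the sample query is a placeholder; the two --conf values are the ones documented in this topic.

```shell
# Hypothetical example: start a Spark SQL session with the JindoTable
# extension enabled for this job only.
spark-sql \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension,com.aliyun.emr.sql.JindoTableExtension \
  --conf spark.executor.memoryOverhead=4g

# Inside the session, run queries as usual (sales_parquet is a
# placeholder table name):
#   SELECT category, COUNT(*) FROM sales_parquet GROUP BY category;
```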

Verify that acceleration is enabled

  1. Open the Spark History Server web UI.

  2. Click the SQL tab and find your query.

If JindoDataSourceV2Scan appears in the query plan, acceleration is active. If it does not appear, verify the spark.sql.extensions configuration in the steps above.
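If you prefer the command line over the web UI, you can also print a query's physical plan with EXPLAIN and look for the same scan node. This is a sketch; the table name is a placeholder, and the node should appear only when the extension is configured as described above.

```shell
# Hypothetical check: print the physical plan and grep for the native
# scan node (sales_parquet is a placeholder table name).
spark-sql \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension,com.aliyun.emr.sql.JindoTableExtension \
  -e "EXPLAIN SELECT COUNT(*) FROM sales_parquet;" \
  | grep -i JindoDataSourceV2Scan
```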


Enable query acceleration for Presto

Important

Presto uses off-heap memory for query acceleration and supports high query concurrency. Make sure the cluster has more than 10 GB of available memory before enabling this feature.

Note

Complex data types — MAP, STRUCT, and ARRAY — are not supported when using query acceleration with Presto.

The Presto service includes a built-in catalog named hive-acc that uses the native engine. Connect to Presto using this catalog to enable acceleration:

presto --server emr-header-1:9090 --catalog hive-acc --schema default

No other configuration is required.
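For a non-interactive run, the Presto CLI's --execute option can be combined with the hive-acc catalog shown above. The table name is a placeholder; everything else matches the connection command in this section.

```shell
# Hypothetical example: run a single accelerated query through the
# built-in hive-acc catalog (table name is a placeholder).
presto --server emr-header-1:9090 --catalog hive-acc --schema default \
       --execute "SELECT COUNT(*) FROM sales_parquet"
```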

Enable query acceleration for Hive

Important

If your jobs have high stability requirements, do not enable query acceleration for Hive.

Note

Complex data types — MAP, STRUCT, and ARRAY — are not supported when using query acceleration with Hive.

You can enable query acceleration using the EMR console or the Hive CLI.

Option 1: EMR console

  1. On the Hive service page, click the Configure tab.

  2. Search for the hive.jindotable.native.enabled parameter and set its value to true.

  3. Save the configuration and restart the Hive service.

This method applies to both Hive on MapReduce and Hive on Tez jobs.


Option 2: Hive CLI

Run the following command in the CLI:

set hive.jindotable.native.enabled=true;
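Because the set command applies only to the current session, a batch invocation needs to include it alongside the query. The following is a sketch; the table name is a placeholder.

```shell
# Hypothetical example: enable the native engine for this session only,
# then run a query (sales_parquet is a placeholder table name).
hive -e "set hive.jindotable.native.enabled=true;
         SELECT category, COUNT(*) FROM sales_parquet GROUP BY category;"
```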

The query acceleration plug-in for Parquet files is included by default in EMR V3.35.0 and later. No additional installation is required.