Optimize computing costs for SQL and MapReduce jobs - MaxCompute

MaxCompute bills by the amount of data scanned and compute resources consumed. Two patterns account for most unexpected bill spikes: SQL jobs that scan more data than necessary, and MapReduce jobs that allocate more resources than the workload requires. This topic explains how to diagnose and fix both.

Estimate costs before running jobs

Use TCO tools to estimate computing costs before submitting jobs. For SQL jobs, use CostSQL to preview the cost of a query before it runs. Configure resource consumption alerts to catch unexpected cost growth early.

Reduce SQL computing costs

Avoid full table scans

Full table scans are the leading cause of high SQL computing costs. MaxCompute charges based on data scanned — scanning an entire table when you only need a subset multiplies your bill.

Disable full table scans at the session or project level:

-- Disable for the current session
set odps.sql.allow.fullscan=false;

-- Disable for the entire project
SetProject odps.sql.allow.fullscan=false;

Use column pruning — avoid SELECT *:

SELECT * always triggers a full table scan. Select only the columns you need:

-- Table T has columns a, b, c, d, e.
-- This query reads only a, b, and e — skipping c and d entirely.
SELECT a, b FROM T WHERE e < 10;

Use partition pruning — filter on partition key columns:

Specifying a partition key in the WHERE clause tells MaxCompute to skip irrelevant partitions and scan only matching data:

SELECT a, b FROM T WHERE partitiondate = '2017-10-01';

Without a partition filter, a JOIN or SELECT on a partitioned table falls back to a full table scan. Always prune partitions before performing JOINs. For cases where partition pruning does not take effect, see Scenarios where partition pruning does not take effect.

Rewrite costly SQL patterns

Some SQL keywords trigger extra shuffling and sorting, which consume additional compute resources. The root cause is data movement — when the engine must rearrange data across nodes to satisfy a query, it consumes CPU and I/O proportional to the data volume moved. Rewriting these patterns reduces or eliminates that movement.

Replace FULL OUTER JOIN with UNION ALL

FULL OUTER JOIN requires the engine to match every row in both tables, producing a large intermediate shuffle. UNION ALL eliminates the join entirely by padding missing values with zeros:

-- Original: FULL OUTER JOIN
SELECT COALESCE(t1.id, t2.id) AS id, SUM(t1.col1) AS col1, SUM(t2.col2) AS col2
FROM (
  SELECT id, col1 FROM table1
) t1
FULL OUTER JOIN (
  SELECT id, col2 FROM table2
) t2
ON t1.id = t2.id
GROUP BY COALESCE(t1.id, t2.id);

-- Optimized: UNION ALL (no join, no data movement overhead)
SELECT t.id, SUM(t.col1) AS col1, SUM(t.col2) AS col2
FROM (
  SELECT id, col1, 0 AS col2 FROM table1
  UNION ALL
  SELECT id, 0 AS col1, col2 FROM table2
) t
GROUP BY t.id;

Move GROUP BY outside UNION ALL

Placing GROUP BY inside each branch of a UNION ALL causes the engine to aggregate twice. Aggregate once after the union:

-- Original: GROUP BY inside each branch (double aggregation)
SELECT t.id, SUM(t.val) AS val
FROM (
  SELECT id, SUM(col3) AS val FROM table3 GROUP BY id
  UNION ALL
  SELECT id, SUM(col4) AS val FROM table4 GROUP BY id
) t
GROUP BY t.id;

-- Optimized: single aggregation after union
SELECT t.id, SUM(t.val) AS val
FROM (
  SELECT id, col3 AS val FROM table3
  UNION ALL
  SELECT id, col4 AS val FROM table4
) t
GROUP BY t.id;

Replace DISTINCT with GROUP BY

DISTINCT on large datasets requires a full sort. GROUP BY achieves the same result with less overhead:

-- Original: DISTINCT (full sort)
SELECT COUNT(DISTINCT id) AS cnt FROM table1;

-- Optimized: GROUP BY (more efficient deduplication)
SELECT COUNT(1) AS cnt
FROM (
  SELECT id FROM table1 GROUP BY id
) t;

Other patterns to avoid:

Use INSERT INTO with a partition field instead of inserting without partitioning. This reduces SQL complexity and lowers cost.
Sort temporarily exported data using external tools such as Excel instead of ORDER BY. ORDER BY in MaxCompute triggers a global sort across the entire dataset.

Control scheduling frequency

MaxCompute is designed for large-batch processing, not real-time queries. Scheduling SQL jobs at short intervals — every few seconds or minutes — accumulates job queues under pay-as-you-go billing, and the next day's bill can spike unexpectedly.

Run CostSQL to estimate the cost of frequently scheduled jobs before setting their cadence. For workloads that need near-real-time results, use a dedicated real-time computing service rather than MaxCompute.

Preview table data without running SQL

Running SELECT * FROM table LIMIT 10 consumes computing resources. Use the built-in table preview feature instead — it reads data directly from storage without triggering a compute job:

DataWorks: Open the Data Map page, find the table, and use the preview tab. See View the details of a table.
MaxCompute Studio: Double-click a table in the object tree to preview its data.

Match the tool to the workload

MaxCompute returns results in minutes, not milliseconds. It is the right tool for large-batch analytics but a poor fit for frontend queries that require instant responses.

For frontend queries — dashboards, search results, row lookups — use a relational database such as ApsaraDB for RDS. Store aggregated MaxCompute results there and serve queries from the database. Frontend queries that lack a WHERE clause, perform no aggregation, and join no dictionaries will always be slow in MaxCompute.

Reduce MapReduce computing costs

Configure split size and reducer count

Two configuration settings have the largest impact on MapReduce resource consumption.

Split size controls how many mappers are created. The default is 256 MB per split. A smaller split size creates more mappers, increasing parallelism but also resource usage. Use JobConf#setSplitSize to tune this value based on the computational cost of your map logic — if each record is expensive to process, reduce the split size to create more mappers and distribute the work.

Reducer count defaults to one-quarter of the number of mappers and can be set to any value from 0 to 2,000. More reducers consume more resources. Set only as many reducers as your aggregation workload requires, and use jobconf.setNumReduceTasks(num) to configure the count explicitly.

Merge serial jobs using pipeline mode

When multiple MapReduce jobs are chained — where the output of one job feeds the next — each intermediate job writes results to disk and reads them back. This disk I/O compounds across the chain.

Pipeline mode merges serial MapReduce jobs into a single job, eliminating intermediate disk reads and writes. This reduces both cost and scheduling overhead. For implementation examples, see Pipeline examples.

Prune columns in input tables

When a mapper reads an input table but only needs a few of its columns, reading the full row wastes I/O. Specify the required columns when adding an input table:

InputUtils.addTable(TableInfo.builder().tableName("wc_in").cols(new String[]{"c1","c2"}).build(), job);

After this configuration, the mapper reads only the c1 and c2 columns. Data accessed by column name is unaffected; data accessed by subscript index may behave differently.

Read resources in the setup stage

Each call to read a resource incurs overhead, and you can read resources up to 64 times. Reading the same resource inside a map or reduce function causes it to be re-read on every record. Read resources once in the setup stage instead. For usage examples, see Resource usage example.

Avoid constructing objects in the map or reduce function

Java objects constructed inside a map or reduce function are rebuilt on every record invocation. Move object construction to the setup stage:

Record word;
Record one;

public void setup(TaskContext context) throws IOException {
  // Construct once in setup — not on every map call
  word = context.createMapOutputKeyRecord();
  one = context.createMapOutputValueRecord();
  one.set(new Object[]{1L});
}

Use a combiner when map output has duplicate keys

A combiner pre-aggregates the output of a map task before it is sent to reducers, reducing the volume of data transferred across the network during the shuffle phase.

Use a combiner only when map output contains multiple records with the same key — for example, in a word count job. If map output keys are unique, a combiner adds overhead without benefit.

The following combiner sums values for matching keys:

/**
 * A combiner class that combines map output by summing values.
 */
public static class SumCombiner extends ReducerBase {

  private Record count;

  @Override
  public void setup(TaskContext context) throws IOException {
    count = context.createMapOutputValueRecord();
  }

  @Override
  public void reduce(Record key, Iterator<Record> values, TaskContext context)
      throws IOException {
    long c = 0;
    while (values.hasNext()) {
      Record val = values.next();
      c += (Long) val.get(0);
    }
    count.set(0, c);
    context.write(key, count);
  }
}

Prevent data skew with partition columns or a custom partitioner

By default, reducers receive data based on the hash of the full key schema. If some key values are far more common than others, certain reducers receive significantly more data — a long-tail issue where some reducers finish long after others, holding up the entire job.

To distribute data more evenly across reducers, specify partition key columns using JobConf#setPartitionColumns. Data is routed to reducers based on the hash of those columns rather than the full key:

jobconf.setPartitionerClass(MyPartitioner.class)
jobconf.setNumReduceTasks(num)

For finer control, implement a custom partitioner:

import com.aliyun.odps.mapred.Partitioner;

public static class MyPartitioner extends Partitioner {

  @Override
  public int getPartition(Record key, Record value, int numPartitions) {
    // numPartitions is the total number of reducers.
    // Route each key to a reducer based on key length.
    String k = key.get(0).toString();
    return k.length() % numPartitions;
  }
}

Keep JVM memory within the 1:4 CPU-to-memory ratio

The standard configuration is 1 CPU core and 4 GB of memory, with odps.stage.reducer.jvm.mem set to 4006. Allocating memory beyond a 1:4 ratio relative to CPU cores increases billing. Tune odps.stage.reducer.jvm.mem to match your actual workload rather than over-provisioning.

What's next

To optimize storage costs, see Optimize storage costs.
To reduce data upload and download costs, see Optimize the costs of data uploads and downloads.
To analyze and resolve billing anomalies, see Manage costs.
To generate a resource optimization plan, see Generate a computing resource optimization plan.