Fix Incompatible SQL Statements & Prevent Migration Failures in MaxCompute V2.0 - MaxCompute

MaxCompute V2.0 enforces stricter SQL syntax rules to align with open-source ecosystems and improve performance. Some statements that ran without errors in V1.0 now fail or produce warnings in V2.0.

This topic covers each incompatibility category, describes the error or behavior change, and shows how to rewrite the affected statements.

Before you begin

When MaxCompute V2.0 fails to run a job, the framework automatically falls back to V1.0. This rollback increases job latency. To prevent the rollback from masking issues during migration, disable it before submitting jobs:

set odps.sql.planner.mode=lot;

The MaxCompute team notifies job owners by email or DingTalk when rollbacks are detected. Fix the affected statements as soon as possible to avoid job failures after rollback support is removed.

Quick reference

Category	Description
group.by.with.star	`SELECT *` with GROUP BY requires all columns in the GROUP BY clause
bad.escape	Escape sequences must use exactly three octal digits
column.repeated.in.creation	Duplicate column names in CREATE TABLE
duplicated.partition.column	Duplicate partition key names in a query
string.join.double	Implicit STRING-to-DOUBLE conversion in JOIN conditions
window.ref.prev.window.alias	Window functions cannot reference aliases from sibling window functions
select.invalid.token.after.star	`SELECT *` cannot have an alias
agg.having.ref.prev.agg.alias	HAVING cannot use SELECT-clause aliases
order.by.no.limit	ORDER BY requires a LIMIT clause
generated.column.name.multi.window	Auto-generated column aliases may change between versions
non.boolean.filter	Non-BOOLEAN expressions in WHERE/HAVING filters
post.select.ambiguous	Ambiguous column references in ORDER BY / CLUSTER BY / DISTRIBUTE BY / SORT BY
order.by.col.ambiguous	ORDER BY references duplicate aliases in SELECT
in.subquery.without.result	Column in IN subquery does not exist in the source table
ctas.if.not.exists	Syntax errors in CTAS destination table are now validated
dynamic.pt.to.static	Dynamic partitions may be converted to static partitions by the optimizer
lot.not.in.subquery	NULL handling in NOT IN subqueries
having.use.select.alias	HAVING cannot use column aliases defined in SELECT
divide.nan.or.overflow	Division constant folding causes errors for `/0` at plan time
small.table.exceeds.mem.limit	MAPJOIN hints override multi-way join optimization
sigkill.oom	Large MAPJOIN small tables cause worker out-of-memory errors
wm_concat.first.argument.const	WM_CONCAT first argument must be a constant
pt.implicit.convertion.failed	Implicit type conversion fails on partition key comparisons
worker.restart.instance.timeout	UDF vectorized execution causes heartbeat timeouts

DDL statements

group.by.with.star

In V1.0, SELECT * FROM t GROUP BY key was valid even when not all columns were listed in the GROUP BY clause. In V2.0, every column in the source table must appear in the GROUP BY clause.

V1.0 (fails in V2.0):

select * from t group by key;

Error:

FAILED: ODPS-0130071:[1,8] Semantic analysis exception - column reference t.value should appear in GROUP BY key

V2.0:

select distinct key from t;

If the GROUP BY clause does include all columns, the statement does not error in V2.0, but the SELECT DISTINCT form is clearer:

-- Not recommended (no error, but intent is unclear)
select * from t group by key, value; -- t has columns key and value

-- Recommended
select distinct key, value from t;

column.repeated.in.creation

V2.0 returns an error when a CREATE TABLE statement defines the same column name more than once. V1.0 allowed this.

V1.0 (fails in V2.0):

create table t (a BIGINT, b BIGINT, a BIGINT);

Error:

FAILED: ODPS-0130071:[1,37] Semantic analysis exception - column repeated in creation: a

V2.0:

create table t (a BIGINT, b BIGINT);

duplicated.partition.column

In V1.0, specifying the same partition key twice silently kept the last value. V2.0 returns an error.

V1.0 — duplicate partition value (fails in V2.0):

insert overwrite table t partition (ds = '1', ds = '2') select ...;
-- In V1.0, ds = '1' was silently ignored

V2.0:

insert overwrite table t partition (ds = '2') select ...;

V1.0 — column appears in both table definition and partition (fails in V2.0):

create table t (a bigint, ds string) partitioned by (ds string);

V2.0:

create table t (a bigint) partitioned by (ds string);

ctas.if.not.exists

In V1.0, if the destination table already existed, CREATE TABLE IF NOT EXISTS ... AS SELECT skipped syntax validation. V2.0 always validates the query, so previously hidden errors now surface.

V1.0 (fails in V2.0):

create table if not exists table_name
as
select * from not_exist_table;

Error:

FAILED: ODPS-0130131:[1,50] Table not found - table meta_dev.not_exist_table cannot be resolved

Fix the query by referencing an existing table.

dynamic.pt.to.static

V2.0's optimizer may convert a dynamic partition to a static partition at plan time. If the inferred partition value is invalid (for example, an unresolved variable like '${bizdate}'), V2.0 returns an error during syntax validation.

Example conversion:

-- Input
insert overwrite table srcpt partition(pt) select id, 'pt1' from table_name;

-- V2.0 converts this to:
insert overwrite table srcpt partition(pt='pt1') select id from table_name;

V1.0 (fails in V2.0 when partition value is invalid):

insert overwrite table srcpt partition(pt) select id, '${bizdate}' from table_name limit 0;

Error:

FAILED: ODPS-0130071:[1,24] Semantic analysis exception - wrong columns count 2 in data source, requires 3 columns (includes dynamic partitions if any)

In V1.0, LIMIT 0 returned no rows, so no partition was created and no error was returned. In V2.0, the optimizer evaluates the statement before execution, which triggers the error.

For more information, see the Partition topic in MaxCompute documentation.

Type conversions

bad.escape

In string literals, each ASCII character (0–127) must be written as a backslash followed by exactly three octal digits (for example, \001 for 0, \002 for 1). V1.0 silently accepted non-standard sequences such as \01 and \0001, treating both as \001. V2.0 rejects them.

V1.0 (fails in V2.0):

select split(key, "\01"), value like "\0001" from t;

Error:

FAILED: ODPS-0130161:[1,19] Parse exception - unexpected escape sequence: 01
ODPS-0130161:[1,38] Parse exception - unexpected escape sequence: 0001

V2.0:

select split(key, "\001"), value like "\001" from t;

Appending digits to \000 (such as \0001 through \0009, or \00001) also triggers this error.

string.join.double

When joining STRING and DOUBLE columns, V1.0 converted both to BIGINT, which caused precision loss (for example, 1.1 = "1" evaluated as equal). V2.0, following Apache Hive compatibility, converts both to DOUBLE and emits a warning.

V1.0 (triggers warning in V2.0):

select * from t1 join t2 on t1.double_value = t2.string_value;

Warning

WARNING:[1,48]  implicit conversion from STRING to DOUBLE, potential data loss, use CAST function to suppress

V2.0:

select * from t1 join t2 on t1.double_value = cast(t2.string_value as double);

pt.implicit.convertion.failed

When filtering a STRING partition column with INT constants, V1.0 silently converted the constants to DOUBLE for comparison, which caused all partition values to be filtered out. V2.0 returns an error.

Assume the following table:

create table srcpt(key STRING, value STRING) partitioned by (pt STRING);
alter table srcpt add partition (pt='pt1');
alter table srcpt add partition (pt='pt2');

V1.0 (fails in V2.0):

select key from srcpt where pt in (1, 2);

Error:

FAILED: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed:
java.lang.NumberFormatException: ODPS-0123091:Illegal type cast -
In function cast, value 'pt1' cannot be casted from String to Double.

V2.0:

select key from srcpt where pt in ('1', '2');

Do not compare STRING partition columns with INT constants. Cast the INT constants to STRING instead.

Query engine

window.ref.prev.window.alias

A window function cannot reference the alias of another window function in the same SELECT clause. V1.0 allowed this in some cases; V2.0 returns an error.

V1.0 (fails in V2.0):

select row_number() over (partition by c1 order by c1) rn,
       row_number() over (partition by c1 order by rn) rn2
from t1;
-- rn does not exist in t1

Error:

FAILED: ODPS-0130071:[2,45] Semantic analysis exception - column rn cannot be resolved

V2.0:

select row_number() over (partition by c1 order by rn) rn2
from (
  select c1, row_number() over (partition by c1 order by c1) rn
  from t1
) tmp;

select.invalid.token.after.star

SELECT * cannot be followed by an alias, even when * expands to a single column.

V1.0 (fails in V2.0):

select * as alias from table_test;

Error:

FAILED: ODPS-0130161:[1,10] Parse exception - invalid token 'as'

V2.0:

select * from table_test;

agg.having.ref.select.alias

The HAVING clause is evaluated before the SELECT clause. Column aliases defined in SELECT are not available in HAVING.

V1.0 (fails in V2.0):

select count(c1) cnt,
       sum(c1) / cnt avg
from t1
group by c2
having cnt > 1;
-- cnt is a SELECT alias; it does not exist in source table t1

Error:

FAILED: ODPS-0130071:[2,11] Semantic analysis exception - column cnt cannot be resolved
ODPS-0130071:[2,11] Semantic analysis exception - column reference cnt should appear in GROUP BY key

V2.0:

select cnt, s, s/cnt avg
from (
  select count(c1) cnt,
         sum(c1) s
  from t1
  group by c2
  having count(c1) > 1
) tmp;

having.use.select.alias

Per SQL specification, GROUP BY and HAVING are evaluated before SELECT, so SELECT-clause aliases are not visible in HAVING.

V1.0 (fails in V2.0):

select id id2 from table_name group by id having id2 > 0;

Error:

FAILED: ODPS-0130071:[1,44] Semantic analysis exception - column id2 cannot be resolved
ODPS-0130071:[1,44] Semantic analysis exception - column reference id2 should appear in GROUP BY key

id2 is a SELECT-clause alias and cannot be used in HAVING.

V2.0:

select id id2 from table_name group by id having id > 0;

order.by.no.limit

In MaxCompute, ORDER BY sorts all rows in the result set, which is expensive for large datasets. V2.0 requires every ORDER BY clause to be accompanied by a LIMIT clause.

V1.0 (fails in V2.0):

select * from (
  select *
  from (
    select cast(login_user_cnt as int) as uv, '3' as shuzi
    from test_login_cnt where type = 'device' and type_name = 'mobile'
  ) v
  order by v.uv desc   -- missing LIMIT here
) v
order by v.shuzi limit 20;

Error:

FAILED: ODPS-0130071:[4,1] Semantic analysis exception - ORDER BY must be used with a LIMIT clause

V2.0: Add a LIMIT clause to the subquery:

select * from (
  select *
  from (
    select cast(login_user_cnt as int) as uv, '3' as shuzi
    from test_login_cnt where type = 'device' and type_name = 'mobile'
  ) v
  order by v.uv desc limit 1000   -- example value; replace with a suitable limit for your data
) v
order by v.shuzi limit 20;

V1.0 did not enforce this rule for views created in projects where odps.sql.validate.orderby.limit=false. For example, this view creation succeeded in V1.0:

create view table_view as select id from table_view order by id;

Querying that view in V2.0 returns:

FAILED: ODPS-0130071:[1,15] Semantic analysis exception - while resolving view xdj.xdj_view_limit - ORDER BY must be used with a LIMIT clause

Add a LIMIT clause to the ORDER BY inside the view definition.

generated.column.name.multi.window

V1.0 automatically generated aliases for expressions in SELECT statements. These aliases were unstable and could change across MaxCompute versions or during upgrades and rollbacks. V2.0 warns against relying on auto-generated aliases.

V1.0 (not recommended):

select _c0 from (select count(*) from table_name) t;
-- _c0 is an auto-generated alias that may change

V2.0:

select c from (select count(*) c from table_name) t;

Always specify explicit aliases for computed columns.

non.boolean.filter

MaxCompute does not support implicit conversions between BOOLEAN and other types. V1.0 allowed BIGINT values as filter conditions in some cases; V2.0 does not.

V1.0 (fails in V2.0):

select id, count(*) from table_name group by id having id;

Error:

FAILED: ODPS-0130071:[1,50] Semantic analysis exception - expect a BOOLEAN expression

V2.0:

select id, count(*) from table_name group by id having id <> 0;

post.select.ambiguous

When ORDER BY, CLUSTER BY, DISTRIBUTE BY, or SORT BY references a name that appears both as a source column and as a SELECT-clause alias, V1.0 silently selected the last column in the SELECT list. V2.0 returns an error.

V1.0 (fails in V2.0):

select a, b as a from t order by a limit 10;
-- a is ambiguous: it could be t.a or the alias for b

Error:

FAILED: ODPS-0130071:[1,34] Semantic analysis exception - a is ambiguous, can be both t.a or null.a

V2.0:

select a as c, b as a from t order by a limit 10;

This error also applies when the conflicting names happen to refer to the same column. Update any statements with conflicting column names to use unambiguous aliases.

order.by.col.ambiguous

When ORDER BY references a name that appears as a duplicate alias in the SELECT clause, V2.0 returns an error.

V1.0 (fails in V2.0):

select id, id
from table_test
order by id;

V2.0:

select id, id id2
from table_name
order by id;

Remove the duplicate alias before using ORDER BY.

in.subquery.without.result

If a column referenced in an IN subquery does not exist in the source table, V2.0 returns a column-resolution error even if the subquery returns no rows.

V1.0 (fails in V2.0):

select * from table_name
where not_exist_col in (select id from table_name limit 0);

Error:

FAILED: ODPS-0130071:[2,7] Semantic analysis exception - column not_exist_col cannot be resolved

Fix the column name in the WHERE clause.

lot.not.in.subquery

V2.0 follows standard SQL NULL semantics for IN and NOT IN subqueries:

1 IN (NULL, 1, 2, 3) returns TRUE
1 IN (NULL, 2, 3) returns NULL
NULL IN (NULL, 1, 2, 3) returns NULL

For NOT IN, if the subquery returns any NULL value, the result is NULL (not TRUE) for every non-matching row. V1.0 returned TRUE in that case.

Example:

select * from t where c not in (select accepted from c_list);

If accepted contains NULL values, this query returns different results in V1.0 and V2.0. In V1.0, rows where c does not match any non-NULL value in accepted are returned. In V2.0, the query returns no rows if accepted contains any NULL values.

V2.0 (exclude NULL values explicitly):

select * from t where c not in (select accepted from c_list where accepted is not null);

Check subqueries used with NOT IN to confirm whether NULL values in the subquery result match your intended behavior.

Constant folding

divide.nan.or.overflow

V1.0 did not evaluate constant subexpressions at plan time. For example, in IF(FALSE, 0/0, 1.0), the division was never executed because the condition was false. V2.0 performs constant folding at plan time, evaluating all constant subexpressions—including those in branches that will never execute—and returns an error if any produces NaN or overflow.

V1.0 (fails in V2.0):

select IF(FALSE, 0/0, 1.0) from table_name;

Error (NaN):

FAILED: ODPS-0130071:[1,19] Semantic analysis exception - encounter runtime exception while evaluating function /,
detailed message: DIVIDE func result NaN, two params are 0.000000 and 0.000000

V1.0 (overflow):

select if(false, 1/0, 1.0) from table_name;

Error (overflow):

FAILED: ODPS-0130071:[1,19] Semantic analysis exception - encounter runtime exception while evaluating function /,
detailed message: DIVIDE func result overflow, two params are 1.000000 and 0.000000

V2.0: Remove all /0 expressions and replace them with valid constants.

The same issue applies to CASE WHEN. The optimizer pushes the division into subqueries, where constant folding then evaluates it:

V1.0:

select case when key = 0 then 0 else 1/key end
from (
  select 0 as key from src
  union all
  select key from src
) r;

Error:

FAILED: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed:
java.lang.ArithmeticException: DIVIDE func result overflow, two params are 1.000000 and 0.000000

V2.0: Move the CASE WHEN logic into a subquery and eliminate the /0 branch:

select c1
from (
  select 0 c1 from src
  union all
  select case when key = 0 then 0 else 1/key end c1 from src
) r;

Memory and UDFs

small.table.exceeds.mem.limit

In V1.0, multi-way join optimization merged multiple JOIN operations sharing the same join key into a single Fuxi task, and MAPJOIN hints were ignored when this optimization applied.

V2.0 respects MAPJOIN hints and applies them first. If the hinted table is large, the MAPJOIN exceeds the 640 MB memory limit and fails.

V1.0 (fails in V2.0):

select /* +mapjoin(t1) */ t1.*
from t1 join t2 on t1.c1 = t2.c1
join t3 on t1.c1 = t3.c1;

Error:

FAILED: ODPS-0010000:System internal error - SQL Runtime Internal Error:
Hash Join Cursor HashJoin_REL… small table exceeds, memory limit(MB) 640,
fixed memory used …, variable memory used …

V2.0: Remove the MAPJOIN hint and let V2.0 use multi-way join optimization:

select t1.*
from t1 join t2 on t1.c1 = t2.c1
join t3 on t1.c1 = t3.c1;

sigkill.oom

This issue has the same root cause as small.table.exceeds.mem.limit. When MAPJOIN hints are present and the small tables are large, V1.0 silently fell back to multi-way join optimization. In V2.0, the MAPJOIN is applied, and the worker process is terminated when it exceeds its memory limit—even if odps.sql.mapjoin.memory.max is set.

Error:

Fuxi job failed - WorkerRestart errCode:9,errMsg:SigKill(OOM), usually caused by OOM(out of memory).

V2.0: Remove MAPJOIN hints and rely on multi-way join optimization.

wm_concat.first.argument.const

The WM_CONCAT function signature requires its first argument (the separator) to be a STRING constant:

string wm_concat(string separator, string str)

In V1.0, this constraint was not enforced if the source table was empty. V2.0 validates all arguments at plan time, so a non-constant first argument always fails.

V1.0 (fails in V2.0):

select wm_concat(value, ',') FROM src group by value;
-- value is a column reference, not a constant

Error:

FAILED: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed:
com.aliyun.odps.lot.cbo.validator.AggregateCallValidator$AggregateCallValidationException:
Invalid argument type - The first argument of WM_CONCAT must be constant string.

V2.0:

select wm_concat(',', value) FROM src group by value;

For more information, see the Aggregate functions topic in MaxCompute SQL Reference.

worker.restart.instance.timeout

In V1.0, each record a User-Defined Function (UDF) produced triggered a write to Apsara Distributed File System and sent a heartbeat to Job Scheduler. If a UDF produced no records for 10 minutes, the job timed out.

V2.0 uses vectorized execution, processing multiple rows of a column at a time. When a batch of rows takes longer than 10 minutes to produce output, the heartbeat timeout triggers even if the UDF is functioning correctly.

Error:

FAILED: ODPS-0123144: Fuxi job failed - WorkerRestart errCode:252,errMsg:kInstanceMonitorTimeout,
usually caused by bad udf performance.

First, profile the UDF. If processing each record takes several seconds, optimize the UDF logic.

If the UDF cannot be optimized further, reduce the batch size. The default batch size is 1,024 rows:

set odps.sql.executionengine.batch.rowcount=16;

A smaller batch size means each batch completes faster and sends heartbeats more frequently, preventing the timeout.