MaxCompute V2.0 enforces stricter SQL syntax rules to align with open-source ecosystems and improve performance. Some statements that ran without errors in V1.0 now fail or produce warnings in V2.0.
This topic covers each incompatibility category, describes the error or behavior change, and shows how to rewrite the affected statements.
Before you begin
When MaxCompute V2.0 fails to run a job, the framework automatically falls back to V1.0. This rollback increases job latency. To prevent the rollback from masking issues during migration, disable it before submitting jobs:
set odps.sql.planner.mode=lot;The MaxCompute team notifies job owners by email or DingTalk when rollbacks are detected. Fix the affected statements as soon as possible to avoid job failures after rollback support is removed.
Quick reference
| Category | Description |
|---|---|
| group.by.with.star | SELECT * with GROUP BY requires all columns in the GROUP BY clause |
| bad.escape | Escape sequences must use exactly three octal digits |
| column.repeated.in.creation | Duplicate column names in CREATE TABLE |
| duplicated.partition.column | Duplicate partition key names in a query |
| string.join.double | Implicit STRING-to-DOUBLE conversion in JOIN conditions |
| window.ref.prev.window.alias | Window functions cannot reference aliases from sibling window functions |
| select.invalid.token.after.star | SELECT * cannot have an alias |
| agg.having.ref.prev.agg.alias | HAVING cannot use SELECT-clause aliases |
| order.by.no.limit | ORDER BY requires a LIMIT clause |
| generated.column.name.multi.window | Auto-generated column aliases may change between versions |
| non.boolean.filter | Non-BOOLEAN expressions in WHERE/HAVING filters |
| post.select.ambiguous | Ambiguous column references in ORDER BY / CLUSTER BY / DISTRIBUTE BY / SORT BY |
| order.by.col.ambiguous | ORDER BY references duplicate aliases in SELECT |
| in.subquery.without.result | Column in IN subquery does not exist in the source table |
| ctas.if.not.exists | Syntax errors in CTAS destination table are now validated |
| dynamic.pt.to.static | Dynamic partitions may be converted to static partitions by the optimizer |
| lot.not.in.subquery | NULL handling in NOT IN subqueries |
| having.use.select.alias | HAVING cannot use column aliases defined in SELECT |
| divide.nan.or.overflow | Division constant folding causes errors for /0 at plan time |
| small.table.exceeds.mem.limit | MAPJOIN hints override multi-way join optimization |
| sigkill.oom | Large MAPJOIN small tables cause worker out-of-memory errors |
| wm_concat.first.argument.const | WM_CONCAT first argument must be a constant |
| pt.implicit.convertion.failed | Implicit type conversion fails on partition key comparisons |
| worker.restart.instance.timeout | UDF vectorized execution causes heartbeat timeouts |
DDL statements
group.by.with.star
In V1.0, SELECT * FROM t GROUP BY key was valid even when not all columns were listed in the GROUP BY clause. In V2.0, every column in the source table must appear in the GROUP BY clause.
V1.0 (fails in V2.0):
select * from t group by key;Error:
FAILED: ODPS-0130071:[1,8] Semantic analysis exception - column reference t.value should appear in GROUP BY keyV2.0:
select distinct key from t;If the GROUP BY clause does include all columns, the statement does not error in V2.0, but the SELECT DISTINCT form is clearer:
-- Not recommended (no error, but intent is unclear)
select * from t group by key, value; -- t has columns key and value
-- Recommended
select distinct key, value from t;column.repeated.in.creation
V2.0 returns an error when a CREATE TABLE statement defines the same column name more than once. V1.0 allowed this.
V1.0 (fails in V2.0):
create table t (a BIGINT, b BIGINT, a BIGINT);Error:
FAILED: ODPS-0130071:[1,37] Semantic analysis exception - column repeated in creation: aV2.0:
create table t (a BIGINT, b BIGINT);duplicated.partition.column
In V1.0, specifying the same partition key twice silently kept the last value. V2.0 returns an error.
V1.0 — duplicate partition value (fails in V2.0):
insert overwrite table t partition (ds = '1', ds = '2') select ...;
-- In V1.0, ds = '1' was silently ignoredV2.0:
insert overwrite table t partition (ds = '2') select ...;V1.0 — column appears in both table definition and partition (fails in V2.0):
create table t (a bigint, ds string) partitioned by (ds string);V2.0:
create table t (a bigint) partitioned by (ds string);ctas.if.not.exists
In V1.0, if the destination table already existed, CREATE TABLE IF NOT EXISTS ... AS SELECT skipped syntax validation. V2.0 always validates the query, so previously hidden errors now surface.
V1.0 (fails in V2.0):
create table if not exists table_name
as
select * from not_exist_table;Error:
FAILED: ODPS-0130131:[1,50] Table not found - table meta_dev.not_exist_table cannot be resolvedFix the query by referencing an existing table.
dynamic.pt.to.static
V2.0's optimizer may convert a dynamic partition to a static partition at plan time. If the inferred partition value is invalid (for example, an unresolved variable like '${bizdate}'), V2.0 returns an error during syntax validation.
Example conversion:
-- Input
insert overwrite table srcpt partition(pt) select id, 'pt1' from table_name;
-- V2.0 converts this to:
insert overwrite table srcpt partition(pt='pt1') select id from table_name;V1.0 (fails in V2.0 when partition value is invalid):
insert overwrite table srcpt partition(pt) select id, '${bizdate}' from table_name limit 0;Error:
FAILED: ODPS-0130071:[1,24] Semantic analysis exception - wrong columns count 2 in data source, requires 3 columns (includes dynamic partitions if any)In V1.0, LIMIT 0 returned no rows, so no partition was created and no error was returned. In V2.0, the optimizer evaluates the statement before execution, which triggers the error.
For more information, see the Partition topic in MaxCompute documentation.
Type conversions
bad.escape
In string literals, each ASCII character (0–127) must be written as a backslash followed by exactly three octal digits (for example, \001 for 0, \002 for 1). V1.0 silently accepted non-standard sequences such as \01 and \0001, treating both as \001. V2.0 rejects them.
V1.0 (fails in V2.0):
select split(key, "\01"), value like "\0001" from t;Error:
FAILED: ODPS-0130161:[1,19] Parse exception - unexpected escape sequence: 01
ODPS-0130161:[1,38] Parse exception - unexpected escape sequence: 0001V2.0:
select split(key, "\001"), value like "\001" from t;Appending digits to\000(such as\0001through\0009, or\00001) also triggers this error.
string.join.double
When joining STRING and DOUBLE columns, V1.0 converted both to BIGINT, which caused precision loss (for example, 1.1 = "1" evaluated as equal). V2.0, following Apache Hive compatibility, converts both to DOUBLE and emits a warning.
V1.0 (triggers warning in V2.0):
select * from t1 join t2 on t1.double_value = t2.string_value;WARNING:[1,48] implicit conversion from STRING to DOUBLE, potential data loss, use CAST function to suppressV2.0:
select * from t1 join t2 on t1.double_value = cast(t2.string_value as double);pt.implicit.convertion.failed
When filtering a STRING partition column with INT constants, V1.0 silently converted the constants to DOUBLE for comparison, which caused all partition values to be filtered out. V2.0 returns an error.
Assume the following table:
create table srcpt(key STRING, value STRING) partitioned by (pt STRING);
alter table srcpt add partition (pt='pt1');
alter table srcpt add partition (pt='pt2');V1.0 (fails in V2.0):
select key from srcpt where pt in (1, 2);Error:
FAILED: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed:
java.lang.NumberFormatException: ODPS-0123091:Illegal type cast -
In function cast, value 'pt1' cannot be casted from String to Double.V2.0:
select key from srcpt where pt in ('1', '2');Do not compare STRING partition columns with INT constants. Cast the INT constants to STRING instead.
Query engine
window.ref.prev.window.alias
A window function cannot reference the alias of another window function in the same SELECT clause. V1.0 allowed this in some cases; V2.0 returns an error.
V1.0 (fails in V2.0):
select row_number() over (partition by c1 order by c1) rn,
row_number() over (partition by c1 order by rn) rn2
from t1;
-- rn does not exist in t1Error:
FAILED: ODPS-0130071:[2,45] Semantic analysis exception - column rn cannot be resolvedV2.0:
select row_number() over (partition by c1 order by rn) rn2
from (
select c1, row_number() over (partition by c1 order by c1) rn
from t1
) tmp;select.invalid.token.after.star
SELECT * cannot be followed by an alias, even when * expands to a single column.
V1.0 (fails in V2.0):
select * as alias from table_test;Error:
FAILED: ODPS-0130161:[1,10] Parse exception - invalid token 'as'V2.0:
select * from table_test;agg.having.ref.select.alias
The HAVING clause is evaluated before the SELECT clause. Column aliases defined in SELECT are not available in HAVING.
V1.0 (fails in V2.0):
select count(c1) cnt,
sum(c1) / cnt avg
from t1
group by c2
having cnt > 1;
-- cnt is a SELECT alias; it does not exist in source table t1Error:
FAILED: ODPS-0130071:[2,11] Semantic analysis exception - column cnt cannot be resolved
ODPS-0130071:[2,11] Semantic analysis exception - column reference cnt should appear in GROUP BY keyV2.0:
select cnt, s, s/cnt avg
from (
select count(c1) cnt,
sum(c1) s
from t1
group by c2
having count(c1) > 1
) tmp;having.use.select.alias
Per SQL specification, GROUP BY and HAVING are evaluated before SELECT, so SELECT-clause aliases are not visible in HAVING.
V1.0 (fails in V2.0):
select id id2 from table_name group by id having id2 > 0;Error:
FAILED: ODPS-0130071:[1,44] Semantic analysis exception - column id2 cannot be resolved
ODPS-0130071:[1,44] Semantic analysis exception - column reference id2 should appear in GROUP BY keyid2 is a SELECT-clause alias and cannot be used in HAVING.
V2.0:
select id id2 from table_name group by id having id > 0;order.by.no.limit
In MaxCompute, ORDER BY sorts all rows in the result set, which is expensive for large datasets. V2.0 requires every ORDER BY clause to be accompanied by a LIMIT clause.
V1.0 (fails in V2.0):
select * from (
select *
from (
select cast(login_user_cnt as int) as uv, '3' as shuzi
from test_login_cnt where type = 'device' and type_name = 'mobile'
) v
order by v.uv desc -- missing LIMIT here
) v
order by v.shuzi limit 20;Error:
FAILED: ODPS-0130071:[4,1] Semantic analysis exception - ORDER BY must be used with a LIMIT clauseV2.0: Add a LIMIT clause to the subquery:
select * from (
select *
from (
select cast(login_user_cnt as int) as uv, '3' as shuzi
from test_login_cnt where type = 'device' and type_name = 'mobile'
) v
order by v.uv desc limit 1000 -- example value; replace with a suitable limit for your data
) v
order by v.shuzi limit 20;V1.0 did not enforce this rule for views created in projects where odps.sql.validate.orderby.limit=false. For example, this view creation succeeded in V1.0:
create view table_view as select id from table_view order by id;Querying that view in V2.0 returns:
FAILED: ODPS-0130071:[1,15] Semantic analysis exception - while resolving view xdj.xdj_view_limit - ORDER BY must be used with a LIMIT clauseAdd a LIMIT clause to the ORDER BY inside the view definition.
generated.column.name.multi.window
V1.0 automatically generated aliases for expressions in SELECT statements. These aliases were unstable and could change across MaxCompute versions or during upgrades and rollbacks. V2.0 warns against relying on auto-generated aliases.
V1.0 (not recommended):
select _c0 from (select count(*) from table_name) t;
-- _c0 is an auto-generated alias that may changeV2.0:
select c from (select count(*) c from table_name) t;Always specify explicit aliases for computed columns.
non.boolean.filter
MaxCompute does not support implicit conversions between BOOLEAN and other types. V1.0 allowed BIGINT values as filter conditions in some cases; V2.0 does not.
V1.0 (fails in V2.0):
select id, count(*) from table_name group by id having id;Error:
FAILED: ODPS-0130071:[1,50] Semantic analysis exception - expect a BOOLEAN expressionV2.0:
select id, count(*) from table_name group by id having id <> 0;post.select.ambiguous
When ORDER BY, CLUSTER BY, DISTRIBUTE BY, or SORT BY references a name that appears both as a source column and as a SELECT-clause alias, V1.0 silently selected the last column in the SELECT list. V2.0 returns an error.
V1.0 (fails in V2.0):
select a, b as a from t order by a limit 10;
-- a is ambiguous: it could be t.a or the alias for bError:
FAILED: ODPS-0130071:[1,34] Semantic analysis exception - a is ambiguous, can be both t.a or null.aV2.0:
select a as c, b as a from t order by a limit 10;This error also applies when the conflicting names happen to refer to the same column. Update any statements with conflicting column names to use unambiguous aliases.
order.by.col.ambiguous
When ORDER BY references a name that appears as a duplicate alias in the SELECT clause, V2.0 returns an error.
V1.0 (fails in V2.0):
select id, id
from table_test
order by id;V2.0:
select id, id id2
from table_name
order by id;Remove the duplicate alias before using ORDER BY.
in.subquery.without.result
If a column referenced in an IN subquery does not exist in the source table, V2.0 returns a column-resolution error even if the subquery returns no rows.
V1.0 (fails in V2.0):
select * from table_name
where not_exist_col in (select id from table_name limit 0);Error:
FAILED: ODPS-0130071:[2,7] Semantic analysis exception - column not_exist_col cannot be resolvedFix the column name in the WHERE clause.
lot.not.in.subquery
V2.0 follows standard SQL NULL semantics for IN and NOT IN subqueries:
1 IN (NULL, 1, 2, 3)returnsTRUE1 IN (NULL, 2, 3)returnsNULLNULL IN (NULL, 1, 2, 3)returnsNULL
For NOT IN, if the subquery returns any NULL value, the result is NULL (not TRUE) for every non-matching row. V1.0 returned TRUE in that case.
Example:
select * from t where c not in (select accepted from c_list);If accepted contains NULL values, this query returns different results in V1.0 and V2.0. In V1.0, rows where c does not match any non-NULL value in accepted are returned. In V2.0, the query returns no rows if accepted contains any NULL values.
V2.0 (exclude NULL values explicitly):
select * from t where c not in (select accepted from c_list where accepted is not null);Check subqueries used with NOT IN to confirm whether NULL values in the subquery result match your intended behavior.
Constant folding
divide.nan.or.overflow
V1.0 did not evaluate constant subexpressions at plan time. For example, in IF(FALSE, 0/0, 1.0), the division was never executed because the condition was false. V2.0 performs constant folding at plan time, evaluating all constant subexpressions—including those in branches that will never execute—and returns an error if any produces NaN or overflow.
V1.0 (fails in V2.0):
select IF(FALSE, 0/0, 1.0) from table_name;Error (NaN):
FAILED: ODPS-0130071:[1,19] Semantic analysis exception - encounter runtime exception while evaluating function /,
detailed message: DIVIDE func result NaN, two params are 0.000000 and 0.000000V1.0 (overflow):
select if(false, 1/0, 1.0) from table_name;Error (overflow):
FAILED: ODPS-0130071:[1,19] Semantic analysis exception - encounter runtime exception while evaluating function /,
detailed message: DIVIDE func result overflow, two params are 1.000000 and 0.000000V2.0: Remove all /0 expressions and replace them with valid constants.
The same issue applies to CASE WHEN. The optimizer pushes the division into subqueries, where constant folding then evaluates it:
V1.0:
select case when key = 0 then 0 else 1/key end
from (
select 0 as key from src
union all
select key from src
) r;Error:
FAILED: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed:
java.lang.ArithmeticException: DIVIDE func result overflow, two params are 1.000000 and 0.000000V2.0: Move the CASE WHEN logic into a subquery and eliminate the /0 branch:
select c1
from (
select 0 c1 from src
union all
select case when key = 0 then 0 else 1/key end c1 from src
) r;Memory and UDFs
small.table.exceeds.mem.limit
In V1.0, multi-way join optimization merged multiple JOIN operations sharing the same join key into a single Fuxi task, and MAPJOIN hints were ignored when this optimization applied.
V2.0 respects MAPJOIN hints and applies them first. If the hinted table is large, the MAPJOIN exceeds the 640 MB memory limit and fails.
V1.0 (fails in V2.0):
select /* +mapjoin(t1) */ t1.*
from t1 join t2 on t1.c1 = t2.c1
join t3 on t1.c1 = t3.c1;Error:
FAILED: ODPS-0010000:System internal error - SQL Runtime Internal Error:
Hash Join Cursor HashJoin_REL… small table exceeds, memory limit(MB) 640,
fixed memory used …, variable memory used …V2.0: Remove the MAPJOIN hint and let V2.0 use multi-way join optimization:
select t1.*
from t1 join t2 on t1.c1 = t2.c1
join t3 on t1.c1 = t3.c1;sigkill.oom
This issue has the same root cause as small.table.exceeds.mem.limit. When MAPJOIN hints are present and the small tables are large, V1.0 silently fell back to multi-way join optimization. In V2.0, the MAPJOIN is applied, and the worker process is terminated when it exceeds its memory limit—even if odps.sql.mapjoin.memory.max is set.
Error:
Fuxi job failed - WorkerRestart errCode:9,errMsg:SigKill(OOM), usually caused by OOM(out of memory).V2.0: Remove MAPJOIN hints and rely on multi-way join optimization.
wm_concat.first.argument.const
The WM_CONCAT function signature requires its first argument (the separator) to be a STRING constant:
string wm_concat(string separator, string str)In V1.0, this constraint was not enforced if the source table was empty. V2.0 validates all arguments at plan time, so a non-constant first argument always fails.
V1.0 (fails in V2.0):
select wm_concat(value, ',') FROM src group by value;
-- value is a column reference, not a constantError:
FAILED: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed:
com.aliyun.odps.lot.cbo.validator.AggregateCallValidator$AggregateCallValidationException:
Invalid argument type - The first argument of WM_CONCAT must be constant string.V2.0:
select wm_concat(',', value) FROM src group by value;For more information, see the Aggregate functions topic in MaxCompute SQL Reference.
worker.restart.instance.timeout
In V1.0, each record a User-Defined Function (UDF) produced triggered a write to Apsara Distributed File System and sent a heartbeat to Job Scheduler. If a UDF produced no records for 10 minutes, the job timed out.
V2.0 uses vectorized execution, processing multiple rows of a column at a time. When a batch of rows takes longer than 10 minutes to produce output, the heartbeat timeout triggers even if the UDF is functioning correctly.
Error:
FAILED: ODPS-0123144: Fuxi job failed - WorkerRestart errCode:252,errMsg:kInstanceMonitorTimeout,
usually caused by bad udf performance.First, profile the UDF. If processing each record takes several seconds, optimize the UDF logic.
If the UDF cannot be optimized further, reduce the batch size. The default batch size is 1,024 rows:
set odps.sql.executionengine.batch.rowcount=16;A smaller batch size means each batch completes faster and sends heartbeats more frequently, preventing the timeout.