By Mujiao
Recently, I participated in a private cloud project that needed to be upgraded. To do that, MaxCompute had to be upgraded to a newer version, which imposes stricter requirements on how SQL is written and exposed the problem of implicit UNION conversion. The exception flagged by the scan was: union.string.meet.non.string.
In some ODPS modes, if the corresponding columns on the two sides of a UNION have different types, an implicit conversion is introduced. If one side is a string and the other side is a numeric or datetime type, the string is converted to the other type. However, most databases and open-source systems, such as Hive and MySQL, do not follow this rule; they convert the other side to a string first. Such inconsistent conversion rules can be dangerous: when users migrate from Hive to ODPS, for example, there may be silent precision loss and semantic errors. For safety, ODPS 2.0 prohibits this implicit type conversion (this is also the default behavior of Oracle). If you need the conversion, use the CAST function explicitly. (The scripts ran fine before, but after the upgrade they report errors.) The project team now requires script authors to review their scripts, determine the intended target type, and add explicit conversions where necessary.
For example:
select * from (-- (error)
select a_bigint c1 from t1
union all
select a_string c1 from t2) x;
-- If you want the result c1 to be of the bigint type (this is the current ODPS behavior), change it to
select * from (--(correct)
select a_bigint c1 from t1
union all
select cast(a_string as bigint) c1 from t2) x;
-- If you want the result c1 to be of the string type (this is the current HIVE behavior), change it to
select * from (--(correct)
select cast(a_bigint as string) c1 from t1
union all
select a_string c1 from t2) x;
Since the environment has not been upgraded yet, the scripts do not report errors at present, so we cannot rely on MaxCompute exceptions to locate the problems. Sadly, we can only find them with our naked eyes.
Incorrect example:
select 123 as aa,0 as ab
from xlog
union ALL
select getdate() as aa,0 as ab
from xlog;
FAILED: ODPS-0130241:[4,8] Illegal union operation - type mismatch for column 0 of UNION, left is BIGINT while right is DATETIME
-- Note: [4,8] refers to the fourth line, eighth character, which is where getdate() starts.
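To jump straight to such a position, a small helper can slice the script at the reported (line, column) pair. This is a minimal sketch; the `locate` function and its parameters are my own, not part of any MaxCompute tooling.

```python
def locate(sql: str, line: int, col: int, width: int = 30) -> str:
    """Return the snippet starting at the 1-based (line, col) position
    that MaxCompute reports in messages like ODPS-0130241:[4,8]."""
    rows = sql.splitlines()
    return rows[line - 1][col - 1:col - 1 + width]

sql = """select 123 as aa,0 as ab
from xlog
union ALL
select getdate() as aa,0 as ab
from xlog;"""

print(locate(sql, 4, 8))  # -> getdate() as aa,0 as ab
```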
How can you quickly locate the field? I looked through the hundreds of scripts retrieved from the backend. Most scripts are between 500 and 1,000 lines of code, and a single script contains anywhere from 3-5 UNIONs to more than 20. I spent a whole morning on it but achieved nothing.
I thought about it briefly. To know whether the data types on the two sides of a UNION are aligned, I had to check the column types in the source table structures (which determine the target table structure), then trace the SQL logic to work out the types after execution and find which fields are inconsistent.
Following this idea, I found that the first script had more than 1,000 lines of code, with UNIONs spanning more than 100 table fields and six separate UNIONs. They were impossible to check by eye. I spent the whole morning on it, but the problem remained unsolved, and I was even more confused.
When there seemed to be no way out, I suddenly thought of the execution plan. Would the MaxCompute execution plan display the output data types? Yes.
explain
select 123 as aa,0 as ab
from xlog
;
Job Queueing...
job0 is root job
In Job job0:
root Tasks: M1
In Task M1:
Data source: mujiao.xlog
TS: mujiao.xlog
SEL: 123L aa, 0L ab
FS: output: Screen
schema:
aa (bigint)
ab (bigint)
OK
explain
select getdate() as aa,0 as ab
from xlog;
Job Queueing...
job0 is root job
In Job job0:
root Tasks: M1
In Task M1:
Data source: mujiao.xlog
TS: mujiao.xlog
SEL: 1655965081824 aa, 0L ab
FS: output: Screen
schema:
aa (datetime)
ab (bigint)
OK
As you can see, under FS: output: Screen there is schema: aa (bigint), ab (bigint). These are the output data types we can use. In a long script, we can explain each UNION branch separately, extract this part, and compare the schemas for differences.
schema1:        schema2:
aa (bigint)     aa (datetime)
ab (bigint)     ab (bigint)
Then I can see at a glance that the aa fields of the two SQL segments in the UNION are different.
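This section-by-section comparison can itself be scripted. The sketch below parses the schema: block that follows FS: output: Screen in the explain output shown above and diffs two branches. The function names are my own, and the parser assumes exactly the output format printed above.

```python
import re

def extract_schema(explain_text: str) -> dict:
    """Parse the 'schema:' block of MaxCompute explain output into
    {column: type}. A sketch, not an official parser."""
    schema = {}
    in_schema = False
    for line in explain_text.splitlines():
        line = line.strip()
        if line == "schema:":
            in_schema = True
            continue
        if in_schema:
            m = re.match(r"(\w+)\s*\((\w+)\)", line)
            if m:
                schema[m.group(1)] = m.group(2)
            else:
                in_schema = False  # end of the schema block (e.g. "OK")
    return schema

def diff_schemas(s1: dict, s2: dict) -> list:
    """Return the columns whose types differ between two UNION branches."""
    return [c for c in s1 if s2.get(c) != s1[c]]

plan1 = """FS: output: Screen
schema:
  aa (bigint)
  ab (bigint)
OK"""

plan2 = """FS: output: Screen
schema:
  aa (datetime)
  ab (bigint)
OK"""

print(diff_schemas(extract_schema(plan1), extract_schema(plan2)))  # ['aa']
```

With hundreds of fields per branch, feeding each branch's explain output through such a diff is far faster than eyeballing two columns of types.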
Other related issues:
1) The max_pt() function cannot be used in the development environment, since the development environment has no partitions, so explain will report an error on it. Either comment the function out temporarily or add the production environment prefix in front of the table name.
2) For ultra-long SQL segments, the execution plan may run to hundreds (or thousands) of lines, making the final output hard to find. Search for "output: Screen" in the log; it corresponds to the final output.
3) When there are too many fields to spot the differing types by eye, it is recommended to paste the schemas into Excel and compare the data types column by column.
4) In special cases the execution plan may not be produced. Use CREATE TABLE to materialize a temporary table from the SQL output, then DESC the table structure to see the data types. Note that every field must have a name, and any NULL value needs an explicit CAST to a data type when creating the table.
5) Date conversion: the format of a string-to-date conversion cannot be guessed reliably, so look at the actual data format rather than guessing. Otherwise, the error will only surface after the job goes online.
6) For NULL values, use cast(null as datetime) or cast(null as double) to give the field an explicit type.
Even with all of this, checking hundreds of scripts of hundreds of lines each is enough to wear you out. Developers should pace themselves and turn this check into a habit going forward: if there is a large UNION block, explain it first rather than waiting for errors to be reported.
In the end, you will realize the root cause is earlier work that was not done well. Columns joined by a UNION should, in theory, have the same data type and value range, so how can this problem occur? Storing a number as a string or a date as a string should never happen. Today's errors come from yesterday's shortcuts; if we do things properly step by step, the follow-up work becomes easier.
Later, R&D showed me a flag that makes the warning information visible:
set odps.compiler.warning.disable=false;
sql running .....
WARNING:[4,8] implicit conversion from bigint to datetime,use cast function to suppress
This setting surfaces every implicit conversion. In the on-site environment, it found far more than I had identified with the explain method. Which of the two methods to use in practice is left to your judgment.
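The warning lines themselves are easy to harvest with a script, which saves scrolling through console output. A sketch, assuming the warnings follow exactly the format shown above:

```python
import re

# Collect implicit-conversion warnings from MaxCompute console output
# (assumed format: "WARNING:[line,col] implicit conversion from <type>
# to <type>,use cast function to suppress").
WARN_RE = re.compile(
    r"WARNING:\[(\d+),(\d+)\] implicit conversion from (\w+) to (\w+)")

def collect_warnings(log: str):
    """Return (line, col, from_type, to_type) for each warning found."""
    return [(int(l), int(c), f, t) for l, c, f, t in WARN_RE.findall(log)]

log = """sql running .....
WARNING:[4,8] implicit conversion from bigint to datetime,use cast function to suppress"""

print(collect_warnings(log))  # [(4, 8, 'bigint', 'datetime')]
```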