A UNION ALL clause is used to combine two data streams. The field types and sequences of the two data streams must be the same.
Syntax
select_statement
UNION ALL
select_statement;
Note Realtime Compute for Apache Flink also supports the
UNION
function. UNION ALL
allows duplicate values and UNION
does not allow duplicate values. In Realtime Compute for Apache Flink, UNION
is equivalent to the combination of UNION ALL and DISTINCT
. We recommend that you do not use UNION
because its operating efficiency is low.
Example
- Test data
Table 1. test_source_union1 a (VARCHAR) b (BIGINT) c (BIGINT) test1 1 10 Table 2. test_source_union2 a (VARCHAR) b (BIGINT) c (BIGINT) test1 1 10 test2 2 20 Table 3. test_source_union3 a (VARCHAR) b (BIGINT) c (BIGINT) test1 1 10 test2 2 20 test1 1 10 - Sample code
SELECT a, sum(b) as d, sum(c) as e FROM (SELECT * from test_source_union1 UNION ALL SELECT * from test_source_union2 UNION ALL SELECT * from test_source_union3 )t GROUP BY a;
- Test results
a (VARCHAR) d (BIGINT) e (BIGINT) test1 1 10 test2 2 20 test1 2 20 test1 3 30 test2 4 40 test1 4 40 Note The preceding test results are debugging results. In these results, you can view the computing process. If your job is published and the result table is stored in DataHub, Alibaba Cloud Message Queue for Apache Kafka, or Alibaba Cloud Message Queue, the result data contains data about the computing process. If your job is published and the result table is stored in a relational database such as ApsaraDB RDS, the records that have the same primary key values are combined into one record.