All Products
Search
Document Center

Data Lake Analytics - Deprecated:SELECT

Last Updated:Apr 23, 2019

Syntax

  1. [ WITH with_query [, ...] ]
  2. SELECT [ ALL | DISTINCT ] select_expr [, ...]
  3. [ FROM from_item [, ...] ]
  4. [ WHERE condition ]
  5. [ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
  6. [ HAVING condition]
  7. [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
  8. [ ORDER BY expression [ ASC | DESC ] [, ...] ]
  9. [ LIMIT [ count | ALL ] ]

Parameters

  • from_item: DLA supports the following two modes:

    1. table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    1. from_item join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]
  • join_type: indicates how tables are connected.

    1. [ INNER ] JOIN
    2. LEFT [ OUTER ] JOIN
    3. RIGHT [ OUTER ] JOIN
    4. FULL [ OUTER ] JOIN
  • grouping_element

    1. ()
    2. expression

WITH clause

The WITH clause defines a named relationship for use in a query statement. It flattens nested queries or simplifies subqueries. For example, the following queries are equivalent:

  1. SELECT a, b
  2. FROM (
  3. SELECT a, MAX(b) AS b FROM t GROUP BY a
  4. ) AS x;
  5. WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a)
  6. SELECT a, b FROM x;

This clause is also applicable to multiple subqueries:

  1. WITH
  2. t1 AS (SELECT a, MAX(b) AS b FROM x GROUP BY a),
  3. t2 AS (SELECT a, AVG(d) AS d FROM y GROUP BY a)
  4. SELECT t1.*, t2.*
  5. FROM t1
  6. JOIN t2 ON t1.a = t2.a;

Additionally, the relationships defined in a WITH clause can be connected to each other.

  1. WITH
  2. x AS (SELECT a FROM t),
  3. y AS (SELECT a AS b FROM x),
  4. z AS (SELECT b AS c FROM y)
  5. SELECT c FROM z;

GROUP BY clause

The GROUP BY clause groups the output data of the SELECT statement. A group contains data rows of the matching values. A simple GROUP BY clause contains any expressions or column numbers (starting from 1) consisting of the input columns.

The following queries are equivalent. Both of them group the nationkey column. The first query uses the column number, while the second query uses the column name.

  1. SELECT count(*), nationkey FROM customer GROUP BY 2;
  2. SELECT count(*), nationkey FROM customer GROUP BY nationkey;

If the column name is not specified in a query statement, the GROUP BY clause can still group the output data. For example, the following query uses the mktsegment column for grouping and counts the number of rows in the customer table.:

  1. SELECT count(*) FROM customer GROUP BY mktsegment;
  1. _col0
  2. -------
  3. 29968
  4. 30142
  5. 30189
  6. 29949
  7. 29752
  8. (5 rows)

When a GROUP BY clause is used in a SELECT statement, all output columns must be aggregate functions or columns in the GROUP BY clause.

HAVING clause

The HAVING clause is used together with the aggregate functions and GROUP BY clause to control and select groups. The HAVING clause removes groups that do not meet the conditions. After the grouping and aggregate computing operations are complete, the HAVING clause filters the groups. The following example shows how to query a customer table, groups the table, and queries the records with the account balance greater than a specified value.

  1. SELECT count(*), mktsegment, nationkey,
  2. CAST(sum(acctbal) AS bigint) AS totalbal
  3. FROM customer
  4. GROUP BY mktsegment, nationkey
  5. HAVING sum(acctbal) > 5700000
  6. ORDER BY totalbal DESC;
  1. _col0 | mktsegment | nationkey | totalbal
  2. -------+------------+-----------+----------
  3. 1272 | AUTOMOBILE | 19 | 5856939
  4. 1253 | FURNITURE | 14 | 5794887
  5. 1248 | FURNITURE | 9 | 5784628
  6. 1243 | FURNITURE | 12 | 5757371
  7. 1231 | HOUSEHOLD | 3 | 5753216
  8. 1251 | MACHINERY | 2 | 5719140
  9. 1247 | FURNITURE | 8 | 5701952
  10. (7 rows)

UNION, INTERSECT, and EXCEPT clauses

UNION, INTERSECT, and EXCEPT are full set operators. These clauses are used to combine the results of multiple query statements into a final result.

query UNION [ALL | DISTINCT] query

query INTERSECT query

query EXCEPT query

The parameter ALL or DISTINCT controls the rows that you want to include in the final result set. If ALL is specified, all rows are included, even some rows are exactly the same. If DISTINCT is specified, the result set contains only unrepeated rows with repeated rows merged. If no parameter is specified, DISTINCT is used by default.

Multiple set operators are processed from left to right, unless the sequence is specified using parentheses. In addition, INTERSECT has a higher priority than EXCEPT and UNION, which means that the expression A UNION B INTERSECT C EXCEPT D is the same as A UNION (B INTERSECT C) EXCEPT D.

UNION

UNION combines two result sets. The following example shows the simplest way to use the UNION clause. The UNION clause selects the value 13 and merges it with the second selected value 42.

  1. SELECT 13
  2. UNION
  3. SELECT 42;
  1. _col0
  2. -------
  3. 13
  4. 42
  5. (2 rows)

INTERSECT

INTERSECT returns only the combinations of rows that appear in both the first and second queries. The following example shows the simplest way to use the INTERSECT clause. The INTERSECT clause selects the values 13 and 42, and merges them with the value 13 that is selected from the secondary query. As the value 42 is included in the result set of the first query, it is not included in the final result set.

  1. SELECT 13
  2. INTERSECT
  3. SELECT 13;
  1. _col0
  2. -------
  3. 13
  4. (2 rows)

EXCEPT

EXCEPT returns the rows that appear only in the result set of the first query and do not appear in the result set of the second query. The following example shows the simplest way to use the EXCEPT clause. The clause selects the value 42, and merges it with the value 13 that is selected from the second query. As the value 13 is included in the result set of the second query, it is not included in the final result set.

  1. SELECT 42
  2. EXCEPT
  3. SELECT 13;
  1. _col0
  2. -------
  3. 42
  4. (2 rows)

ORDER BY Clause

The ORDER BY clause is used to sort a result set by one or more output expressions.

ORDER BY expression [ ASC | DESC ] [, ...]

Each expression consists of the column names or numbers (starting from 1). Serving as the last step of a query, the ORDER BY clause is used after the GROUP BY and HAVING clauses.

LIMIT Clause

The LIMIT clause limits the number of rows in the final result set. The result of the LIMIT ALL clause is the same as that when the LIMIT clause is omitted. In the following example, a large table is queried, and the LIMIT clause restricts output to only five rows. (Rows are randomly returned because the ORDER BY clause is not used in the query.)

SELECT orderdate FROM orders LIMIT 5;

  1. o_orderdate
  2. -------------
  3. 1996-04-14
  4. 1992-01-15
  5. 1995-02-01
  6. 1995-11-12
  7. 1992-04-26
  8. (5 rows)

Joins

This allows you to merge data from multiple Join clauses.

CROSS JOIN

  1. SELECT *
  2. FROM nation, region;

The tables nation and region contain 25 rows and 5 rows, respectively. Therefore, the two tables ultimately generate 125 rows.

Subqueries

A subquery is an expression that contains queries. A subquery column is joint to a peripheral query column. Logically, a subquery is evaluated on a row basis by its peripheral queries. Therefore, the referenced columns are fixed to the evaluation process of the subquery.

Note

Joined subqueries are limited. Not every type of the subqueries is supported.

EXISTS

The EXISTS assertion determines whether a subquery can return any row.

  1. SELECT name
  2. FROM nation
  3. WHERE EXISTS (SELECT * FROM region WHERE region.regionkey = nation.regionkey)

IN

The IN assertion determines whether the value returned by a subquery is included in a given result set. The result of IN follows the standard result of nulls. A subquery can generate only one column:

  1. SELECT name
  2. FROM nation
  3. WHERE regionkey IN (SELECT regionkey FROM region)

Scalar subqueries

A scalar subquery is a non-joined subquery that returns none or one data row. If the subquery returns more than one data row, an error is returned. If the subquery does not return any rows, NULL is returned.

  1. SELECT name
  2. FROM nation
  3. WHERE regionkey = (SELECT max(regionkey) FROM region)

The scalar subquery can be used for a single column.