Use the SELECT statement to query data in Hologres - Hologres

This topic describes how to use the SELECT statement to query data in Hologres.

Description

SELECT: queries data from zero or more tables in multiple ways. The following table lists the parameters in the SELECT statement.

Parameters
WITH list	SELECT list
FROM clause	WHERE clause
GROUP BY clause	CUBE clause
DISTINCT clause	COUNT DISTINCT clause
UNION clause	INTERSECT clause
EXCEPT clause	ORDER BY clause
LIMIT clause	None

Syntax

The following code shows the basic syntax of the SELECT statement.

[ WITH with_query [, ...] ]
SELECT [ALL | DISTINCT [ON (expression [, ...])]]
  * | expression [[AS] output_name] [, ...]
  [FROM from_item [, ...]]
  [WHERE condition]
  [GROUP BY grouping_element [, ...]]
  [HAVING condition [, ...]]
  [{UNION | INTERSECT | EXCEPT} [ALL] select]
  [ORDER BY expression [ASC | DESC | USING operator] [, ...]]
  [LIMIT {count | ALL}]

Description of grouping_element and from_item:

grouping_element indicates an expression. The expression can be a column name, a constant, a function, a combination of column names or constants, or a combination of functions that are connected by arithmetic operators or bitwise operators.

from_item has the following forms:

table_name [[AS] alias [( column_alias [, ...] )]]
(select) [AS] alias [( column_alias [, ...] )]
from_item [NATURAL] join_type from_item
          [ON join_condition | USING ( join_column [, ...] )]

Usage notes

When you execute the SELECT statement, take note of the following items:

All elements in the FROM list are computed. Each element in the FROM list is a real or virtual table. If more than one element is specified in the FROM list, the union of the results is returned.
If the WHERE clause is specified, all rows that do not meet the condition are eliminated from the output.
If the GROUP BY clause is specified and an aggregate function is used, the output consists of multiple groups of rows that match one or more specified values, and an aggregation result is calculated in each group. If the HAVING clause is specified, groups that do not meet the given condition are eliminated.
The actual output rows are computed by using the SELECT output expressions for each selected row or row group.
If you use the operators UNION, INTERSECT, and EXCEPT, the outputs of more than one SELECT statement can be combined to form a single result set. The UNION operator returns all rows that are in one or both of the result sets. The INTERSECT operator returns all rows that are in both result sets. The EXCEPT operator returns the rows that are in the first result set but not in the second. In all three cases, duplicate rows are eliminated unless ALL is specified. The DISTINCT keyword can be explicitly added to eliminate duplicate rows. When you combine the result sets of multiple SQL statements, DISTINCT is the default behavior, even though ALL is the default behavior for a single SELECT statement.
If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not specified, the rows are returned in an order that allows the most efficient generation of rows.
If the LIMIT (or FETCH FIRST) or OFFSET clause is specified, the SELECT statement returns only a subset of the output rows.

Parameters

WITH list

Description
The WITH list is located preceding a SELECT statement or serves as a subquery of a SELECT statement. In most cases, the WITH list is located preceding a SELECT statement to define a subquery. It declares a name for the subquery and the names of returned columns. You can use the following syntax to define a WITH clause as a common table expression (CTE).
```
with query_name [ ( column_name [, ...] ) ] AS ( select )
```

Parameters

Parameter	Description
query_name	The name of the current CTE, which is a valid identifier.
column_name	The names of the columns returned by a subquery, which share similar semantics with the AS keyword in a SELECT subquery. A subquery can be a regular SELECT query.

CTEs are separated by commas (,). Subsequent CTEs can reference the previously defined CTEs. However, CTEs cannot be recursively called. In subsequent queries, query_name of a previously defined CTE can be used as a view. If no column_name list is specified for the CTE, the names of the columns returned by the related SELECT subquery are used as the column_name list of the view. If a column_name list is specified for the CTE, you must use the specified column_name list. In addition, you must make sure that the number of columns in the column_name list is the same as the number of columns returned by the related SELECT subquery.

SELECT list
- Description
  The SELECT list is located between the SELECT and FROM keywords to specify expressions that form the output rows of the SELECT statement. The expressions can and usually do reference the columns computed in the FROM clause.
- Parameters
  Each output column of a SELECT statement has a name. In a simple SELECT statement, this name is used only to label the column for display. However, when the SELECT statement is a subquery of a large query, the name is considered to be the column name of the virtual table produced by the subquery. To specify the name for an output column, write AS output_name that follows the expression of the column. You can omit AS if the specified output name does not match PostgreSQL keywords. To prevent conflicts with possible keywords to be added, we recommend that you use AS or double-quote the output name. If you do not specify a column name, the system automatically assigns a name. If the column expression is a simple column reference, the original column name is used. In more complex situations where a function or type name is used, names such as ?column? may be generated.
  The name of an output column can be used to indicate the value of the column in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clause. In the WHERE and HAVING clauses, you must write the expression.
  An asterisk (*) can be written in the output list as a short form for all the columns of the selected rows. You can also write %table_name*; as a short form for all columns from that table. In these cases, you cannot specify new names by using the AS keyword. The output column names are the same as the names of the table columns.

FROM clause

Description
The FROM clause specifies one or more source tables for the SELECT statement. If multiple source tables are specified, the result is the Cartesian product (cross join) of all the source tables. However, qualification conditions are usually added by using the WHERE clause to restrict the returned rows to a small subset of the Cartesian product.

Parameters

The following table describes parameters in the FROM clause.

Parameter	Description
table_name	The name of an existing table or view. The name can be qualified by a schema name.
alias	The substitute name for the FROM item that contains an alias. An alias is used for brevity or to eliminate ambiguity for self-joins where the same table is scanned multiple times. When an alias is provided, the actual name of the table or function is completely hidden. For example, given `FROM foo AS f`, the remainder of the SELECT statement must refer to this FROM item as f but not foo.
select	A SELECT subquery can appear in the FROM clause. A similar analogy is that the output of the SELECT subquery is created as a temporary table for the SELECT statement. The SELECT subquery must be enclosed in parentheses (), and an alias must be provided for it.
function_name	Function calls can appear in the FROM clause. This applies to all functions and is especially useful for functions that return result sets. A similar analogy is that the output of the function is created as a temporary table for the SELECT statement. An alias can be provided in the same way as for a table. If an alias is written, a column alias list can also be written to provide substitute names for one or more attributes of the composite return type of the function, including the column added by ORDINALITY if available. Multiple function calls can be combined into a single FROM-clause item by surrounding them with `ROWS FROM(...)`. The output of such an item is the concatenation of the first row from each function, then the second row from each function, and so on. If some of the functions produce fewer rows than others, null values are used for the missing data. This way, the total number of rows returned is always the same as that for the function that produces the most rows.
join_type	Valid values: [ INNER ] JOIN For the INNER and OUTER join types, one of the following join conditions must be specified: `NATURAL, ON join_condition, or USING (join_column [, ...])`. For more information about the join conditions, see the following text. For `CROSS JOIN`, none of these clauses can appear. A JOIN clause combines two FROM items, which for convenience are referred to as tables, though in reality they can be other types of FROM items. Use parentheses () if necessary to determine the order of nesting. In the absence of parentheses (), JOINs nest left to right. JOIN always binds more tightly than the commas (,) that separate FROM-list items. LEFT [ OUTER ] JOIN `LEFT OUTER JOIN` returns all rows in the qualified Cartesian product and each row in the left table that does not have a matching row in the right table. The qualified Cartesian product refers to all combined rows that pass the join condition. The unmatched rows in the left table are extended to the full width of the joined table by inserting null values to the columns in the right table. Only the condition of the JOIN clause is considered for deciding which rows have matches. Then, outer conditions are applied. RIGHT [ OUTER ] JOIN `RIGHT OUTER JOIN` returns all the joined rows, and each row in the right table that does not have a matching row in the left table (extended with null values in the left table). This is only for a notational convenience because you can convert it to a `LEFT OUTER JOIN` by switching the left and right tables. FULL [ OUTER ] JOIN `FULL OUTER JOIN` returns all the joined rows, each row in the left table that does not have a matching row in the right table (extended with null values in the right table), and each row in the right table that does not have a matching row in the left table (extended with null values in the left table). CROSS JOIN `CROSS JOIN` and `INNER JOIN` produce a simple Cartesian product, the same result as you obtain from listing the two tables at the top level of FROM, but restricted by the join condition if available. `CROSS JOIN` is equivalent to `INNER JOIN ON (TRUE)`. No rows are removed by qualification. These join types are only for a notational convenience because you can achieve the same results by using plain FROM and WHERE clauses.
join_condition	`join_condition` is an expression that evaluates to a result of the BOOLEAN type. It is similar to a WHERE clause and specifies which rows in a join are considered to have matches.
USING ( a, b, ... )	A clause in the format of `USING ( a, b, ...)` is the short form for `ON left_table.a = right_table.a AND left_table.b = right_table.b ...`. USING implies that only one of each pair of equivalent columns will be included in the join output, not both.
NATURAL	NATURAL is the short form for a USING list that mentions all columns in the two tables that have matching names.

WHERE clause

Description
The optional WHERE clause is in the following format:
```
WHERE condition
```

Parameters

Parameter	Description
condition	This parameter specifies an expression that evaluates to a result of the BOOLEAN type. All rows that do not meet this condition will be eliminated from the output. A row meets the condition if this expression returns true when the actual row values are used to replace variables.

GROUP BY clause
- Description
  GROUP BY condenses all selected rows that share the same values for the grouped expressions into a single row. The optional GROUP BY clause is in the following format:
```
GROUP BY grouping_element [, ...]
```
- Parameters
  An expression used in grouping_element can be an input column name, the name or ordinal number of an output column (SELECT list item), or an arbitrary expression formed from input column values. When ambiguity arises, the name specified in a GROUP BY clause will be interpreted as an input column name rather than an output column name.
  If GROUPING SETS, ROLLUP or CUBE is present as a grouping element, the GROUP BY clause as a whole defines a number of independent grouping sets. The effect is equivalent to constructing a UNION ALL between subqueries with the individual grouping sets as their GROUP BY clauses.
  Aggregate functions, if used, are computed across all rows making up each group, producing a separate value for each group. If aggregate functions are used but the GROUP BY clause is not used, the query is treated as having a single group comprising all the selected rows. The set of rows fed to each aggregate function can be further filtered by attaching a FILTER clause to the aggregate function call. When a FILTER clause is present, only those rows matching it are included in the input to that aggregate function.
  When GROUP BY or an aggregate function is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns. Otherwise, more than one possible value is returned for an ungrouped column. A functional dependency exists if the grouped columns are the primary key columns or a subset of the primary key of the table containing the ungrouped column.
  All aggregate functions are computed before scalar expressions in the HAVING clause or SELECT list. For example, a CASE expression cannot be used to skip computing of an aggregate function.
CUBE clause
- CUBE
  CUBE is used to automatically group and summarize the fields listed in the GROUP BY clause. The result set contains all possible combinations of the values in the dimension columns, and the aggregate values in the underlying rows that match those dimension value combinations. CUBE returns a row of summary information for each group. You can use CUBE to generate crosstab values. For example, three expressions (n = 3) are specified in the CUBE clause. The result is 8 groups (2ⁿ = 2³). The rows grouped by the values of the n expressions are called regular rows, and the remaining rows are called super-aggregated rows. Expression:
```
CUBE ( { expression | ( expression [, …] ) } [, …] )
```
- ROLLUP
  The ROLLUP function is an aggregate function. It is a simple extension to the GROUP BY clause. The ROLLUP function returns a subtotal for each group and a grand total for all groups during data statistical analysis and report generation. Expression:
```
ROLLUP ( { expression | ( expression [, …] ) } [, …] ) 
```
- GROUPING SETS
  The GROUPING SETS clause is an extension of the GROUP BY clause. It allows you to specify multiple GROUP BY options. This improves the efficiency by eliminating data groups that are not needed. If the desired data group is specified, the full CUBE or ROLLUP operation to generate an aggregated set is not required for databases. Expression:
```
GROUPING SETS ( grouping_element [, …] ) 
```
DISTINCT clause
If SELECT DISTINCT is specified, all duplicate rows are removed from the result set. One row is kept for each group of duplicates.
Note
SELECT DISTINCT does not support arrays.
```
SELECT DISTINCT accountid FROM table;
```
COUNT DISTINCT clause
- Description
  COUNT DISTINCT counts the number of unique values in a column. A value that appears multiple times in the column is counted only once. Null values are not counted. The usage of this clause is similar to that of the COUNT function.
- Syntax
  - The following syntax applies in accurate counting:
```
SELECT c1, COUNT(DISTINCT c2) FROM table GROUP BY c1
```
  - COUNT DISTINCT consumes many resources. Hologres also supports non-accurate COUNT DISTINCT, which uses the following syntax:
```
SELECT c1, approx_count_distinct(c2) FROM table GROUP BY c1
```

UNION clause

Description

The UNION clause uses the following syntax:

select_statement UNION [ ALL | DISTINCT ] select_statement

Parameters

Parameter

Description

select_statement

select_statement is a SELECT statement that does not contain an ORDER BY, LIMIT, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, or FOR KEY SHARE clause. ORDER BY and LIMIT can be attached to a subexpression if it is enclosed in parentheses (). Without parentheses (), these clauses will apply to the result of UNION, not to its right-side input expression.

UNION

The UNION operator computes the set union of the rows returned by the involved SELECT statements. A row is in the set union of two result sets if it appears in at least one of the result sets. The two SELECT statements that represent the direct operands of the UNION must produce the same number of columns, and corresponding columns must be of compatible data types.

The result of UNION does not contain duplicate rows unless the ALL option is specified. ALL prevents elimination of duplicates. Therefore, UNION ALL is usually significantly quicker than UNION. Use ALL whenever possible. The DISTINCT keyword can be explicitly specified to eliminate duplicate rows.

Multiple UNION operators in the same SELECT statement are evaluated left to right, unless otherwise indicated by parentheses ().

INTERSECT clause

Description

The INTERSECT clause uses the following syntax:

select_statement INTERSECT [ ALL | DISTINCT ] select_statement

Parameters

Parameter

Description

select_statement

select_statement is a SELECT statement that does not contain an ORDER BY LIMIT clause.

INTERSECT

The INTERSECT operator computes the set intersection of the rows returned by the involved SELECT statements. A row is in the intersection of two result sets if it appears in both result sets.

The result of INTERSECT does not contain duplicate rows unless the ALL option is specified. If the ALL option is specified, a row that has m duplicates in the left table and n duplicates in the right table will appear min(m,n) times in the result set. The DISTINCT keyword can be explicitly specified to eliminate duplicate rows.

Multiple INTERSECT operators in the same SELECT statement are evaluated left to right, unless otherwise indicated by parentheses (). INTERSECT binds more tightly than UNION. A UNION B INTERSECT C will be read as A UNION (B INTERSECT C).

EXCEPT clause

Description

The EXCEPT clause uses the following syntax:

select_statement EXCEPT [ ALL | DISTINCT ] select_statement

Parameters

Parameter

Description

select_statement

select_statement is a SELECT statement that does not contain an ORDER BY LIMIT clause.

EXCEPT

The EXCEPT operator computes the set of rows that are in the result of the left SELECT statement but not in the result of the right one.

The result of EXCEPT does not contain duplicate rows unless the ALL option is specified. If the ALL option is specified, a row that has m duplicates in the left table and n duplicates in the right table will appear max(m-n,0) times in the result set. The DISTINCT keyword can be explicitly specified to eliminate duplicate rows.

Multiple EXCEPT operators in the same SELECT statement are evaluated left to right, unless otherwise indicated by parentheses (). EXCEPT binds at the same level as UNION.

FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE and FOR KEY SHARE cannot be specified either for an EXCEPT result or for the input of an EXCEPT.

ORDER BY clause
- Description
  The optional ORDER BY clause uses the following syntax:
```
ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...]
```
- Parameters
  The ORDER BY clause causes the result rows to be sorted based on the specified expressions. If two rows are equal based on the leftmost expression, they are compared based on the next expression and so on. If they are equal based on all specified expressions, they are returned in an implementation-dependent order.
  Each expression can be the name or ordinal number of an output column (SELECT list item), or it can be an arbitrary expression formed from input column values.
  The ordinal number refers to the ordinal (left-to-right) position of the output column. This feature makes it possible to define an ordering on the basis of a column that does not have a unique name. This is never absolutely necessary because it is always possible to assign a name to an output column using the AS clause.
  It is also possible to use arbitrary expressions in the ORDER BY clause, including columns that do not appear in the SELECT output list. Therefore, the following statement is valid:
```
SELECT name FROM distributors ORDER BY code;
```
  A limit of this feature is that an ORDER BY clause applying to the result of a UNION, INTERSECT, or EXCEPT clause can specify only an output column name or number, not an expression.
  If an ORDER BY expression is a simple name that matches both an output column name and an input column name, ORDER BY will interpret it as the output column name. This is the opposite of the choice that GROUP BY will make in the same situation. This inconsistency is made to be compatible with the SQL standard.
  You can also add the keyword ASC (ascending) or DESC (descending) after an expression in the ORDER BY clause. If not specified, ASC is assumed by default. Alternatively, a specific ordering operator name can be specified in the USING clause. An ordering operator must be a less-than or greater-than member of some B-tree operator family. ASC is usually equivalent to USING < and DESC is usually equivalent to USING >. However, the creator of a user-defined data type can define exactly what the default sort ordering is, and it might correspond to operators with other names.
  If NULLS LAST is specified, null values sort after all non-null values. If NULLS FIRST is specified, null values sort before all non-null values. If neither is specified, the default behavior is NULLS LAST when ASC is specified or implied, and NULLS FIRST when DESC is specified or implied. Therefore, null values are larger than non-null values by default. When USING is specified, the default ordering of null values depends on whether the operator is a less-than or greater-than operator.
  Ordering options apply only to the expression they follow. For example, ORDER BY x, y DESC is not equivalent to ORDER BY x DESC, y DESC.
LIMIT clause
- Description
  The LIMIT clause consists of two independent sub-clauses.
```
LIMIT { count | ALL }
OFFSET start
```
- Parameters
  In the LIMIT clause, count specifies the maximum number of rows to return, whereas start specifies the number of rows to skip before rows are returned. When both are specified, start rows are skipped before starting to count the count rows to be returned.
  If the count expression evaluates to NULL, it is treated the same as LIMIT ALL, which means no limit. If start evaluates to NULL, it is treated the same as OFFSET 0.
  When LIMIT is used, we recommend that you use an ORDER BY clause that constrains the result rows into a unique order. Otherwise, you will obtain an unpredictable subset of the rows of the query result. For example, you may ask for the tenth through twentieth rows, but do not know the ordering. In this case, you must specify ORDER BY.
  The query planner takes LIMIT into account when generating a query plan, so you are very likely to get different plans (yielding different row orders) depending on what you use for LIMIT and OFFSET. Therefore, using different LIMIT or OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug, but an inherent consequence of the fact that SQL does not promise to deliver the results of a query in a specified order unless ORDER BY is used to constrain the order.
  It is even possible for repeated executions of the same LIMIT query to return different subsets of the rows of a table, if ORDER BY is not used to enforce selection of a deterministic subset. This is not a bug. The accuracy of the results cannot be guaranteed in such a case.

Examples

JOIN clause for two tables

SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM 
distributors d, films f WHERE f.did = d.did;

WITH clause

WITH distributor_name(name) AS (SELECT name from distributors)
SELECT name FROM distributor_name ORDER BY name;

GROUP BY clause

SELECT kind, sum(length) AS total FROM films GROUP BY kind;

HAVING clause

SELECT kind, sum(length) AS total FROM films GROUP BY kind 
HAVING sum(length) < interval '5 hours';

GROUP BY CUBE

SELECT  l_returnflag
        ,l_shipmode
        ,SUM(l_quantity)
FROM    public.lineitem
GROUP BY cube((l_returnflag),(l_shipmode))
ORDER BY l_returnflag
         ,l_shipmode;

GROUP BY ROLLUP

SELECT  l_returnflag
        ,l_shipmode
        ,SUM(l_quantity)
FROM    public.lineitem
GROUP BY ROLLUP ((l_returnflag),(l_shipmode))
ORDER BY l_returnflag
         ,l_shipmode;

GROUP BY GROUPING SETS

SELECT  l_returnflag
        ,l_shipmode
        ,SUM(l_quantity)
FROM    public.lineitem
GROUP BY GROUPING SETS ((l_returnflag,l_shipmode),())
ORDER BY l_returnflag
         ,l_shipmode;

ORDER BY

SELECT * FROM distributors ORDER BY name;