Aggregation - PolarDB - Alibaba Cloud Documentation Center

Generally, the aggregate function aggr(expr) processes all matching rows for each aggregation key found in an incoming record (keys are compared using equivalence). In a regular aggregation in the form of aggr(expr), the list of aggregated values is the list of candidate values with all null values removed from it.

Prepare data

SELECT * FROM cypher('graph_name', $$
 CREATE (a:Person {name: 'A', age: 13}),
 (b:Person {name: 'B', age: 33, eyes: "blue"}),
 (c:Person {name: 'C', age: 44, eyes: "blue"}),
 (d1:Person {name: 'D', eyes: "brown"}),
 (d2:Person {name: 'D'}),
 (a)-[:KNOWS]->(b),
 (a)-[:KNOWS]->(c),
 (a)-[:KNOWS]->(d1),
 (b)-[:KNOWS]->(d2),
 (c)-[:KNOWS]->(d2)
$$) as (a agtype);

Auto group by

To calculate aggregated data, Cypher offers the aggregation capability, which is similar to GROUP BY of SQL.

Aggregate functions take a set of values and calculate the aggregation of those values. For example, avg calculates the average of multiple numeric values, and min finds the smallest numeric or string value in a set of values. When we say that an aggregate function operates on a set of values, we mean that the values are the result of the application of the inner expression (such as n.age) to all the records within the same aggregation group.

Aggregations can be computed on all matched subgraphs, or they can be further divided by introducing grouping keys. A grouping key is a non-aggregate expression that is used to group values that go into an aggregate function.

Take the following return statement as an example:

SELECT * FROM cypher('graph_name', $$
    MATCH (v:Person)
    RETURN v.name, count(*)
$$) as (grouping_key agtype, count agtype);

Sample result:

 grouping_key | count 
--------------+-------
 "A"          | 1
 "D"          | 2
 "Joan"       | 1
 "B"          | 1
 "John"       | 1
 "Bill"       | 1
 "C"          | 1
 "Jeff"       | 1
(8 rows)

The return result contains two columns: grouping_key and count(*). grouping_key is not an aggregate function. It is used as the grouping key. count(*) is an aggregate expression. The matching subgraphs will be divided into different buckets based on the grouping key. The aggregate function will then be run on these buckets, calculating an aggregate value per bucket.

Use aggregate functions to sort data

To use aggregations to sort the result set, the aggregation must be included in the RETURN clause to be used in ORDER BY.

SELECT *
FROM cypher('graph_name', $$
    MATCH (me:Person)-[]->(friend:Person)
    RETURN count(friend), me
    ORDER BY count(friend)
$$) as (friends agtype, me agtype);

DISTINCT aggregation

In a DISTINCT aggregation in the form of aggr(DISTINCT expr), the list of aggregated values is the list of candidate values with all null values removed from it. Furthermore, in a distinct aggregation, only one of all equivalent candidate values is included in the list of aggregated values, which means duplicates under equivalence are removed.

The DISTINCT operator works in conjunction with aggregation. It is used to make all values unique before running them through an aggregate function.

Example

SELECT *
FROM cypher('graph_name', $$
 MATCH (v:Person)
 RETURN count(DISTINCT v.eyes), count(v.eyes)
$$) as (distinct_eyes agtype, eyes agtype);

Sample result:

 distinct_eyes | eyes 
---------------+------
 2             | 3
(1 row)

Ambiguous grouping statements

This feature of not requiring the user to specify their grouping keys for a query allows for ambiguity on what Cypher should qualify as their grouping keys. For more information, see community documentation.

Prepare data

SELECT * FROM cypher('graph_name', $$
CREATE (:L {a: 1, b: 2, c: 3}),
       (:L {a: 2, b: 3, c: 1}),
       (:L {a: 3, b: 1, c: 2})
$$) as (a agtype);

Invalid queries in AGE

The solution provided by AGE to this problem is to not allow a WITH or RETURN column to combine aggregate functions with variables that are not explicitly listed in another column of the same WITH or RETURN clause.

Example

SELECT * FROM cypher('graph_name', $$
    MATCH (x:L)
    RETURN x.a + count(*) + x.b + count(*) + x.c
$$) as (a agtype);

Sample result:

ERROR:  "x" must be either part of an explicitly listed key or used inside an aggregate function
LINE 3:     RETURN x.a + count(*) + x.b + count(*) + x.c

Valid queries in AGE

Columns that do not include an aggregate function in AGE are considered to be the grouping keys for that WITH or RETURN clause.

Example

For the preceding query, you can use the following method to rewrite the query to return results:

SELECT * FROM cypher('graph_name', $$
    MATCH (x:L)
    RETURN (x.a + x.b + x.c) + count(*) + count(*), x.a + x.b + x.c
$$) as (count agtype, key agtype);

x.a + x.b + x.c is the grouping key. Grouping keys created like this must include parenthesis.

 count | key 
-------+-----
 12    | 6
(1 row)

SELECT * FROM cypher('graph_name', $$
    MATCH (x:L)
    RETURN x.a + count(*) + x.b + count(*) + x.c, x.a, x.b, x.c
$$) as (count agtype, a agtype, b agtype, c agtype);

x.a, x.b, and x.c are considered different grouping keys.

 count | a | b | c 
-------+---+---+---
 8     | 3 | 1 | 2
 8     | 2 | 3 | 1
 8     | 1 | 2 | 3
(3 rows)

Vertices and edges in ambiguous grouping

The grouping key can be a vertex or edge, and then any properties of the vertex or edge can be specified without being explicitly stated in a WITH or RETURN column.

Example

SELECT * FROM cypher('graph_name', $$
    MATCH (x:L)
    RETURN count(*) + count(*) + x.a + x.b + x.c, x
$$) as (count agtype, key agtype);

Results are grouped on x, because it is safe to assume that properties be considered unnecessary for grouping to be unambiguous.

 count |                                          key                                           
-------+----------------------------------------------------------------------------------------
 8     | {"id": 1407374883553283, "label": "L", "properties": {"a": 3, "b": 1, "c": 2}}::vertex
 8     | {"id": 1407374883553281, "label": "L", "properties": {"a": 1, "b": 2, "c": 3}}::vertex
 8     | {"id": 1407374883553282, "label": "L", "properties": {"a": 2, "b": 3, "c": 1}}::vertex
(3 rows)

Hide unwanted grouping keys

If the grouping key is considered unnecessary for the query output, the aggregation can be done in a WITH clause and then passing information to the RETURN clause.

Example

SELECT * FROM cypher('graph_name', $$
    MATCH (x:L)
    WITH count(*) + count(*) + x.a + x.b + x.c as column, x
    RETURN column
$$) as (a agtype);

Sample result:

 a 
---
 8
 8
 8
(3 rows)