Generally, the aggregate function aggr(expr) processes all matching rows for each aggregation key found in an incoming record (keys are compared using equivalence). In a regular aggregation in the form of aggr(expr), the list of aggregated values is the list of candidate values with all null values removed from it.
Prepare data
SELECT * FROM cypher('graph_name', $$
CREATE (a:Person {name: 'A', age: 13}),
(b:Person {name: 'B', age: 33, eyes: "blue"}),
(c:Person {name: 'C', age: 44, eyes: "blue"}),
(d1:Person {name: 'D', eyes: "brown"}),
(d2:Person {name: 'D'}),
(a)-[:KNOWS]->(b),
(a)-[:KNOWS]->(c),
(a)-[:KNOWS]->(d1),
(b)-[:KNOWS]->(d2),
(c)-[:KNOWS]->(d2)
$$) as (a agtype);Auto group by
To calculate aggregated data, Cypher offers the aggregation capability, which is similar to GROUP BY of SQL.
Aggregate functions take a set of values and calculate the aggregation of those values. For example, avg calculates the average of multiple numeric values, and min finds the smallest numeric or string value in a set of values. When we say that an aggregate function operates on a set of values, we mean that the values are the result of the application of the inner expression (such as n.age) to all the records within the same aggregation group.
Aggregations can be computed on all matched subgraphs, or they can be further divided by introducing grouping keys. A grouping key is a non-aggregate expression that is used to group values that go into an aggregate function.
Take the following return statement as an example:
SELECT * FROM cypher('graph_name', $$
MATCH (v:Person)
RETURN v.name, count(*)
$$) as (grouping_key agtype, count agtype);Sample result:
grouping_key | count
--------------+-------
"A" | 1
"D" | 2
"Joan" | 1
"B" | 1
"John" | 1
"Bill" | 1
"C" | 1
"Jeff" | 1
(8 rows)The return result contains two columns: grouping_key and count(*). grouping_key is not an aggregate function. It is used as the grouping key. count(*) is an aggregate expression. The matching subgraphs will be divided into different buckets based on the grouping key. The aggregate function will then be run on these buckets, calculating an aggregate value per bucket.
Use aggregate functions to sort data
To use aggregations to sort the result set, the aggregation must be included in the RETURN clause to be used in ORDER BY.
SELECT *
FROM cypher('graph_name', $$
MATCH (me:Person)-[]->(friend:Person)
RETURN count(friend), me
ORDER BY count(friend)
$$) as (friends agtype, me agtype);DISTINCT aggregation
In a DISTINCT aggregation in the form of aggr(DISTINCT expr), the list of aggregated values is the list of candidate values with all null values removed from it. Furthermore, in a distinct aggregation, only one of all equivalent candidate values is included in the list of aggregated values, which means duplicates under equivalence are removed.
The DISTINCT operator works in conjunction with aggregation. It is used to make all values unique before running them through an aggregate function.
Example
SELECT *
FROM cypher('graph_name', $$
MATCH (v:Person)
RETURN count(DISTINCT v.eyes), count(v.eyes)
$$) as (distinct_eyes agtype, eyes agtype);Sample result:
distinct_eyes | eyes
---------------+------
2 | 3
(1 row)Ambiguous grouping statements
This feature of not requiring the user to specify their grouping keys for a query allows for ambiguity on what Cypher should qualify as their grouping keys. For more information, see community documentation.
Prepare data
SELECT * FROM cypher('graph_name', $$
CREATE (:L {a: 1, b: 2, c: 3}),
(:L {a: 2, b: 3, c: 1}),
(:L {a: 3, b: 1, c: 2})
$$) as (a agtype);Invalid queries in AGE
The solution provided by AGE to this problem is to not allow a WITH or RETURN column to combine aggregate functions with variables that are not explicitly listed in another column of the same WITH or RETURN clause.
Example
SELECT * FROM cypher('graph_name', $$
MATCH (x:L)
RETURN x.a + count(*) + x.b + count(*) + x.c
$$) as (a agtype);Sample result:
ERROR: "x" must be either part of an explicitly listed key or used inside an aggregate function
LINE 3: RETURN x.a + count(*) + x.b + count(*) + x.cValid queries in AGE
Columns that do not include an aggregate function in AGE are considered to be the grouping keys for that WITH or RETURN clause.
Example
For the preceding query, you can use the following method to rewrite the query to return results:
SELECT * FROM cypher('graph_name', $$ MATCH (x:L) RETURN (x.a + x.b + x.c) + count(*) + count(*), x.a + x.b + x.c $$) as (count agtype, key agtype);x.a + x.b + x.cis the grouping key. Grouping keys created like this must include parenthesis.count | key -------+----- 12 | 6 (1 row)SELECT * FROM cypher('graph_name', $$ MATCH (x:L) RETURN x.a + count(*) + x.b + count(*) + x.c, x.a, x.b, x.c $$) as (count agtype, a agtype, b agtype, c agtype);x.a,x.b, andx.care considered different grouping keys.count | a | b | c -------+---+---+--- 8 | 3 | 1 | 2 8 | 2 | 3 | 1 8 | 1 | 2 | 3 (3 rows)
Vertices and edges in ambiguous grouping
The grouping key can be a vertex or edge, and then any properties of the vertex or edge can be specified without being explicitly stated in a WITH or RETURN column.
Example
SELECT * FROM cypher('graph_name', $$
MATCH (x:L)
RETURN count(*) + count(*) + x.a + x.b + x.c, x
$$) as (count agtype, key agtype);Results are grouped on x, because it is safe to assume that properties be considered unnecessary for grouping to be unambiguous.
count | key
-------+----------------------------------------------------------------------------------------
8 | {"id": 1407374883553283, "label": "L", "properties": {"a": 3, "b": 1, "c": 2}}::vertex
8 | {"id": 1407374883553281, "label": "L", "properties": {"a": 1, "b": 2, "c": 3}}::vertex
8 | {"id": 1407374883553282, "label": "L", "properties": {"a": 2, "b": 3, "c": 1}}::vertex
(3 rows)Hide unwanted grouping keys
If the grouping key is considered unnecessary for the query output, the aggregation can be done in a WITH clause and then passing information to the RETURN clause.
Example
SELECT * FROM cypher('graph_name', $$
MATCH (x:L)
WITH count(*) + count(*) + x.a + x.b + x.c as column, x
RETURN column
$$) as (a agtype);Sample result:
a
---
8
8
8
(3 rows)