ANY_VALUE and FIRST/LAST (KEEP)
The FIRST and LAST (a.k.a. “KEEP”) aggregate functions are very useful when you want to order a row set by one column and return the value of another column from the first or last row in the ordered set.
I wrote about these functions in the past; for example here and here.
To make sure the result of the FIRST (LAST) function is deterministic, we have to define a tie-breaker for the case that multiple rows have the same first (last) value. The tie-breaker is an aggregate function that is applied on the column we want to return.
For example, the following query returns for each department the FIRST_NAME of the employee with the lowest SALARY. If there are multiple employees with the lowest salary in the same department, the query returns the “minimum” first_name – so here the MIN function is the tie-breaker.
In this example the lowest salary in department 90 is 17000, and both Neena and Lex have this salary. We use MIN(FIRST_NAME), so the result is “Lex”.
Many times we don’t really need a tie-breaker, because we know that there is a single first/last row. For example, if we use a unique expression in the ORDER BY clause of the function. And sometimes we simply don’t care which record is returned in case of a tie. But since the syntax requires a tie-breaker, we have to use some “random” function, like MIN or MAX.
The ANY_VALUE function (that was added in Oracle 19c) is perfect, in my opinion, for this case. It may improve performance, but – more importantly – it makes the code clearer, by better reflecting our intention.
Assuming that in the previous example we don’t care which one of the employees with the lowest salary is returned, we can rewrite it like this:
select
department_id,
any_value(first_name) keep(dense_rank FIRST order by salary)
from employees
group by department_id;
Hi Oren,
It is interesting that the ANY_VALUE function is a “legal aggregate” for the FIRST/LAST functions, though not specified explicitly in the documentation
While it is true that the aggregate used here is usually supposed to be a “tie breaker”, just as MIN and MAX obviously are, however, the FIRST/LAST functions, as documented,
also work with some “non-tie-breaker” aggregates like SUM, AVG or COUNT
that may be considered as FIRST/LAST “accumulators”.
But, they do not work with other undocumented aggregates like MEDIAN or LISTAGG,
which look equally legitimate as all the documented ones:
Both of the following queries return error:
The COLLECT function, instead, does not raise an error, but wrongly returns a NULL collection
when there are “ties” in the FIRST/LAST rows, as we can see here, for department_id = 90.
For groups without ties, it returns a correct collection containing the single FIRST/LAST value, similar to MIN/MAX
( we don’t see it here, because LiveSQL does not display collection types output properly,
without a converting helper function ):
So, it looks like some documentation adjustments are still required for the FIRST/LAST functions, as well as some enhancements to allow for additional “tie accumulator”
aggregates to be used.
Cheers & Best Regards,
Iudith Mentzel