Sunday, February 28, 2021

ANY_VALUE and FIRST/LAST (KEEP)

 

ANY_VALUE and FIRST/LAST (KEEP)

The FIRST and LAST (a.k.a. “KEEP”) aggregate functions are very useful when you want to order a row set by one column and return the value of another column from the first or last row in the ordered set.
I wrote about these functions in the past; for example here and here.

To make sure the result of the FIRST (LAST) function is deterministic, we have to define a tie-breaker for the case that multiple rows have the same first (last) value. The tie-breaker is an aggregate function that is applied on the column we want to return.

For example, the following query returns for each department the FIRST_NAME of the employee with the lowest SALARY. If there are multiple employees with the lowest salary in the same department, the query returns the “minimum” first_name – so here the MIN function is the tie-breaker.
In this example the lowest salary in department 90 is 17000, and both Neena and Lex have this salary. We use MIN(FIRST_NAME), so the result is “Lex”.

Many times we don’t really need a tie-breaker, because we know that there is a single first/last row. For example, if we use a unique expression in the ORDER BY clause of the function. And sometimes we simply don’t care which record is returned in case of a tie. But since the syntax requires a tie-breaker, we have to use some “random” function, like MIN or MAX.

The ANY_VALUE function (that was added in Oracle 19c) is perfect, in my opinion, for this case. It may improve performance, but – more importantly – it makes the code clearer, by better reflecting our intention.
Assuming that in the previous example we don’t care which one of the employees with the lowest salary is returned, we can rewrite it like this:

select
  department_id,
  any_value(first_name) keep(dense_rank FIRST order by salary)
from employees
group by department_id;

One thought on “ANY_VALUE and FIRST/LAST (KEEP)”

  1. Hi Oren,

    It is interesting that the ANY_VALUE function is a “legal aggregate” for the FIRST/LAST functions, though not specified explicitly in the documentation 🙂

    While it is true that the aggregate used here is usually supposed to be a “tie breaker”, just as MIN and MAX obviously are, however, the FIRST/LAST functions, as documented,
    also work with some “non-tie-breaker” aggregates like SUM, AVG or COUNT
    that may be considered as FIRST/LAST “accumulators”.

    But, they do not work with other undocumented aggregates like MEDIAN or LISTAGG,
    which look equally legitimate as all the documented ones:

    Both of the following queries return error:

    select
      department_id,
      MEDIAN(hire_date)  
                  keep(dense_rank FIRST order by salary)
    from hr.employees
    group by department_id
    /
    
    ORA-00923: FROM keyword not found where expected
    
    select
      department_id,
      LISTAGG(first_name,',') WITHIN GROUP (order by first_name)  
                  keep(dense_rank FIRST order by salary)
    from hr.employees
    group by department_id
    /
    
    ORA-00923: FROM keyword not found where expected
    

    The COLLECT function, instead, does not raise an error, but wrongly returns a NULL collection
    when there are “ties” in the FIRST/LAST rows, as we can see here, for department_id = 90.

    For groups without ties, it returns a correct collection containing the single FIRST/LAST value, similar to MIN/MAX
    ( we don’t see it here, because LiveSQL does not display collection types output properly,
    without a converting helper function ):

    select
      department_id,
      count(*)                              keep(dense_rank FIRST order by salary)  cnt,
      min(first_name)            keep(dense_rank FIRST order by salary)  low,
      max(first_name)            keep(dense_rank FIRST order by salary)  high,
      COLLECT(first_name) keep(dense_rank FIRST order by salary)   coll,
      CASE     
         WHEN COLLECT(first_name) keep(dense_rank FIRST order by salary) IS NULL THEN '***'
      END  null_coll
    from hr.employees
    group by department_id
    /
    
    DEPARTMENT_ID	CNT	LOW		HIGH		COLL			NULL_COLL
    -----------------------------------------------------------------------------------------
    10		1	Jennifer	Jennifer	[unsupported data type]	 - 
    20		1	Pat		Pat		[unsupported data type]	 - 
    30		1	Karen		Karen		[unsupported data type]	 - 
    40		1	Susan		Susan		[unsupported data type]	 - 
    50		1	TJ		TJ		[unsupported data type]	 - 
    60		1	Diana		Diana		[unsupported data type]	 - 
    70		1	Hermann		Hermann		[unsupported data type]	 - 
    80		1	Sundita		Sundita		[unsupported data type]	 - 
    90		2	Lex		Neena		[unsupported data type]	***
    100		1	Luis		Luis		[unsupported data type]	 - 
    110		1	William		William		[unsupported data type]	 - 
     - 		1	Kimberely	Kimberely	[unsupported data type]	 - 
    
    12 rows selected.
    

    So, it looks like some documentation adjustments are still required for the FIRST/LAST functions, as well as some enhancements to allow for additional “tie accumulator”
    aggregates to be used.

    Cheers & Best Regards,
    Iudith Mentzel

No comments:

Must Watch YouTube Videos for Databricks Platform Administrators

  While written word is clearly the medium of choice for this platform, sometimes a picture or a video can be worth 1,000 words. Below are  ...