I really like the ANY_VALUE aggregate function. But since it was added in Oracle 19c, I can’t use it in my Oracle XE database, as currently the latest version of XE is 18c.
So I decided to implement it as a user-defined function using the Oracle Data Cartridges Interface.
SQL> create type any_value_string_t as object 2 ( 3 v_value varchar2(4000), 4 static function odciaggregateinitialize(sctx in out any_value_string_t) return number, 5 member function odciaggregateiterate 6 ( 7 self in out any_value_string_t, 8 value in varchar2 9 ) return number, 10 member function odciaggregatemerge 11 ( 12 self in out any_value_string_t, 13 ctx2 in any_value_string_t 14 ) return number, 15 member function odciaggregateterminate 16 ( 17 self in any_value_string_t, 18 returnvalue out varchar2, 19 flags in number 20 ) return number 21 ); 22 / Type created. SQL> create type body any_value_string_t as 2 3 static function odciaggregateinitialize(sctx in out any_value_string_t) return number is 4 begin 5 sctx := any_value_string_t(null); 6 return odciconst.success; 7 end; 8 9 member function odciaggregateiterate 10 ( 11 self in out any_value_string_t, 12 value in varchar2 13 ) return number is 14 begin 15 if self.v_value is null then 16 self.v_value := value; 17 end if; 18 return odciconst.success; 19 end; 20 21 member function odciaggregateterminate 22 ( 23 self in any_value_string_t, 24 returnvalue out varchar2, 25 flags in number 26 ) return number is 27 begin 28 returnvalue := self.v_value; 29 return odciconst.success; 30 end; 31 32 member function odciaggregatemerge 33 ( 34 self in out any_value_string_t, 35 ctx2 in any_value_string_t 36 ) return number is 37 begin 38 if self.v_value is null then 39 self.v_value := ctx2.v_value; 40 end if; 41 return odciconst.success; 42 end; 43 44 end; 45 / Type body created. SQL> begin 2 $IF DBMS_DB_VERSION.ver_le_18 $THEN 3 execute immediate q''create function any_value (p_value varchar2) return varchar2 4 parallel_enable 5 aggregate using any_value_string_t;''; 6 execute immediate q''grant execute on any_value to public''; 7 execute immediate q''create public synonym any_value for any_value''; 8 $ELSE 9 raise_application_error(-20000,'ANY_VALUE is now supported by Oracle'); 10 $END 11 end; 12 / PL/SQL procedure successfully completed.
Now any_value can be used just like any built-in aggregate function:
SQL> conn hr/hr Connected. SQL> select banner from v$version; BANNER ------------------------------------------------------------------------------------------------------------------------ Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production 1 row selected. SQL> select d.department_id, 2 any_value(d.department_name) department_name, 3 count(*) number_of_employees 4 from employees e, 5 departments d 6 where d.department_id = e.department_id 7 group by d.department_id; DEPARTMENT_ID DEPARTMENT_NAME NUMBER_OF_EMPLOYEES ------------- -------------------- ------------------- 10 Administration 1 20 Marketing 2 30 Purchasing 6 40 Human Resources 1 50 Shipping 45 60 IT 5 70 Public Relations 1 80 Sales 34 90 Executive 3 100 Finance 6 110 Accounting 2 11 rows selected.
A new aggregate function – ANY_VALUE – was added to Oracle. It is documented as of Oracle 21c, but apparently it exists also in 19c (at least in 19.8 – the version in which I tested it).
Many times when writing an aggregate query we add expressions to the GROUP BY clause just because we want to add them to the select list, although they don’t change the aggregation result.
For example, let’s count the number of cities per country_id:
select c2.country_id, count(*) number_of_cities from cities c1, countries c2 where c2.country_id = c1.country_id group by c2.country_id;NOW, WHAT IF WE WANT TO RETURN ALSO THE COUNTRY NAME?
country_id is the primary key of the countries table, so there is no reason to change the aggregation key. But in an aggregate query the select list can include only aggregate functions, GROUP BY expressions, constants, or expressions involving one of these.
OPTION 1
So a common practice is to add country_name to the GROUP BY clause:
select c2.country_id, c2.country_name, count(*) number_of_cities from cities c1, countries c2 where c2.country_id = c1.country_id group by c2.country_id, c2.country_name;Adding country_name to the GROUP BY clause is artificial, makes the query less clean, and the SQL engine may need to work harder as the aggregation key is wider.
OPTION 2
An alternative approach is to choose some simple aggregate function that won’t change the desired result, like MIN or MAX (but not SUM or COUNT…), and apply it on country_name in the select list:
select c2.country_id, min(c2.country_name) country_name, count(*) number_of_cities from cities c1, countries c2 where c2.country_id = c1.country_id group by c2.country_id;
Since all the records in the same group belong to the same country_id, then country_name is the same for all of them, so MIN(country_name)=MAX(country_name)=the right value.
In this option we don’t litter the GROUP BY clause, but we add an arbitrary aggregate function to the select list, which still feels artificial, and adds some (small) extra work for the SQL engine.OPTION 3 – ANY_VALUE
Now we can use ANY_VALUE instead. ANY_VALUE(country_name) returns, for each group, the value of country_name from one of the records in that group.
select c2.country_id, any_value(c2.country_name) country_name, count(*) number_of_cities from cities c1, countries c2 where c2.country_id = c1.country_id group by c2.country_id;
Since ANY_VALUE is considered an aggregate function, we don’t need to add it to the GROUP BY clause. But, by definition, it’s a very simple function, so the performance overhead should be negligible with respect to other aggregate functions.
Using ANY_VALUE also makes the query more readable, in my opinion. It makes our intention clear (unlike the use of the MIN function in the previous example).
OPTION 4
There is another alternative to solve our example:
select c2.country_id, c2.country_name, c1.number_of_cities from (select country_id, count(*) number_of_cities from cities group by country_id) c1, countries c2 where c2.country_id = c1.country_id;Here we “aggregate and then join”, rather than “join and then aggregate”.
USE ANY_VALUE WISELY
I think that ANY_VALUE is a great addition to Oracle SQL. But use it wisely. Remember that by definition it is not deterministic. If you know that some_expression is unique for all the records in the same group in your query, then any_value(some_expression) is deterministic in your query and can be used safely. Otherwise, think very carefully if it’s appropriate for your case.
No comments:
Post a Comment