Monday, February 22, 2021

DIY ANY_VALUE

 I really like the ANY_VALUE aggregate function. But since it was added in Oracle 19c, I can’t use it in my Oracle XE database, as currently the latest version of XE is 18c.

So I decided to implement it as a user-defined function using the Oracle Data Cartridges Interface.

Note that this implementation is for VARCHAR2. If the function is used on other data types, the regular rules for implicit conversion apply.

    
SQL> create type any_value_string_t as object
  2  (
  3      v_value varchar2(4000),
  4      static function odciaggregateinitialize(sctx in out any_value_string_t) return number,
  5      member function odciaggregateiterate
  6      (
  7          self  in out any_value_string_t,
  8          value in varchar2
  9      ) return number,
 10      member function odciaggregatemerge
 11      (
 12          self in out any_value_string_t,
 13          ctx2 in any_value_string_t
 14      ) return number,
 15      member function odciaggregateterminate
 16      (
 17          self        in any_value_string_t,
 18          returnvalue out varchar2,
 19          flags       in number
 20      ) return number
 21  );
 22  /

Type created.

SQL> create type body any_value_string_t as
  2
  3      static function odciaggregateinitialize(sctx in out any_value_string_t) return number is
  4      begin
  5          sctx := any_value_string_t(null);
  6          return odciconst.success;
  7      end;
  8
  9      member function odciaggregateiterate
 10      (
 11          self  in out any_value_string_t,
 12          value in varchar2
 13      ) return number is
 14      begin
 15          if self.v_value is null then
 16              self.v_value := value;
 17          end if;
 18          return odciconst.success;
 19      end;
 20
 21      member function odciaggregateterminate
 22      (
 23          self        in any_value_string_t,
 24          returnvalue out varchar2,
 25          flags       in number
 26      ) return number is
 27      begin
 28          returnvalue := self.v_value;
 29          return odciconst.success;
 30      end;
 31
 32      member function odciaggregatemerge
 33      (
 34          self in out any_value_string_t,
 35          ctx2 in any_value_string_t
 36      ) return number is
 37      begin
 38          if self.v_value is null then
 39              self.v_value := ctx2.v_value;
 40          end if;
 41          return odciconst.success;
 42      end;
 43
 44  end;
 45  /

Type body created.

SQL> begin
  2    $IF DBMS_DB_VERSION.ver_le_18 $THEN
  3      execute immediate q''create function any_value (p_value varchar2) return varchar2
  4                           parallel_enable
  5                           aggregate using any_value_string_t;'';
  6      execute immediate q''grant execute on any_value to public'';
  7      execute immediate q''create public synonym any_value for any_value'';
  8    $ELSE
  9      raise_application_error(-20000,'ANY_VALUE is now supported by Oracle');
 10    $END
 11  end;
 12  /

PL/SQL procedure successfully completed.

Now any_value can be used just like any built-in aggregate function:

SQL> conn hr/hr
Connected.
SQL> select banner from v$version;

BANNER
------------------------------------------------------------------------------------------------------------------------
Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production

1 row selected.

SQL> select d.department_id,
  2         any_value(d.department_name) department_name,
  3         count(*) number_of_employees
  4  from   employees   e,
  5         departments d
  6  where  d.department_id = e.department_id
  7  group  by d.department_id;

DEPARTMENT_ID DEPARTMENT_NAME      NUMBER_OF_EMPLOYEES
------------- -------------------- -------------------
           10 Administration                         1
           20 Marketing                              2
           30 Purchasing                             6
           40 Human Resources                        1
           50 Shipping                              45
           60 IT                                     5
           70 Public Relations                       1
           80 Sales                                 34
           90 Executive                              3
          100 Finance                                6
          110 Accounting                             2

11 rows selected.

A new aggregate function – ANY_VALUE – was added to Oracle. It is documented as of Oracle 21c, but apparently it exists also in 19c (at least in 19.8 – the version in which I tested it).

Many times when writing an aggregate query we add expressions to the GROUP BY clause just because we want to add them to the select list, although they don’t change the aggregation result.

For example, let’s count the number of cities per country_id:

select c2.country_id,
       count(*) number_of_cities
from   cities    c1,
       countries c2
where  c2.country_id = c1.country_id
group  by c2.country_id;

NOW, WHAT IF WE WANT TO RETURN ALSO THE COUNTRY NAME?

country_id is the primary key of the countries table, so there is no reason to change the aggregation key. But in an aggregate query the select list can include only aggregate functions, GROUP BY expressions, constants, or expressions involving one of these.

OPTION 1

So a common practice is to add country_name to the GROUP BY clause:

select c2.country_id,
       c2.country_name,
       count(*) number_of_cities
from   cities    c1,
       countries c2
where  c2.country_id = c1.country_id
group  by c2.country_id,
          c2.country_name;

Adding country_name to the GROUP BY clause is artificial, makes the query less clean, and the SQL engine may need to work harder as the aggregation key is wider.

OPTION 2

An alternative approach is to choose some simple aggregate function that won’t change the desired result, like MIN or MAX (but not SUM or COUNT…), and apply it on country_name in the select list:

select c2.country_id,
       min(c2.country_name) country_name,
       count(*) number_of_cities
from   cities    c1,
       countries c2
where  c2.country_id = c1.country_id
group  by c2.country_id;

Since all the records in the same group belong to the same country_id, then country_name is the same for all of them, so MIN(country_name)=MAX(country_name)=the right value.
In this option we don’t litter the GROUP BY clause, but we add an arbitrary aggregate function to the select list, which still feels artificial, and adds some (small) extra work for the SQL engine.

OPTION 3 – ANY_VALUE

Now we can use ANY_VALUE instead. ANY_VALUE(country_name) returns, for each group, the value of country_name from one of the records in that group.

select c2.country_id,
       any_value(c2.country_name) country_name,
       count(*) number_of_cities
from   cities    c1,
       countries c2
where  c2.country_id = c1.country_id
group  by c2.country_id;

Since ANY_VALUE is considered an aggregate function, we don’t need to add it to the GROUP BY clause. But, by definition, it’s a very simple function, so the performance overhead should be negligible with respect to other aggregate functions.

Using ANY_VALUE also makes the query more readable, in my opinion. It makes our intention clear (unlike the use of the MIN function in the previous example).

OPTION 4

There is another alternative to solve our example:

select c2.country_id,
       c2.country_name,
       c1.number_of_cities
from   (select country_id,
               count(*) number_of_cities
        from   cities
        group  by country_id) c1,
       countries c2
where  c2.country_id = c1.country_id;

Here we “aggregate and then join”, rather than “join and then aggregate”.

USE ANY_VALUE WISELY

I think that ANY_VALUE is a great addition to Oracle SQL. But use it wisely. Remember that by definition it is not deterministic. If you know that some_expression is unique for all the records in the same group in your query, then any_value(some_expression) is deterministic in your query and can be used safely. Otherwise, think very carefully if it’s appropriate for your case.




No comments:

Must Watch YouTube Videos for Databricks Platform Administrators

  While written word is clearly the medium of choice for this platform, sometimes a picture or a video can be worth 1,000 words. Below are  ...