This is a collection of some small features, patterns or snippets, that I seem to use frequently and like to share. The features are written in no specific order. However we can structure the features into performance related [P], convenience [C] or enablers [E].
Enablers [E] would be features that allow us to do something, that is otherwise impossible or at least very hard to achieve. Sometimes even to avoid an error.
Convenience [C] features are things that offer an alternative construct. Often they can be used instead of another option. Sometimes making the code easier to change (maintainability).
Performance [P] features improve execution speed. Often they come with a drawback or one should keep exceptions or rare race conditions in mind.
1. find multiple words using regexp_like
[C]
instead of writing multiple like conditions (or repeating the same select using like
multiple times), we can write a single regexp_like
expression.
Cumbersome:where (str like '%word1%' OR str like '%word2%' or ...)
Much easier:where regexp_like(str,'(word1|word2|...)')
The pipe “|” inside the regular expression acts as the OR operator and the parenthesis () are needed because of that OR.
A very typical case is to query the v$parameter table for a set of parameter names.
example
select name , value from v$parameter where regexp_like( name , '(listen|dispatch|service)' ); |
NAME | VALUE |
service_names | feno1pod |
enable_dnfs_dispatcher | FALSE |
dispatchers | (PROTOCOL=TCP) (SERVICE=feno1podXDB) |
max_dispatchers |
What I especially like about it, is that the need for wildcards is gone and it is so easy to add more words to search for.
2. fetch first row only [C]
Do you still use rownum
frequently? Why not use to the row limiting clause instead?
I use it a lot, mostly for ad hoc queries. One advantage is that the need to create an inner query that does the correct ordering disappears.
example
-- show the top 3 tables with the most rows select table_name, num_rows from user_tables order by num_rows desc fetch first 3 rows only ; |
TABLE_NAME | NUM_ROWS |
IMP_SD_2744-12_45_18 | 19696 |
IMP_SD_2822-14_28_53 | 19692 |
IMP_SD_194-09_40_50 | 19545 |
3. use expression lists (a,b) [E]
I use this frequently in update statements but also sometimes in joins. Instead of setting each column separately we can set multiple columns at once. Simply by using parenthesis and a comma between the columns.
example
1 2 3 4 5 | update persons u set (u.first_name, u.last_name) = ( select a.firstname, a. name from applicants a where a.person_id = u.id) where u.status = 'NEEDS REFRESH' ; |
Or we can filter on a combination of columns (a,b) in ((c,d),(d,c),(e,f))
. The need for this should be rare, but it happens. A strange edge case is when we do an equality comparison of two expression lists. For some reason the right side needs an additional set of ().
(a,b) = ((c,d))
4. DeMorgans law [C,E]
This might be the single most important math/logic rule, that SQL developers should know. It is usually applied inside the where clause and knowing it by heart will prevent a lot of easy mistakes.
written as a SQL expression
not(A or B) = not(A) and not(B)
not(A and B) = not(A) or not(B)
It is easy to remember. OR changes to AND when the parenthesis are resolved (and the other way round).
A and B here are full SQL expressions, for example x=y.
Why is that so important? SQL is a very logic based language. In daily speach we often use logical operators (AND/OR/NOT) differently than what is needed when they should be applied in SQL. This can become confusing very easily. Knowing DeMorgans law helps to quickly check if the logic is used correctly.
An example
Task: “Find all employees that are not working in sales and marketing.”
Converting this sentence 1:1 into SQL would result in this:
select * from emp where not (department = 'SALES' and department = 'MARKETING' ); |
Obviously what is ment, is that we want to find those employees that are working in some other department.
Applying Damorgans Law, we can reformulate our select statement. Also we replace NOT (x=y)
with x!=y
.
select * from emp where department != 'SALES' OR department != 'MARKETING' ; |
A developer should understand now, that this condition will always be true. For each row the department is either sales or not sales. And if it is sales, then it is not marketing. So the combination is always true (excluding NULL value considerations). Which is probably not what is wanted.
Here the row logic also plays a part. The where clause is applied to a single row, but normal speach often uses the boolean operation to combine data sets.
The correct query would be
select * from emp where not (department = 'SALES' OR department = 'MARKETING' ); |
or even better use IN instead or OR
select * from emp where department not in ( 'SALES' , 'MARKETING' ); |
Historic anecdote: Da Morgan was not the first one to discover this logic. Centuries before Da Morgan, a guy called William of Ockam already wrote about it. He probably had it from Aristotle. Occam (the spelling changed over the centuries) nowadays is more famous for his razor.
5. rollup and grouping sets [C]
To get a total row for a query that uses sum or count, simply add rollup
.
example rollup
select tablespace_name, count (*) from dba_tables group by rollup (tablespace_name); |
TABLESPACE_NAME COUNT(*)
DATA 362
DBFS_DATA 2
SAMPLESCHEMA 14
SYSAUX 1357
SYSTEM 1056
322
3113
One problem here are tables without a tablespace (null). The grouping()
function helps to separate a normal row with a data value NULL from a superaggregate row (19c grouping).
If there are multiple columns then instead of rollup
I use grouping sets
. The full set (=total) in a grouping set expression can be expressed by ().
example grouping sets
+ grouping
select case when grouping (tablespace_name) = 1 then '-all-' else tablespace_name end as tablespace_name , case when grouping (cluster_name) = 1 then '-all-' else cluster_name end as cluster_name , count (*) from dba_tables group by grouping sets ((tablespace_name, cluster_name),()); |
TABLESPACE_NAME CLUSTER_NAME COUNT(*)
322
DATA 362
SYSAUX 1356
SYSAUX SMON_SCN_TO_TIME_AUX 1
SYSTEM 1020
SYSTEM C_RG# 2
SYSTEM C_TS# 2
SYSTEM C_OBJ# 17
SYSTEM C_COBJ# 2
SYSTEM C_MLOG# 2
SYSTEM C_USER# 2
SYSTEM C_FILE#_BLOCK# 2
SYSTEM C_OBJ#_INTCOL# 1
SYSTEM C_TOID_VERSION# 6
DBFS_DATA 2
SAMPLESCHEMA 14
-all- -all- 3113
6. enhanced listagg [E]
In the more recent database versions, the very useful LISTAGG command got even better. For production code I nowadays always try to remember to add some safety protection in place in case result of the aggregation becomes big. Otherwise it could happen to get ORA-01489: result of string concatenation is too long.
Since 12.2 we can avoid the error by using the OVERFLOW
clause
listagg (... on overflow truncate without count ) ... |
So instead of the error message, when the maximum string size is reached (4000 bytes or 32k bytes depending on max_string_size
parameter) we get usable text without the statement raising an error.
example
select count (*) , listagg(table_name, ', ' on overflow truncate ) within group ( order by tablespace_name desc , table_name) all_tables from dba_tables; |
COUNT(*) | ALL_TABLES |
3113 | AQ$_ALERT_QT_G, AQ$_ALERT_QT_H, AQ$_ALERT_QT, … many many more tables …, SDO_CRS_GEOGRAPHIC_PLUS_HEIGHT, SDO_CS_SRS, SDO_DATUMS, …(1304) |
The three dots “…” are called an ellipsis and can be configured. without count
would avoid writing the total number of entries to the end of the list. with count
is the default if truncate
is specified.
Although the overflow clause is very usfull, the ultimate goal would be to give the developer more influence over it. Recently there was an interesting twitter discussion around that topic.
Other useful enhancements (19c) were LISTAGG distinct. example on LiveSQL
7. Using sys_context
[C,P]
A sys_context is something like a global variable in other languages. The normal context is for the session, but it is also possible to use application wide contexts.
Oracle provides several “preconfigured” contexts for us. The most common are ‘USERENV’ and ‘APEX$SESSION’ (for apex developers). Contexts are also used for security policies with VPD.
Here are the contexts that I frequently like to use
sys_context('userenv','client_identifier')
value set by dbms_application_info.set_client_infosys_context('userenv','current_edition_name')
when using edition based redefinition (ebr), this shows which edition I’m in. Always good to double check!sys_context('userenv','current_user')
similar to pseudocolumnuser
. The schema name that we connected with.
In some oracle versions (12.1) much faster than user, my recent test shows that this performance difference is now gone (19c)sys_context('userenv','proxy_user')
When doing a proxy connect then this is the name of the authenticated user, not the target schemasys_context('userenv','os_user')
useful when client_identifier is empty, for example the name of the Windows login account when using SQL Developer under Windows.sys_context('apex$session','app_user')
apex authenticated usersys_context('apex$session','app_id'
)
id of the current apex appsys_context('trigger_ctl','audit_on')
Is auditing enabled/disabled? Part of my custom adaptation/enhancement of Connor McDonalds Audit-Utility packagesys_context('userenv','sid')
the session ID of the current database session
Side note: UPPER/lower case does not make a difference for contexts. Only for the values of the context.
Warning: when you start using your own custom contexts, be careful with the grant create any context
privilege. It can be a security risk! Revoke it after it was used.
8. hint /*+ driving_site */ [P]
I feel like this is a classic for me. In one of my longest running projects we had a lot of databases connected by database links. One of the most important tuning activities was to understand how to do remote joins in a performant way. The driving_site
hint was extremely important for that.
General rule: When doing remote queries, avoid mixing local and remote tables in one select statement. If only remote tables are in the statement always use the driving_site
hint.
Nowadays the need for this hint diminishes, mostly because databases are less connected by DB links, but more by webservices (which does not really improve performance, but that is a different story)
9. join via using
[C]
This only works reliably when the naming conventions of the data model fit to it.
example
We have a table PERSON and a table ADDRESS. The primary key in PERSON is PERS_ID. The relationship column (FK) in ADDRESS is also called PERS_ID. It wouldn’t work as well with ADR_PERS_ID for example.
Then we can do a quick join like this:
1 2 3 4 | select * from person join address using (pers_id) where address.type= 'MAIN' ; |
There are some slight differences compared to the normal way using ON. Mostly one then can not differentiate anymore from which table the pers_id originates. For example you can not refer to person.pers_id in the where clause anymore.
Currently I use it exclusively for ad hoc queries, not in plsql code.
10. interval literals [C]
If I want to add some time to a date or a timestamp, I always use interval literals (unless it is a full day or a full month).
example
Let’s say we want to check tickets that were entered during the last hour. Using the date datatype we could easily calculate an hour by dividing a full day / 24. This is how many developers calculate time. And it is perfectly ok to do so.
1 2 | select * from tickets where createddt >= sysdate - 1/24 ; |
The term sysdate-1/24
resembles one hour before “now”.
This is how I prefer to write it:
select * from tickets where createddt >= sysdate - interval '1' hour ; |
I feel the code documents itself in a much better way.
Additionally requirements like this easily change.
Let’s change both expressions to 15 minutes.
sysdate - 15/24/60
sysdate - interval '15' minute
Which one is better to read and was quicker to change?
honorable mentions
Some more snippets or functions that didn’t quite make it into the list.
nvl2
[C]- the third parameter of
to_char
(nlsparams) [E] - query block hints [P]
- hierarchical with clause [C,E]
- lateral join [C]
alter sequence restart
[C,E]- cross join [C]
lnnvl
[C,E]- hex format “XXXX” [E]
sys.odcinumberlist
[C]- analytic sum + correct window clause:
rows between
instead ofrange between
[E]
Summary
I hope I mentioned something that you didn’t know about before and liked reading about.
Outlook
Here are some features/patterns that I’m currently not using myself, but where I feel that they might become quite important in the future.
- SQL Macros
- boolean in SQL (not implemented yet, but Oracle seems to be working on it – I expect some news about this very soon)
- median and other statistical functions like cume_dist, percentile_xxx, percent_rank
match_recognize
- with clause udf functions (still think they are overrated, but I might be wrong)
approx_count_distinct
- analytical views
Hi Swen,
just my 2 cents:
> 4. DeMorgans law [C,E]
DeMorgans law is for binary logic, but SQL uses ternary logic, so you always need to analyze if there can be NULLs.
>select * from emp
>where not (department = ‘SALES’ and department = ‘MARKETING’);
department can’t be ‘SALES’ and ‘MARKETING at the same time, so this query will return no rows. You need OR here.
>9. join via using [C]
Natural joins have a lot of cons, so I wouldn’t suggest it at all.
>10. interval literals [C]
Oh, intervals is a pretty big topic with a number of nuances
for example:
SQL>
select
sysdate-interval
'1'
month
from
dual;
select
sysdate-interval
'1'
month
from
dual
*
ERROR
at
line 1:
ORA-01839:
date
not
valid
for
month
specified
or
SQL>
select
dump(
timestamp
'2021-01-01 00:00:00'
-1) dmp1
from
dual;
DMP1
---------------------------------
Typ=13 Len=8: 228,7,12,31,0,0,0,0
SQL>
select
dump(
timestamp
'2021-01-01 00:00:00'
-interval
'1'
day
) dmp2
from
dual;
DMP2
-----------------------------------------------------------
Typ=187 Len=20: 228,7,12,31,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0