JSON, Part 1: Store semi-structured documents in Oracle Database with SQL
Leverage the JSON data interchange standard and the SQL language to simplify information storage and retrieval.
[Due to recent interest in JSON—including the release of Oracle Autonomous JSON Database—we have updated this two-part series, first published in Oracle Magazine in 2015. —Ed.]
Recently Acme Bank has been accepting transactions from business partners such as convenience stores and third-party billing companies—and even partners outside the country. Those transactions contain different fields but conform to the JavaScript Object Notation (JSON) interface standard for semi-structured documents.
Mark, the chief technology officer, has been cornered by two colleagues who are concerned about the use of JSON.
JSON enables any set of data to be transmitted immediately, without a predetermined format expected by a relational database, and this makes it attractive for integrating outside transactions quickly. Acme’s partners can send any pertinent data they want without first waiting for a mutually agreeable format. However, even though Acme allows semi-structured data to come in as JSON, the data is stored in a structured manner in the database, in a relational format.
Here are the coworkers expressing concerns to Mark:
Dave, the lead developer, and his team spend a lot of time accepting, parsing, and deciphering the JSON documents before storing them in the database. The JSON format enables any data to be included, but if there is no corresponding database column (or table), the data can’t be accepted into Acme’s systems.
Debbie, the lead database administrator, has to alter the database structures quickly and often to accommodate the new data. It’s not OK with her that fast and frequent responses are required to make things work, and she wants Dave to stop sending her a continuous stream of data management projects that use semi-structured documents and ever-changing data fields.
But Dave has no choice. Acme must accept semi-structured data from its partners to be competitive in the marketplace, and the JSON format enables the company to develop a repository for valuable but loosely structured data. This repository must be flexible enough to store any type of data coming in while being easy to query with a language everyone understands: SQL.
So, Dave explains, JSON is here to stay—and its use will grow rapidly at Acme.
Altering existing database structures takes time, and if a specific data item in an incoming JSON document has no corresponding column in the Acme database, that transaction has to wait until the column is created. This means an interruption of the normal business of the company. Dave wants new database columns to somehow be created on the fly, although he is fully aware that Debbie insists it’s impossible.
In short, everyone is unhappy with the present arrangement. And they all turn to Mark, who brought in JSON as a transaction data exchange format in the first place.
There is a very easy solution, responds Mark: Store JSON documents directly in the database without any parsing, preprocessing, or prior alteration of database structures. The solution delivers the flexibility of JSON for semi-structured data and the power, reliability, and familiarity of Oracle Database—with no special actions required of the DBAs. Dave and Debbie, intrigued, press Mark to explain.
What is JSON?
JSON is a standard for free-format text in which any kind of data can be included, along with a descriptor. The descriptor for the data is called a key, and the actual data is called a value. The collection of related data is put into a single JSON document file. Any type of data can be represented as key-value pairs.
Mark shows everyone an example of a bank transaction in a JSON document, shown in Listing 1, and provides a description of key lines in the listing. (The line numbers shown are not part of the document; they are there to aid in the explanation shown in Table 1.)
Listing 1. A check-transaction JSON document
 1  {
 2    "TransId" : 1,
 3    "TransDate" : "01-JAN-2015",
 4    "TransTime" : "11:05:00",
 5    "TransType" : "Deposit",
 6    "AccountNumber" : 123,
 7    "AccountName" : "Smith, John",
 8    "TransAmount" : 100.00,
 9    "Location" : "ATM",
10    "CashierId" : null,
11    "ATMDetails" : {
12      "ATMId" : 301,
13      "ATMLocation" : "123 Some St, Danbury CT 06810"
14    },
15    "WebDetails" : {
16      "URL" : null
17    },
18    "Source" : "Check",
19    "CheckDetails" : [
20      {
21        "CheckNumber" : 101,
22        "CheckAmount" : 50.00,
23        "AcmeBankFlag" : true,
24        "Endorsed" : true
25      },
26      {
27        "CheckNumber" : 102,
28        "CheckAmount" : 50.00,
29        "AcmeBankFlag" : false,
30        "Endorsed" : true
31      }
32    ]
33  }
Table 1. Explanation of code lines in Listing 1

Lines   Description
------  ------------------------------------------------------------------
1, 33   Braces open and close the entire JSON document.
2-10    Simple key-value pairs; keys are quoted, and values may be strings, numbers, or null (as for CashierId on line 10).
11-14   ATMDetails is a nested object whose value is itself a set of key-value pairs.
15-17   WebDetails is another nested object, here with a null URL.
18      Source identifies the kind of transaction; here, a check deposit.
19-32   CheckDetails is an array (square brackets) holding two objects, one per deposited check.
23-24   AcmeBankFlag and Endorsed hold Boolean values (true or false), written without quotation marks.
Acme receives thousands of JSON documents like the one in Listing 1 every day as transactions, Dave confirms, and his team has created dozens of specialized programs to read the documents and store the data in appropriate tables and columns. Well, no more, says Mark.
Mark creates a TRANSACTIONS table to hold the JSON document in Listing 1.
create table transactions (
  id         number not null primary key,
  trans_msg  clob,
  constraint check_json check (trans_msg is json)
);
Mark explains that he will store the JSON document in the TRANS_MSG column.
But wait, exclaims Dave, this is just a simple character large object (CLOB) column, not a special JSON data type column.
That is precisely the point, confirms Mark. Oracle Database can store JSON data in ordinary CLOB columns without a special data type. However, to ensure that users load proper JSON data into the TRANS_MSG column, and not just any bunch of characters, Mark adds a check constraint (TRANS_MSG IS JSON) that requires the data to be in valid JSON format.
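To see the constraint at work, Mark suggests a quick test (a hypothetical session, though the behavior is standard): try to insert text that is not well-formed JSON.

insert into transactions
values (99, 'this is not JSON');

The statement fails with an ORA-02290 check constraint violation, so nothing but valid JSON can land in the TRANS_MSG column.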
The other column in the table, ID, is used only as an identifier for that record. The TRANSACTIONS table doesn't have columns for all the attributes of a transaction, such as the bank information and check number. All the data about a transaction is inside the JSON document. And that's the beauty of it, Mark says: you get flexibility.
After creating the TRANSACTIONS table, Mark stores the JSON document in the table by using the following SQL statement:
insert into transactions
values (
  1,
  '{ complete text
   . (without line numbers)
   . of JSON document from
   . Listing 1 here }'
)
/
Mark includes the text of the JSON document shown in Listing 1 between the single quotes in the above SQL statement. He then loads another JSON document, shown in Listing 2, into the TRANSACTIONS table.
Listing 2. Inserting a JSON document for a transfer transaction
insert into transactions
values
(
  3,
  '{
    "TransId" : 3,
    "TransDate" : "01-JAN-2015",
    "TransTime" : "10:05:00",
    "TransType" : "Deposit",
    "AccountNumber" : 125,
    "AccountName" : "Smith, Jane",
    "TransAmount" : 300.00,
    "Location" : "website",
    "CashierId" : null,
    "ATMDetails" : null,
    "WebDetails" : {
      "URL" : "www.proligence.com/acme/dep.htm"
    },
    "Source" : "Transfer",
    "TransferDetails" : {
      "FromBankRouting" : "012345678",
      "FromAccountNo" : "1234567890",
      "FromAccountType" : "Checking"
    }
  }'
)
/
Listing 2 shows a transfer transaction (shown as the value for Source) that transfers money from an external account to account number 125, which is owned by Jane Smith. Because this is a transfer, check details are irrelevant, so there is no CheckDetails section in this JSON document. Similarly, because it's a transfer, it has some new keys, such as information on the source bank the money is coming from, shown under TransferDetails.
Mark asks Debbie and Dave to compare the two JSON documents shown in Listing 1 and Listing 2. Although they both convey the same broad type of information (account transactions), they are for different types of transactions, so the document contents are different.
When Acme receives a new type of transaction, all the relevant details of that new type of transaction can be represented in a JSON document, even if no room has been made for the data in the database tables. That, Mark explains, is the advantage of storing JSON data in the database, where the entire contents of a transaction can be put into a single column: TRANS_MSG. And there is no need to alter the table!
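For example, if a partner someday submits a wire transaction with keys Acme has never seen, the document still loads unchanged. (The WireDetails key and its contents below are invented for illustration; they are not part of Acme's current feed.)

insert into transactions
values (
  4,
  '{
    "TransId" : 4,
    "TransType" : "Deposit",
    "AccountNumber" : 127,
    "TransAmount" : 950.00,
    "Source" : "Wire",
    "WireDetails" : {
      "SwiftCode" : "EXAMPLUS33",
      "OriginCountry" : "Canada"
    }
  }'
)
/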
Quick data loading
Dave points out that Acme receives data in JSON format in two ways: single transactions and groups of transactions as JSON document files. The groups typically include several hundred thousand transactions. Using an INSERT statement works for single transactions, but it becomes impractical for inserting hundreds of thousands of JSON files. Is there a quick-loading mechanism?
There is, Mark assures them, with the SQL*Loader feature of Oracle Database. He creates a SQL*Loader control file named trans.ctl, shown in Listing 3.
Listing 3. SQL*Loader control file and data file
File: trans.ctl
load data into table transactions
fields terminated by ','
(
  id        sequence(max,1),
  fname     filler char(80),
  trans_msg lobfile(fname) terminated by EOF
)
File: trans.data
C:\acme\json_landing\trans1.txt
C:\acme\json_landing\trans2.txt
C:\acme\json_landing\trans3.txt
All the JSON files to be loaded are located in a landing area, the C:\acme\json_landing directory. Mark creates another file, named trans.data, containing the names of the JSON files to be loaded, also shown in Listing 3. Then he calls SQL*Loader:
sqlldr control=trans.ctl data=trans.data
This command loads all the JSON files listed in the trans.data file into the TRANSACTIONS table, without individual INSERT statements.
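A quick count afterward (a sanity check, assuming the load ran cleanly) confirms how many documents arrived:

select count(*) from transactions;

Because of the CHECK_JSON constraint, any file that is not well-formed JSON is rejected during the load and written to the SQL*Loader bad file rather than stored.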
Querying the data
Dave looks skeptical. All Mark did was load the JSON documents into a column of the CLOB data type, he points out, just as you would for any other text that is meaningless to the database. Will the applications have to extract the contents of this CLOB and then process the data via special programs, as they do now?
Not at all, Mark says with a smile. The database knows that the data in that CLOB column is in JSON format, and it is not just meaningless text. That means users can query that data directly, using a JSON extension of the SQL language. To demonstrate, Mark issues the SQL statement shown in Listing 4 to show one way of retrieving the data from the TRANS_MSG column in the TRANSACTIONS table.
Listing 4. Retrieving JSON (method 1)
set null ?

select t.Trans_Msg.TransId                    trans_id,
       t.Trans_Msg.Source                     source,
       t.Trans_Msg.CheckDetails.CheckNumber   check_num,
       t.Trans_Msg.CheckDetails.CheckAmount   check_amount,
       t.Trans_Msg.CheckDetails.AcmeBankFlag  Acme_Bank_Flag
from   transactions t;
TRANS_ID SOURCE CHECK_NUM CHECK_AMOUNT ACME_BANK_FLAG
———————— ————————— —————————— ——————————————— ——————————————
1 Check [101,102] [50.00,50.00] [,]
2 Check 151 200.00 ?
3 Transfer ? ? ?
Pointing to the output, Mark shows that the query extracted precisely the items he wanted, formatted as expected rather than returned as globs of text. The output even took care of multiple child items in an array.

For example, in TRANS_ID 1, there are two checks, 101 and 102, in amounts of $50.00 each. In TRANS_ID 2, there is only a single check, 151, for $200.00. TRANS_ID 3 is a transfer, so there is no information for the CheckDetails key in the JSON document, and the values for CheckDetails.CheckNumber and CheckDetails.CheckAmount appear as ? to indicate null values. (Null values appear as ? because of the SET NULL ? command at the beginning of the SQL session.)
Mark cautions everyone that the JSON key names used in the query, such as TransId in t.Trans_Msg.TransId, are case-sensitive.
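A quick illustration of what that means in practice (a sketch against the same table): querying a key with the wrong case does not raise an error; it simply finds no matching key and returns null.

select t.Trans_Msg.transid wrong_case,
       t.Trans_Msg.TransId right_case
from   transactions t;

The wrong_case column comes back null for every row, because no key spelled transid (all lowercase) exists in the documents.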
A second way to query the data, Mark continues, is to use the JSON functions built into Oracle Database, most notably JSON_VALUE, which extracts the value of a specific key from a JSON document. Listing 5 shows the query he uses to extract the details for check 151.
Listing 5. Retrieving JSON (method 2)
select
json_value(trans_msg,'$.TransId') trans_id,
json_value(trans_msg,'$.Source') Source,
json_value(trans_msg,'$.CheckDetails.CheckNumber') check_num,
json_value(trans_msg,'$.CheckDetails.CheckAmount') check_amount,
json_value(trans_msg,'$.CheckDetails.AcmeBankFlag') Acme_Bank_Flag
from transactions
where json_value(trans_msg,'$.CheckDetails.CheckNumber') = 151
/
TRANS_ID SOURCE CHECK_NUM CHECK_AMOUNT ACME_BANK_FLAG
———————— ————————— —————————— ———————————— ——————————————
2 Check 151 200.00 true
The highest key in a JSON document, Mark explains, is represented by $. Subsequent subkeys are placed in hierarchical order, separated by periods. For example, by writing $.CheckDetails.CheckNumber, Mark instructs the SQL code to start the search at the top, go to the CheckDetails key, and continue down to CheckNumber.
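The same pattern reaches any depth of nesting. For example, this query (a sketch built on the keys in Listing 1) pulls the ATM location out of the nested ATMDetails object:

select json_value(trans_msg, '$.ATMDetails.ATMLocation') atm_location
from   transactions;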
Mark directs everyone's attention to the output of $.CheckDetails.AcmeBankFlag, which is true instead of null, as was the case in Listing 4. The reason is simple: The Boolean true is a valid JSON data type, and the JSON_VALUE function correctly parsed it. (Oracle recommends the use of JSON_VALUE for querying JSON data stored in Oracle Database.)
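By default, JSON_VALUE returns its result as a VARCHAR2; when a key holds a number, a RETURNING clause converts it on the way out. A brief sketch:

select json_value(trans_msg, '$.TransAmount' returning number) trans_amount
from   transactions;

The value comes back as a true NUMBER, ready for arithmetic and aggregation.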
Valerie, a senior developer listening to the conversation, raises a concern. TRANS_ID 1 includes two checks. How do you display only the first one, instead of the array [101,102]?
That's simple, Mark assures her: use the array counter. The counter starts at 0, so the following extracts the first check number:
json_value(trans_msg,
'$.CheckDetails[0].CheckNumber')
The second value will be represented as [1], and so on. To show all the values, Mark uses CheckDetails[*]. (He can use specific numbers in the array to pull specific items; for example, CheckDetails[0,2] pulls the first and third items.)
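For results that are fragments rather than single scalars, the companion JSON_QUERY function returns matching JSON itself; a sketch (the WITH WRAPPER clause collects the matches into an array):

select json_query(trans_msg, '$.CheckDetails[*].CheckNumber' with wrapper) check_nums
from   transactions;

For TRANS_ID 1, this returns the JSON array [101,102] as text.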
JSON performance
Debbie asks about the performance of queries that use the JSON_VALUE function. Because such a query is essentially a full-text search, won't it tax the database?

That's a valid concern, Mark concedes, but Oracle Database knows there is JSON data in the column and can access it with the JSON_VALUE function, so indexes can be created on the most-used keys, just as with normal columns in database tables. Mark demonstrates this by creating a unique index on the TransId key inside the JSON document.
create unique index in_trans_trans_id
on transactions
(json_value(trans_msg,'$.TransId'));
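Beyond speeding up lookups, the unique index also guarantees that no two stored documents carry the same TransId. A query whose predicate matches the indexed expression can use it, as in this sketch (note the string comparison; JSON_VALUE returns VARCHAR2 by default):

select id
from   transactions
where  json_value(trans_msg, '$.TransId') = '1';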
Another approach is to use bitmap indexes, which are particularly useful in high-volume environments where the number of distinct values is low. For instance, the AcmeBankFlag key has just two possible values, true and false, indicating whether a customer's check was drawn on Acme. It's perfect for a bitmap index. Mark creates the following index:
create bitmap index in_trans_acmeflag
on transactions
(json_value(trans_msg,
'$.CheckDetails.AcmeBankFlag'));
To demonstrate how the index is actually used, Mark shows the team Listing 6, the execution plan output of a query on that key.
Listing 6. Execution plan for a query on an indexed value
SQL> explain plan for
2 select * from transactions
3 where json_value(trans_msg,'$.CheckDetails.AcmeBankFlag') = 'true';
Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
——————————————————————————————————————————————————————————————————————————————
Plan hash value: 159327478
——————————————————————————————————————————————————————————————————————————————
|Id | Operation | Name | Rows | Bytes |
——————————————————————————————————————————————————————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 1235 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TRANSACTIONS | 1 | 1235 |
| 2 | BITMAP CONVERSION TO ROWIDS | | | |
|*3 | BITMAP INDEX SINGLE VALUE | IN_TRANS_ACMEFLAG | | |
——————————————————————————————————————————————————————————————————————————————
PLAN_TABLE_OUTPUT
——————————————————————————————————————————————————————————————————————————————
Predicate Information (identified by operation id):
——————————————————————————————————————————————————————————————————————————————
3 - access(JSON_VALUE("TRANS_MSG" FORMAT JSON ,
'$.CheckDetails.AcmeBankFlag' RETURNING
VARCHAR2(4000) NULL ON ERROR)='true')
To see which columns in the database tables contain JSON data, query the USER_JSON_COLUMNS view.
SQL> select * from user_json_columns;
TABLE_NAME COLUMN_NAME FORMAT DATA_TYPE
------------- ------------- --------- ----------
TRANSACTIONS TRANS_MSG TEXT CLOB
Citing the output, Mark shows everyone that the query did indeed use the IN_TRANS_ACMEFLAG index. Debbie is relieved to hear that, but she brings up another point. Sometimes users perform pattern-matching searches, looking, for example, to see whether an ATM in Stamford was used in any transaction. In such a case, they look for the word Stamford in the ATMLocation subkey within the ATMDetails key; they cannot perform an exact match. Here is what the query looks like:
select
  json_value(trans_msg, '$.TransId') trans_id,
  json_value(trans_msg, '$.Source') Source
from transactions
where json_textcontains (
  trans_msg,
  '$.ATMDetails.ATMLocation',
  'Stamford'
);
Debbie is concerned that these types of queries, being free-format searches, will adversely affect database performance. Not at all, Mark reassures them. He creates a special text index on the JSON data, using the following SQL code:
create index ind_trans_01
on transactions (trans_msg)
indextype is ctxsys.context
parameters ('section group CTXSYS.JSON_SECTION_GROUP sync (on commit)');
The parameters clause specifies a special section group that instructs Oracle Database to index the column as JSON sections, which facilitates free-format searches.
Conclusion
JSON is an increasingly popular data interchange format because of its flexible semi-structured format. Oracle Database offers reliability and the power of SQL, but its rows and columns usually demand structure. You can leverage both JSON and Oracle Database by storing JSON documents right in the database as CLOBs.
And while Mark has convinced Debbie, Dave, and Valerie that JSON is the tool they need to solve Acme Bank’s transaction processing challenges, in part 2 of this series, another colleague will enter the conversation: Cindy, the chief information officer.