Thursday, April 29, 2021

JSON, Part 1: Store semi-structured documents in Oracle Database with SQL

 

Leverage the JSON data interchange standard and the SQL language to simplify information storage and retrieval.

[Due to recent interest in JSON—including the release of Oracle Autonomous JSON Database—we have updated this two-part series, first published in Oracle Magazine in 2015. —Ed.]

Recently Acme Bank has been accepting transactions from business partners such as convenience stores and third-party billing companies—and even partners outside the country. Those transactions contain different fields but conform to the JavaScript Object Notation (JSON) interface standard for semi-structured documents.

Mark, the chief technology officer, has been cornered by two colleagues who are concerned about the use of JSON.

JSON enables any set of data to be transmitted immediately, without a predetermined format expected by a relational database, and this makes it attractive for integrating outside transactions quickly. Acme’s partners can send any pertinent data they want without first waiting for a mutually agreeable format. However, even though Acme allows semi-structured data to come in as JSON, the data is stored in a structured manner in the database, in a relational format.

Here are the coworkers expressing concerns to Mark:

Dave, the lead developer, and his team spend a lot of time accepting, parsing, and deciphering the JSON documents before storing them in the database. The JSON format enables any data to be included, but if there is no corresponding database column (or table), the data can’t be accepted into Acme’s systems.

Debbie, the lead database administrator, has to alter the database structures quickly and often to accommodate the new data. It’s not OK with her that fast and frequent responses are required to make things work, and she wants Dave to stop sending her a continuous stream of data management projects that use semi-structured documents and ever-changing data fields.

But Dave has no choice. Acme must accept semi-structured data from its partners to be competitive in the marketplace, and the JSON format enables the company to develop a repository for valuable but loosely structured data. This repository must be flexible enough to store any type of data coming in while being easy to query with a language everyone understands: SQL.

So, Dave explains, JSON is here to stay—and its use will grow rapidly at Acme.

Altering existing database structures takes time, and if a specific data item in an incoming JSON document has no corresponding column in the Acme database, that transaction has to wait until the column is created. This means an interruption of the normal business of the company. Dave wants new database columns to somehow be created on the fly, although he is fully aware that Debbie insists it’s impossible.

In short, everyone is unhappy with the present arrangement. And they all turn to Mark, who brought in JSON as a transaction data exchange format in the first place.

There is a very easy solution, responds Mark: Store JSON documents directly in the database without any parsing, preprocessing, or prior alteration of database structures. The solution delivers the flexibility of JSON for semi-structured data and the power, reliability, and familiarity of Oracle Database—with no special actions required of the DBAs. Dave and Debbie, intrigued, press Mark to explain.

What is JSON?

JSON is a standard for free-format text in which any kind of data can be included, along with a descriptor. The descriptor for the data is called a key, and the actual data is called a value. The collection of related data is put into a single JSON document file. Any type of data can be represented as key-value pairs.

Mark shows everyone an example of a bank transaction in a JSON document, shown in Listing 1, and provides a description of key lines in the listing. (The line numbers shown are not part of the document; they are there to aid in the explanation shown in Table 1.)

Listing 1. A check-transaction JSON document

Copied to Clipboard
Error: Could not Copy
  1  {
  2      "TransId"       :    1,
  3      "TransDate"     :    "01-JAN-2015",
  4      "TransTime"     :    "11:05:00",
  5      "TransType"     :    "Deposit",
  6      "AccountNumber" :    123,
  7      "AccountName"   :    "Smith, John",
  8      "TransAmount"   :    100.00,
  9      "Location"      :    "ATM",
 10      "CashierId"     :    null,
 11      "ATMDetails"    : {
 12          "ATMId"       : 301,
 13          "ATMLocation" : "123 Some St, Danbury CT 06810"
 14              },
 15      "WebDetails"    : {
 16          "URL"    : null
 17              },
 18      "Source"    :    "Check",
 19      "CheckDetails"  : [
 20                  {
 21                      "CheckNumber"    : 101,
 22                      "CheckAmount"    : 50.00,
 23                      "AcmeBankFlag"    : true,
 24                      "Endorsed"    : true
 25                  },
 26                  {
 27                      "CheckNumber"    : 102,
 28                      "CheckAmount"    : 50.00,
 29                      "AcmeBankFlag"    : false,
 30                      "Endorsed"    : true
 31                  }
 32              ]
 33  }

Table 1. Explanation of code lines in Listing 1

LineExplanation
1 and 33JSON keys and values are enclosed in curly braces { and }.
2The TransId key is a unique identifier for the transaction. Key names are enclosed in double quotation marks, and the value for a key is shown after the colon. Because this TransId value is a number data type, quotation marks are not needed.
3The value for TransDate is a date format, so it is enclosed in quotation marks. (Characters, time stamps, and nonnumeric data are enclosed in quotation marks.)
10The CashierId value is null, which means it is not known. Because this is an ATM transaction, a cashier is irrelevant. The JSON document could have omitted the CashierId key and value, but it included the key for completeness and assigned a null value.
11–14ATMDetails embeds a key (which includes the ATMId and ATMLocation keys and values).
19–32CheckDetails introduces embedded keys and values, but instead of introducing values for just one activity, such as an ATM transaction (as in the case of ATMDetails), CheckDetails introduces multiple values for multiple activities—one set of subkeys and values for each check deposited. There are two checks in this array, and arrays are presented in square brackets [ and ].
23In JSON true is a boolean data type (used without quotation marks).

Acme receives thousands of JSON documents like the one in Listing 1 every day as transactions, Dave confirms, and his team has created dozens of specialized programs to read the documents and store the data in appropriate tables and columns. Well, no more, says Mark.

Mark creates a TRANSACTIONS table to hold the JSON document in Listing 1.

Copied to Clipboard
Error: Could not Copy
create table transactions (
   id      number not null primary key,
   trans_msg       clob,
   constraint 
   check_json check (trans_msg is json)
)

Mark explains that he will store the JSON document in the TRANS_MSG column.

But wait, exclaims Dave, this is just a simple character large object (CLOB) column, not a special JSON data type column.

That is precisely the point, confirms Mark. Oracle Database can store JSON data in ordinary CLOB columns without a special data type. However, to ensure that users load proper JSON data into the TRANS_MSG column, and not just any bunch of characters, Mark adds a check constraint (TRANS_MSG IS JSON) that requires the data to be in valid JSON format.

The other column in the table, ID, is used only as an identifier for that record. The TRANSACTIONS table doesn’t have columns for all the attributes of a transaction, such as the bank information and check number. All the data about a transaction is inside the JSON document. And that’s the beauty of it, Mark says—you get flexibility.

After creating the TRANSACTIONS table, Mark stores the JSON document in the table by using the following SQL statement:

Copied to Clipboard
Error: Could not Copy
insert into transactions
values (
1
'{ complete text
. (without line numbers)
. of JSON document from
. Listing 1 here }'
)
/

Mark includes the text of the JSON document shown in Listing 1 between the single quotes in the above SQL statement. He then loads another JSON document, shown in Listing 2, into the TRANSACTIONS table.

Listing 2. Inserting a JSON document for a transfer transaction

Copied to Clipboard
Error: Could not Copy
insert into transactions
values
(
   sys_guid(),
   systimestamp,
'{
   "TransId"         :   3,
   "TransDate"       :   "01-JAN-2015",
   "TransTime"       :   "10:05:00",
   "TransType"       :   "Deposit",
   "AccountNumber"   :   125,
   "AccountName"     :   "Smith, Jane",
   "TransAmount"     :   300.00,
   "Location"        :   "website",
   "CashierId"       :   null,
   "ATMDetails"      :   null,
   "WebDetails"      : {
                     "URL"   : "www.proligence.com/acme/dep.htm"
              },
   "Source"   :   "Transfer",
   "TransferDetails"   :
            {
               "FromBankRouting"   : "012345678",
               "FromAccountNo"   : "1234567890",
               "FromAccountType"   : "Checking"
            }
}'
)
/

Listing 2 shows a transfer transaction (shown as a value for Source) that transfers money from an external account to account number 125, which is owned by Jane Smith. Because this is a transfer, check details are irrelevant, so there is no CheckDetails section in this JSON document. Similarly, because it’s a transfer, it has some new keys, such as information on the source bank the money is coming from, shown under TransferDetails.

Mark asks Debbie and Dave to compare the two JSON documents shown in Listing 1 and Listing 2. Although they both convey the same broad type of information (account transactions), they are for different types of transactions, so the document contents are different.

When Acme receives a new type of transaction, all the relevant details of that new type of transaction can be represented in a JSON document, even if no room has been made for the data in the database tables. That, Mark explains, is the advantage of storing JSON data in the database, where the entire contents of transactions can be put into a single column: TRANS_MSG. And there is no need to alter the table!

Quick data loading

Dave points out that Acme receives data in JSON format in two ways—single transactions and groups of transactions as JSON document files. The groups typically include several hundred thousand transactions. Using an INSERT statement works for single transactions, but it becomes impractical for inserting hundreds of thousands of JSON files. Is there a quick-loading mechanism?

There is, Mark assures them, with the SQL*Loader feature of Oracle Database. He creates a SQL*Loader control file named trans.ctl, shown in Listing 3.

Listing 3. SQL*Loader control file and data file

Copied to Clipboard
Error: Could not Copy
File: trans.ctl
load data into table transactions
fields terminated by ','
(
  trans_id    sequence(max,1),
  fname       filler char(80),
  trans_body  lobfile(fname) terminated by EOF
)
File: trans.data
C:\acme\json_landing\trans1.txt
C:\acme\json_landing\trans2.txt
C:\acme\json_landing\trans3.txt

All the JSON files to be loaded are located in a landing area in the C:\acme\json_landing directory. Mark creates another file named trans.data containing the names of the JSON files to be loaded, also shown in Listing 3. Then he calls SQL*Loader:

Copied to Clipboard
Error: Could not Copy
sqlldr control=trans.ctl data=trans.data

This command loads all the JSON files listed in the trans.data file to the TRANSACTIONS table—without individual INSERT statements.

Querying the data

Dave looks skeptical. All Mark did was load the JSON documents into a column of the CLOB data type, he points out, just as you would for any other text that is meaningless to the database. Will the applications have to extract the contents of this CLOB and then process the data via special programs, as they do now?

Not at all, Mark says with a smile. The database knows that the data in that CLOB column is in JSON format, and it is not just meaningless text. That means users can query that data directly, using a JSON extension of the SQL language. To demonstrate, Mark issues the SQL statement shown in Listing 4 to show one way of retrieving the data from the TRANS_MSG column in the TRANSACTIONS table.

Listing 4. Retrieving JSON (method 1)

Copied to Clipboard
Error: Could not Copy
Set null ?
select t.Trans_Msg.TransId trans_id,
       t.Trans_Msg.Source source,
       t.Trans_Msg.CheckDetails.CheckNumber check_num,
       t.Trans_Msg.CheckDetails.CheckAmount check_amount,
       t.Trans_Msg.CheckDetails.AcmeBankFlag Acme_Bank_Flag
from transactions t;
TRANS_ID SOURCE    CHECK_NUM  CHECK_AMOUNT    ACME_BANK_FLAG
———————— ————————— —————————— ——————————————— ——————————————
1        Check     [101,102]  [50.00,50.00]   [,]
2        Check     151        200.00          ?
3        Transfer  ?          ?               ?

Pointing to the output, Mark shows how this extracted precisely the items he wanted and retrieved them formatted—as expected—and not as globs of text. The output even took care of multiple child items in an array.

For example, in TRANS_ID 1, there are two checks, 101 and 102, in amounts of $50.00 each. In TRANS_ID 2, there is only a single check, 151, for $200.00. TRANS_ID 3 is a transfer; so there is no information for the CheckDetails key in the JSON document, and the values for CheckDetails.CheckNumber and CheckDetails.CheckAmount appear as ? to indicate null values. (Null values appear as ? because of SET NULL ? at the beginning of the SQL statement.)

Mark cautions everyone that the JSON key names used in the query, including t.Trans_Msg.TransId, are case-sensitive.

A second way to query the data, Mark continues, is to use the JSON functions built into Oracle Database, most notably JSON_VALUE, which extracts the value of a specific key from a JSON document. Listing 5 shows the query he uses to extract the details for check 151.

Listing 5. Retrieving JSON (method 2)

Copied to Clipboard
Error: Could not Copy
select
   json_value(trans_msg,'$.TransId') trans_id,
   json_value(trans_msg,'$.Source') Source,
   json_value(trans_msg,'$.CheckDetails.CheckNumber') check_num,
   json_value(trans_msg,'$.CheckDetails.CheckAmount') check_amount,
   json_value(trans_msg,'$.CheckDetails.AcmeBankFlag') Acme_Bank_Flag
from transactions
where json_value(trans_msg,'$.CheckDetails.CheckNumber') = 151
/
TRANS_ID SOURCE    CHECK_NUM  CHECK_AMOUNT ACME_BANK_FLAG
———————— ————————— —————————— ———————————— ——————————————
2        Check     151        200.00       true

The highest key in a JSON document, Mark explains, is represented by $. Subsequent subkeys are placed in hierarchical order, separated by a period. For example, by writing $.CheckDetails.CheckNumber, Mark instructs the SQL code to search starting from the top and going to the CheckDetails key and further down to CheckNumber.

Mark directs everyone’s attention to the output of $.CheckDetails.AcmeBankFlag, which is true instead of null, as was the case in Listing 4. The reason is simple: The boolean true is a valid JSON data type, and the JSON_VALUE function correctly parsed it. (Oracle recommends the use of JSON_VALUE for querying JSON data stored in Oracle Database.)

Valerie, a senior developer listening to the conversation, raises a concern. TRANS_ID 1 includes two checks. How do you display, instead of as an array [101,102], only the first one?

That’s simple, Mark assures her, if you use the array counter. The counter starts at 0, so for the first subkey, use the following to extract the first check number:

Copied to Clipboard
Error: Could not Copy
json_value(trans_msg,
'$.CheckDetails[0].CheckNumber')

The second value will be represented as [1], and so on. To show all the values, Mark uses CheckDetails[*]. (He can use specific numbers in the array to pull specific items—for example, CheckDetails[0,2] to pull the first and third items.)

JSON performance

Debbie asks about the performance of queries using the JSON_VALUE function. Because it’s essentially a full-text search, won’t it tax the database?

That’s a valid concern, Mark concedes, but Oracle Database knows there is JSON data in the column, and it can access the data with the JSON_VALUE function, so indexes can be created on the most-used keys—just as with normal columns in database tables. Mark demonstrates this by creating a unique index on TransID inside the JSON document.

Copied to Clipboard
Error: Could not Copy
create unique index in_trans_trans_id
on transactions
(json_value(trans_msg,'$.TransId'));

Another approach is to use bitmap indexes, which are particularly useful in high-volume environments where the frequency of the values is low. For instance, the AcmeBankFlag key has just two possible values—true and false—to show whether a customer’s check was drawn from Acme or not. It’s perfect for a bitmap index. Mark creates the following index:

Copied to Clipboard
Error: Could not Copy
create bitmap index in_trans_acmeflag
on transactions
(json_value(trans_msg,
'$.CheckDetails.AcmeBankFlag'));

To demonstrate how the index is actually used, Mark shows the team Listing 6, the execution plan output of a query on that key.

Listing 6. Execution plan for a query on an indexed value

Copied to Clipboard
Error: Could not Copy
SQL> explain plan for
  2  select * from transactions
  3  where json_value(trans_msg,'$.CheckDetails.AcmeBankFlag') = 'true';
Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
——————————————————————————————————————————————————————————————————————————————
Plan hash value: 159327478
——————————————————————————————————————————————————————————————————————————————
|Id | Operation                           | Name              | Rows | Bytes |
——————————————————————————————————————————————————————————————————————————————
| 0 | SELECT STATEMENT                    |                   |    1 |  1235 |
| 1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TRANSACTIONS      |    1 |  1235 |
| 2 |   BITMAP CONVERSION TO ROWIDS       |                   |      |       |
|*3 |    BITMAP INDEX SINGLE VALUE        | IN_TRANS_ACMEFLAG |      |       |
——————————————————————————————————————————————————————————————————————————————
PLAN_TABLE_OUTPUT
——————————————————————————————————————————————————————————————————————————————
Predicate Information (identified by operation id):
——————————————————————————————————————————————————————————————————————————————
   3 - access(JSON_VALUE("TRANS_MSG" FORMAT JSON ,
              '$.CheckDetails.AcmeBankFlag' RETURNING
              VARCHAR2(4000) NULL ON ERROR)='true')

To see which columns within the database table include JSON data, query the USER_JSON_COLUMNS view.

Copied to Clipboard
Error: Could not Copy
SQL> select * from user_json_columns;
TABLE_NAME    COLUMN_NAME   FORMAT    DATA_TYPE
------------- ------------- --------- ----------
TRANSACTIONS  TRANS_MSG     TEXT      CLOB

Citing the output, Mark shows everyone that the query did indeed use the IN_TRANS_ACMEFLAG index. Debbie is relieved to hear that, but she brings up another point. Sometimes users perform pattern-matching searches—for example, looking to see if an ATM in Stamford was used in any transaction. In such a case, they look for the word Stamford in the ATMLocation subkey within the ATMDetails key; they cannot perform an exact match. Here is what the query looks like:

Copied to Clipboard
Error: Could not Copy
select
  json_value(trans_msg,
             '$.TransId') trans_id,
  json_value(trans_msg,
             '$.Source') Source
where json_textcontains (
  trans_msg,
  '$.ATMDetails.ATMLocation',
  'Stamford'
);

Debbie is concerned that these types of queries, being a free-format search, will adversely affect the database performance. Not at all, Mark reassures them. He creates a special text index on the JSON data, using the following SQL code:

Copied to Clipboard
Error: Could not Copy
create index ind_trans_01 
on transactions (trans_msg)
indextype is 
ctxsys.contextparameters ('section group 
CTXSYS.JSON_SECTION_GROUP 
sync (on commit)');

The parameter section contains a special clause that instructs Oracle Database to index the columns as JSON sections, which facilitates free-format searches.

Conclusion

JSON is an increasingly popular data interchange format because of its flexible semi-structured format. Oracle Database offers reliability and the power of SQL, but its rows and columns usually demand structure. You can leverage both JSON and Oracle Database by storing JSON documents right in the database as CLOBs.

And while Mark has convinced Debbie, Dave, and Valerie that JSON is the tool they need to solve Acme Bank’s transaction processing challenges, in part 2 of this series, another colleague will enter the conversation: Cindy, the chief information officer.

Dig deeper

Image: Markus Spiske, Unsplash

Must Watch YouTube Videos for Databricks Platform Administrators

  While written word is clearly the medium of choice for this platform, sometimes a picture or a video can be worth 1,000 words. Below are  ...