Is your data in parts and lying in other object stores? No worries, Autonomous Database has you covered!

Whether or not you personally believe in the multi-cloud lifestyle that's currently in vogue, the reality is that you may be dealing with different departments or orgs within your company that store their data in their cloud of choice, partitioned in various ways. Autonomous Database doesn't judge: we support querying or ingesting your data whether it lives in Oracle Cloud Object Storage, Azure Blob Storage, or AWS S3!

Let me walk you through a simple example of how you can query multi-part data lying in any of these object stores. We will be using the new external partitioned table functionality in the DBMS_CLOUD package, which creates a partitioned table structure over flat files in your external storage (i.e., a query on this table fetches data directly from the flat files in external storage, instead of from the database's own datafiles). You may use the other procedures in this package if you need to apply this to a non-partitioned external table, or to actually copy data into your database (a rough sketch of both follows below).
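If you're curious, here's roughly what those two alternatives look like. This is just a minimal sketch: the table names, placeholder URL, and trimmed-down column list below are illustrative, not part of this walkthrough.

BEGIN
  -- Non-partitioned external table over a single flat file (placeholder names and URL)
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'WEATHER_REPORT_EXTERNAL',
    credential_name => 'OBJ_STORE_CRED',
    file_uri_list   => 'https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/weather.csv',
    column_list     => 'REPORT_DATE DATE, ACTUAL_MEAN_TEMP NUMBER',
    format          => json_object('type' value 'csv', 'skipheaders' value '1')
  );

  -- Or copy the same file's rows into an existing table inside the database
  DBMS_CLOUD.COPY_DATA(
    table_name      => 'WEATHER_REPORT',
    credential_name => 'OBJ_STORE_CRED',
    file_uri_list   => 'https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/weather.csv',
    format          => json_object('type' value 'csv', 'skipheaders' value '1')
  );
END;
/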
Let's begin with some cloud concepts we will be using further in the post:
  • In general, an Object Store is a cloud-based, scalable, serverless storage platform that offers storage of “objects”. These objects can be unstructured data of any content type, including analytic data and rich content, like images and videos. Oracle’s Object Storage, AWS S3 and Azure Blob Storage are all examples of object stores.
  • A private object store URL is a URL that requires authentication via a credential to be accessed.
  • Pre-Authenticated URLs and Public URLs don’t require any credentials to be accessed. Having the complete URL will give someone access to that data.
Next, download the two-part data set as a zip file here. In keeping with some of my previous posts, we will use the small weather history data set from Charlotte, NC in Comma Separated Values (CSV) format. The file with "Part 1" in its name contains July 2014 weather data, and the one with "Part 2" contains the following month's (Aug 2014) weather data.

Now select the cloud object store that you will be using to jump to the appropriate section:



Using Oracle Cloud Object Storage

 

Step 1: Unzip and upload the two downloaded files to your Oracle Cloud Object Storage bucket

Create an Object Storage bucket, then unzip and upload the two weather history files to it. You can refer to this hands-on lab for a detailed walkthrough of how to do this in your Oracle Cloud tenancy.


Make a note of the "Visibility" of the bucket to which you are uploading the files (it can be Private or Public); this will come in handy in Step 3.

Step 2: Create a credential to access your object store for your Private URLs

In SQL Developer, connected to your Autonomous Database, create a credential for your object store as shown below. The username is your Oracle Cloud Infrastructure username, and the password is an Oracle Cloud Infrastructure generated auth token:
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'OBJ_STORE_CRED',
    username => '<OCI username>',
    password => '<OCI auth token>'
  );
END;
/
Note: If you are unsure about how to create an auth token, refer to Step 8 in the same lab tutorial mentioned above. While we have used a username and auth token for simplicity here, we recommend that you use native authentication for your production systems, for added security and versatility (see the sketch below).
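For example, a native authentication credential looks something like the following. This is a minimal sketch: the credential name is arbitrary, the placeholder values come from your OCI API signing key setup, and the private key contents are typically pasted in without the BEGIN/END header and footer lines.

BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'OCI_NATIVE_CRED',                 -- any name; referenced later in DBMS_CLOUD calls
    user_ocid       => 'ocid1.user.oc1..<unique_id>',     -- your user OCID
    tenancy_ocid    => 'ocid1.tenancy.oc1..<unique_id>',  -- your tenancy OCID
    private_key     => '<contents of your API signing private key>',
    fingerprint     => '<fingerprint of the API signing key>'
  );
END;
/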

 

Step 3: Create an External Partitioned Table on top of your two-part Weather History data

Now that we have a credential ready, run the PL/SQL script below using your Object Storage URLs in the location parameters. You may either construct the URLs yourself by following this URL format, or find each URL in the details of your file in the Object Storage console, as in the following screens.
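If you'd rather construct the URLs by hand, native Oracle Cloud Object Storage URLs generally follow this pattern (substitute your own region, namespace, bucket, and object names):

https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/<object_name>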



Since we have a month's worth of data in each file, we call the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE procedure to create the partitioned external table with two partitions on the REPORT_DATE column.
Note: We are using the credential parameter in this call to DBMS_CLOUD to illustrate the use of files that lie in a private object storage bucket (i.e., that have private URLs). If we use a public bucket or pre-authenticated URLs, we can omit the credential parameter entirely. We can also use a mix of private and public URLs from the same object store if required.

BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
    table_name      => 'WEATHER_REPORT_EXTERNAL_PART',
    credential_name => 'OBJ_STORE_CRED',
    format          => json_object('type' value 'csv', 'skipheaders' value '1',
                                   'dateformat' value 'mm/dd/yy'),
    column_list     => 'REPORT_DATE DATE,
                        ACTUAL_MEAN_TEMP NUMBER,
                        ACTUAL_MIN_TEMP NUMBER,
                        ACTUAL_MAX_TEMP NUMBER,
                        AVERAGE_MIN_TEMP NUMBER,
                        AVERAGE_MAX_TEMP NUMBER,
                        AVERAGE_PRECIPITATION NUMBER',
    partitioning_clause =>
      'partition by range (REPORT_DATE)
         (partition p1 values less than (to_date(''01-AUG-2014'',''DD-MON-YYYY'')) location
            (''https://objectstorage.us-ashburn-1.oraclecloud.com/n/adwctraining5/b/partitiontest/o/Charlotte_NC_Weather_History_Part1.csv''),
          partition p2 values less than (to_date(''01-SEP-2014'',''DD-MON-YYYY'')) location
            (''https://objectstorage.us-ashburn-1.oraclecloud.com/n/adwctraining5/b/partitiontest/o/Charlotte_NC_Weather_History_Part2.csv''))'
  );
END;
/

 

Jump to Step 4  |  Back to object store selection


 

Using Microsoft Azure Blob Storage


Step 1: Unzip and upload the two downloaded files to your Azure Blob Storage container

Navigate to your Azure Blob Storage account, create a container, then unzip and upload the two weather history files. (Refer to these detailed steps if necessary.)


Make a note of the "Access Level" of the container in which you uploaded the files (it can be Private or Public); this will come in handy in Step 3.

 

Step 2: Create a credential to access your Blob storage for your Private URLs

In SQL Developer, connected to your Autonomous Database, create a credential for your blob store as shown below. The username is your Azure storage account name, and the password is your Azure storage account access key:
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'OBJ_STORE_CRED',
    username => '<Azure Storage account name>',
    password => '<Azure Storage account access key>'
  );
END;
/
Note: If you are unsure about how to find your Azure storage account access key, follow the detailed steps here.

 

Step 3: Create an External Partitioned Table on top of your two-part Weather History data

Now that we have a credential ready, run the PL/SQL script below using your Azure Blob Storage URLs in the location parameters. You may either construct the URLs yourself by following the URL format, or find each URL in the details of your file in the blob storage console, as in the following screen:
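If you'd rather construct the URLs by hand, Azure Blob Storage URLs generally follow this pattern (substitute your own storage account, container, and blob names):

https://<storage_account>.blob.core.windows.net/<container>/<object_name>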


Since we have a month's worth of data in each file, we call the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE procedure to create the partitioned external table with two partitions on the REPORT_DATE column.
Note: 
  • We are using the credential parameter in this call to DBMS_CLOUD to illustrate the use of files that lie in a private Azure blob storage container (i.e., that have private URLs). If we use a public access container, we can omit the credential parameter entirely. We can also use a mix of private and public URLs from the same object store if required.
  • We don't currently support pre-authenticated URLs for Azure, but that's coming soon. Feel free to comment below if your business relies heavily on this.

BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
    table_name      => 'WEATHER_REPORT_EXTERNAL_PART',
    credential_name => 'OBJ_STORE_CRED',
    format          => json_object('type' value 'csv', 'skipheaders' value '1',
                                   'dateformat' value 'mm/dd/yy'),
    column_list     => 'REPORT_DATE DATE,
                        ACTUAL_MEAN_TEMP NUMBER,
                        ACTUAL_MIN_TEMP NUMBER,
                        ACTUAL_MAX_TEMP NUMBER,
                        AVERAGE_MIN_TEMP NUMBER,
                        AVERAGE_MAX_TEMP NUMBER,
                        AVERAGE_PRECIPITATION NUMBER',
    partitioning_clause =>
      'partition by range (REPORT_DATE)
         (partition p1 values less than (to_date(''01-AUG-2014'',''DD-MON-YYYY'')) location
            (''https://nilaysobjectstore.blob.core.windows.net/externaltest/Charlotte_NC_Weather_History_Part1.csv''),
          partition p2 values less than (to_date(''01-SEP-2014'',''DD-MON-YYYY'')) location
            (''https://nilaysobjectstore.blob.core.windows.net/externaltest/Charlotte_NC_Weather_History_Part2.csv''))'
  );
END;
/

 

Jump to Step 4  |  Back to object store selection


 

Using AWS S3 Storage

 

Step 1: Unzip and upload the two downloaded files to your AWS S3 bucket

Navigate to the S3 console in your AWS account, create a bucket, then unzip and upload the two weather history files. (Refer to these detailed steps if necessary.)


Make a note of the Permissions of the bucket in which you uploaded the files (you can use Private or Public access); this will come in handy in Step 3.

 

Step 2: Create a credential to access your AWS S3 storage for your Private URLs

In SQL Developer, connected to your Autonomous Database, create a credential for your S3 store as shown below. The username is your AWS access key ID, and the password is your AWS secret access key:
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'OBJ_STORE_CRED',
    username => '<AWS Access key ID>',
    password => '<AWS secret access key>'
  );
END;
/
Note: If you are unsure about how to find your AWS access key ID and secret access key, follow the detailed steps here.

 

Step 3: Create an External Partitioned Table on top of your two-part Weather History data

Now that we have a credential ready, run the PL/SQL script below using your AWS S3 URLs in the location parameters. You may either construct the URLs yourself by following the URL format, or find each URL in the details of your file in the S3 console, as in the following screen:
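If you'd rather construct the URLs by hand, S3 object URLs generally follow one of these patterns, depending on the addressing style (substitute your own bucket, region, and object names):

https://<bucket>.s3-<region>.amazonaws.com/<object_name>
https://s3-<region>.amazonaws.com/<bucket>/<object_name>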


Since we have a month's worth of data in each file, we call the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE procedure to create the partitioned external table with two partitions on the REPORT_DATE column.
Note: 
  • We are using the credential parameter in this call to DBMS_CLOUD to illustrate the use of files that lie in a private AWS S3 bucket (i.e., that have private URLs). If we use a public access bucket, we can omit the credential parameter entirely. We can also use a mix of private and public URLs from the same object store if required.
  • We don't currently support pre-authenticated URLs for AWS, but that's coming soon. Feel free to comment below if your business relies heavily on this.
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
    table_name      => 'WEATHER_REPORT_EXTERNAL_PART',
    credential_name => 'OBJ_STORE_CRED',
    format          => json_object('type' value 'csv', 'skipheaders' value '1',
                                   'dateformat' value 'mm/dd/yy'),
    column_list     => 'REPORT_DATE DATE,
                        ACTUAL_MEAN_TEMP NUMBER,
                        ACTUAL_MIN_TEMP NUMBER,
                        ACTUAL_MAX_TEMP NUMBER,
                        AVERAGE_MIN_TEMP NUMBER,
                        AVERAGE_MAX_TEMP NUMBER,
                        AVERAGE_PRECIPITATION NUMBER',
    partitioning_clause =>
      'partition by range (REPORT_DATE)
         (partition p1 values less than (to_date(''01-AUG-2014'',''DD-MON-YYYY'')) location
            (''https://nilaystests.s3-us-west-1.amazonaws.com/Charlotte_NC_Weather_History_Part1.csv''),
          partition p2 values less than (to_date(''01-SEP-2014'',''DD-MON-YYYY'')) location
            (''https://nilaystests.s3-us-west-1.amazonaws.com/Charlotte_NC_Weather_History_Part2.csv''))'
  );
END;
/

Step 4: Validate and Query your data

We're done already! You may run the DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE procedure to validate that your partitioned external table was created correctly and can access its underlying data. (Validation is an optional but recommended step.)
  EXEC DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE('WEATHER_REPORT_EXTERNAL_PART');
If things went awry and you receive an error, you can look at the logfile and badfile to troubleshoot the errors you see. (Refer to Step 12 of this lab tutorial for more detail.)
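For example, one way to track down those files is to query the USER_LOAD_OPERATIONS view in your Autonomous Database, which records validation and load operations along with the names of the tables holding the log records and rejected rows. A rough sketch, assuming that view is available in your service version:

SELECT table_name, type, status, logfile_table, badfile_table
  FROM user_load_operations
 WHERE table_name = 'WEATHER_REPORT_EXTERNAL_PART'
 ORDER BY start_time DESC;

-- Then SELECT * FROM the table named in LOGFILE_TABLE (or BADFILE_TABLE) to read the details.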
If you sailed through with no errors, you may now query the table as a whole or by partition (i.e., in this case, by month) with the script below.
Notice that in the third query we select data between July 5th and 21st, 2014, all of which lies in partition P1. This query therefore makes use of the performance benefits of partitioned external tables, and only needs to physically access the "Part 1" file under the hood!

SELECT REPORT_DATE, AVERAGE_PRECIPITATION FROM WEATHER_REPORT_EXTERNAL_PART;
SELECT REPORT_DATE, AVERAGE_MIN_TEMP FROM WEATHER_REPORT_EXTERNAL_PART PARTITION (P2);
SELECT * FROM WEATHER_REPORT_EXTERNAL_PART WHERE REPORT_DATE BETWEEN TO_DATE('05-JUL-2014','DD-MON-YYYY') AND TO_DATE('21-JUL-2014','DD-MON-YYYY');
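If you want to see the partition pruning for yourself, a standard Oracle explain plan (nothing specific to DBMS_CLOUD here) should show only partition 1 being accessed for that date range, via the Pstart/Pstop columns:

EXPLAIN PLAN FOR
  SELECT * FROM WEATHER_REPORT_EXTERNAL_PART
  WHERE REPORT_DATE BETWEEN TO_DATE('05-JUL-2014','DD-MON-YYYY') AND TO_DATE('21-JUL-2014','DD-MON-YYYY');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);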

So, there you have it! Autonomous Database aims not only to simplify the provisioning and management of your database in the cloud, but also to provide easy-to-use functions that make your entire interaction with the database seamless and intuitive.
