Saturday, August 8, 2020

NoSQL on the Cloud With Python

Familiarity with the cloud is a game-changing skill for any data scientist, machine learning engineer, or developer.

No matter which domain we work in, data is always the center of attention, closely followed by our end-users. A good cloud-based database service will cater to both of these needs.

We offload our data to a location where our end-users will be able to access it via an interface that anyone can use. This interface could be a web-app — or API for the more technical end-users.

Firestore is Google’s cloud-hosted NoSQL database service offering. There are several key benefits of the service, namely:

  • Flexible data storage (NoSQL)
  • Native libraries support most popular languages (Python)
  • You learn for free (surpassing the free tier limits takes a lot of data)
  • Everything is simple — we can spin up a database quicker than the time it takes to make coffee
  • Authentication can be handled by Google, so we maximize security with minimal work
  • Automated scaling to meet demand
  • Intuitive documentation, possibly the best of the big three cloud-providers (Azure, AWS, GCP)

In this article, we will set up our Firebase project (the platform hosting Firestore), create a database using the web-UI, and create a simple Python script performing all of the essential Firestore functions. It will look like this:

In Firebase
> Create a Project
> Initialize Database
> Get Credentials
In Python
> Install gcloud Libraries (+ venv)
> Authenticate
> Connect to Firestore
> Get
> Create
> Modify
> Delete
> Queries

Let’s get started.


Firebase

Create a Project

Fortunately, Google is good at making things easy. Go to the Firebase website and click the big Get started button.


We should now see the ‘Welcome to Firebase’ page above — create a project and give it a cool name (it should probably be relevant to your project too).

I’ve named this one ‘antigravity’.

You should see a window asking whether we would like to enable Google Analytics — this is entirely your choice, we won’t be using it, but it will also cost you nothing.


Our project will initialize, and on clicking Continue, we are taken to the Project Overview page of our new project.

Initialize a Database


To initialize our Firestore database, we navigate to the sidebar and click Develop > Database. Here we will see the ‘Cloud Firestore’ page above — click Create database.

Here we will be asked whether to start in production or test mode. We will be using test mode.

Next, you select your location. This is where you or your users will be requesting data from, so choose the nearest option!

Click Start collection to add the ‘places’ collection and ‘rome’ document — this is also genuine travel advice.

Finally, we have access to our Database. For this tutorial we will need a collection called ‘places’ and a document called ‘rome’ like above — simply click Start collection to add these (where_to_go is an array).


Over to Python

Installing Dependencies

We need to pip install the firebase_admin package:

pip install firebase-admin

With any version of Python beyond 3.6, we will receive a SyntaxError when importing firebase_admin into our scripts:

A SyntaxError is raised if any version of Python from 3.7 onwards is used.

async became a reserved keyword in Python 3.7, which breaks the firebase_admin module. We have two options here:

  1. Modify firebase_admin and replace every instance of async with _async (or any other non-reserved name of your choice).
  2. Use Python 3.6 via a virtual environment (venv).

Option (1) is probably a bad idea. So let’s quickly cover a venv setup using Anaconda (ask Google or me for how on other environments).

Open Anaconda prompt and create a new venv (we will call it fire36):

conda create --name fire36 python=3.6

Activate the venv and install the firebase_admin package:

conda activate fire36
pip install firebase_admin

If you’re using Jupyter or Spyder, launch it from the activated environment (jupyter notebook, jupyter lab, or spyder) to write code using this venv. Both PyCharm and VSCode can use this venv too.
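
Before going further, it can be worth confirming that the interpreter running your code is the one from the new environment. A minimal sketch using only the standard library (the helper name running_python is made up for this example):

```python
import sys


def running_python(major, minor):
    """Return True if the active interpreter is Python major.minor."""
    return sys.version_info[:2] == (major, minor)


# Report the active interpreter version and whether it matches the
# Python 3.6 target of the fire36 environment.
print(sys.version_info[:2], running_python(3, 6))
```

If the second value printed is False, the editor or notebook is most likely still pointing at a different interpreter.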

Authenticate

To access Firestore, we need lots of credentials. Google again makes this easy.

Navigate to your project in the Firebase Console.


Next to ‘Project Overview’ in the top-left, click the gear icon and select Project Settings.

Now we click the Service accounts tab where we will find instructions on authentication with the Firebase Admin SDK.

Click Generate new private key to download our credentials as a JSON file.

Now, we need to click Generate new private key. We download the file, rename it to serviceAccountKey.json (or anything else) and store it in an accessible location for our Python scripts.

Storing serviceAccountKey.json in the same directory for testing is okay — but keep the contents of the JSON private. Public GitHub repos are a terrible place for credentials.
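
One way to keep the key out of the repository is to read its location from an environment variable instead of hard-coding it. The sketch below is an assumption about how you might organize this, not part of the Firebase instructions: it reuses the GOOGLE_APPLICATION_CREDENTIALS variable name that Google's client libraries look for, and the helper name resolve_key_path is hypothetical.

```python
import os


def resolve_key_path(default="serviceAccountKey.json"):
    """Return the service account key path, preferring the environment.

    Falls back to `default` when GOOGLE_APPLICATION_CREDENTIALS is unset,
    and fails early if no file exists at the chosen path.
    """
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", default)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No service account key found at {path!r}")
    return path
```

The returned path can then be passed to credentials.Certificate in place of a hard-coded string.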

On the same Firebase Admin SDK page, we can copy and paste the Python code into the top of our script. All we need to do is update the path to serviceAccountKey.json and add firestore to our imports — our script should look like this:

import firebase_admin
from firebase_admin import credentials, firestore
cred = credentials.Certificate("path/to/serviceAccountKey.json")
firebase_admin.initialize_app(cred)

That is all we need for authentication. Now we (finally) move onto writing some code!

Connect to Firestore

There are three layers to our connection:

Database > Collection > Document

db = firestore.client()  # this connects to our Firestore database
collection = db.collection('places') # opens 'places' collection
doc = collection.document('rome') # specifies the 'rome' document
We access each layer in turn — from database (antigravity-207f8) > collection (places) > document (rome).

Each layer comes with its own set of methods that allow us to perform different operations at the database, collection, or document level.

Get

We use the get method to retrieve data. Let’s use this method to get our rome document:

doc = collection.document('rome')
res = doc.get().to_dict()
print(res)
[Out]: {
'lat': 41.9028, 'long': 12.4964,
'where_to_go': [
'villa_borghese',
'trastevere',
'vatican_city'
]
}
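
If the document does not exist, get() still returns a DocumentSnapshot, but its exists attribute is False and to_dict() returns None. A small guard, sketched here with a hypothetical helper name (read_document), avoids passing None further down the script:

```python
def read_document(doc_ref, default=None):
    """Return the document's data as a dict, or `default` if it is missing.

    `doc_ref` is expected to behave like a Firestore DocumentReference:
    its get() returns a snapshot exposing `exists` and `to_dict()`.
    """
    snapshot = doc_ref.get()
    if not snapshot.exists:
        return default
    return snapshot.to_dict()
```

With the collection object defined earlier, read_document(collection.document('rome')) should return the same dictionary as above, while an unknown document ID would return None.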

We can also perform a .get() operation on collection to return a list of all documents contained within it. If we had two documents, it would look like this:

docs = collection.get()
print(docs)
[Out]: [
<google.cloud.firestore_v1.document.DocumentSnapshot object ...>,
<google.cloud.firestore_v1.document.DocumentSnapshot object ...>
]

These documents are stored as DocumentSnapshot objects — the same object types we receive when using the .document(<doc-id>).get() method above. As with the first get example — we can use .to_dict() to convert these objects to dictionaries.
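
Each DocumentSnapshot also carries the document ID in its id attribute, so a list of snapshots can be turned into one dictionary keyed by ID. A minimal sketch (the helper name snapshots_to_dict is made up for this example):

```python
def snapshots_to_dict(snapshots):
    """Map each snapshot's document ID to its data dictionary."""
    return {snap.id: snap.to_dict() for snap in snapshots}
```

Calling snapshots_to_dict(collection.get()) on our places collection should give a dictionary with 'rome' as one of its keys.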

Create

We create a document by calling .document(<doc-id>) on collection and then .set() on the document reference it returns. The .set() method takes a dictionary containing all of the data we would like to store within our new <doc-id>, like so:

res = collection.document('barcelona').set({
'lat': 41.3851, 'long': 2.1734,
'weather': 'great',
'landmarks': [
'gaudí park',
'gaudí church',
'gaudí everything'
]
})
print(res)
[Out]: update_time {
seconds: 1596532394
nanos: 630200000
}

If the operation is successful, we will receive the update_time in our response.

The new data will appear on our Firestore GUI.

By navigating back to the Firestore interface, we can see our new document data as above.

Modify

Sometimes, rather than creating a whole new document, we will need to modify an existing one. There are several ways of doing this, depending on what it is we want to change.

To update a full key-value pair, we use update:

res = collection.document('barcelona').update({
'weather': 'sun'
})
The update method allows us to modify only the key-value pairs within our statement.

The update method works for most values, but when we simply want to add or remove a single entry in an array, it is less useful. Here we use firestore.ArrayUnion and firestore.ArrayRemove for adding and removing individual array values respectively, like so:

collection.document('rome').update({
'where_to_go': firestore.ArrayUnion(['colosseum'])
})
We have updated the where_to_go array to include colosseum.

And to remove vatican_city and trastevere:

collection.document('rome').update({
'where_to_go': firestore.ArrayRemove(
['vatican_city', 'trastevere']
)})
After removal of both vatican_city and trastevere from the where_to_go array.
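
To make the behavior of the two operations concrete, here is a plain-Python sketch of what they do to the stored list. It does not call Firestore; array_union and array_remove are stand-in functions written for this example. ArrayUnion only adds values that are not already present, and ArrayRemove removes every occurrence of the given values.

```python
def array_union(existing, additions):
    """Append each value from `additions` that is not already in the list."""
    result = list(existing)
    for value in additions:
        if value not in result:
            result.append(value)
    return result


def array_remove(existing, removals):
    """Drop every occurrence of each value listed in `removals`."""
    return [value for value in existing if value not in removals]


# Replay the two updates from above on the original rome array.
where_to_go = ['villa_borghese', 'trastevere', 'vatican_city']
where_to_go = array_union(where_to_go, ['colosseum'])
where_to_go = array_remove(where_to_go, ['vatican_city', 'trastevere'])
print(where_to_go)  # ['villa_borghese', 'colosseum']
```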

Delete

Other times, we may need to delete documents in their entirety. We do this with the delete method:

collection.document('rome').delete()
Our Firestore database no longer contains the rome document.

If we wanted to delete a single field within a document, we could use firestore.DELETE_FIELD like so:

collection.document('barcelona').update({
'weather': firestore.DELETE_FIELD})
The Barcelona document no longer contains the weather field.

Query

Taking things to the next level, we can specify what exactly it is we want.

Our Firestore now contains entries for Barcelona, Brisbane, NYC, and Rome.

For this example, we have added several global cities (code for adding them is here) to Firestore.
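
As a rough sketch of how such documents could be added, assuming the collection object from earlier: the CITIES dictionary, its approximate coordinates, and the seed_places helper below are written for this example and are not taken from the linked code.

```python
# Approximate coordinates, chosen for illustration only.
CITIES = {
    'brisbane': {'lat': -27.4698, 'long': 153.0251},
    'nyc': {'lat': 40.7128, 'long': -74.0060},
}


def seed_places(collection, cities):
    """Write one document per city, using the dictionary key as the ID.

    `collection` is expected to behave like a Firestore CollectionReference.
    Returns the sorted list of document IDs that were written.
    """
    for doc_id, data in cities.items():
        collection.document(doc_id).set(data)
    return sorted(cities)
```

Running seed_places(collection, CITIES) after connecting would write both documents to the places collection.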

We will query for all cities within Europe — which we are defining as having a longitude of more than -9.4989° (west) and less than 33.4299° (east). We will ignore latitude for the sake of simplicity.

To query our Firestore, we use the where method on our collection object. The method has three arguments, where(fieldPath, opStr, value):

  • fieldPath — the field we are targeting, in this case 'long'
  • opStr — comparison operation string, '==' checks equality
  • value — the value we are comparing to

Anything true for our query will be returned. For our example, we will find documents where long > -9.4989, which we write as:

collection.where('long', '>', -9.4989).get()
All documents returned from our long > -9.4989 query.

With this query, we return barcelona, rome, and brisbane, but we need to exclude anything east of Europe too. We can do this by adding another where method like so:

collection.where('long', '>', -9.4989) \
    .where('long', '<', 33.4299) \
    .get()
All documents returned from our long > -9.4989 AND long < 33.4299 query.
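
To sanity-check which documents each query should match, the same comparisons can be applied locally to plain dictionaries. This sketch does not touch Firestore: the OPS table covers only a few of the comparison strings, the matches helper is written for this example, and the Brisbane and NYC longitudes are approximate values assumed here.

```python
import operator

# A few of the comparison strings accepted by where(), mapped to Python.
OPS = {
    '<': operator.lt,
    '<=': operator.le,
    '==': operator.eq,
    '>=': operator.ge,
    '>': operator.gt,
}


def matches(doc, field, op, value):
    """Return True if doc[field] satisfies the comparison against value."""
    return OPS[op](doc[field], value)


longitudes = {
    'barcelona': {'long': 2.1734},
    'brisbane': {'long': 153.0251},
    'nyc': {'long': -74.0060},
    'rome': {'long': 12.4964},
}

# First query: everything east of the western bound.
east_of_west_bound = sorted(
    name for name, doc in longitudes.items()
    if matches(doc, 'long', '>', -9.4989)
)

# Second query: both bounds applied, like chaining two where() calls.
europe = sorted(
    name for name, doc in longitudes.items()
    if matches(doc, 'long', '>', -9.4989) and matches(doc, 'long', '<', 33.4299)
)

print(east_of_west_bound)  # ['barcelona', 'brisbane', 'rome']
print(europe)              # ['barcelona', 'rome']
```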

That’s all for this introduction to Google’s Firestore. For easy, secure, and robust cloud-based databases, Firestore truly is fantastic.

Of course, there’s a lot more to the cloud than data storage alone. AWS and Azure are both great too, but the ease of use with GCP’s Firebase configuration is simply unmatched by anything else out there.

With very little time, it is effortless to learn everything we need to build a full application from the front-end UI to our data storage setup.

If the cloud is somewhat new to you, don’t hesitate to jump in — it is simply too valuable a skill to miss.

If you have any suggestions or would like to talk with me about Firestore — or getting started with the cloud — feel free to reach out to me on Twitter or in the comments below.

Thanks for reading!


Interested in learning more about the other side of this? Feel free to read my introduction to Angular — a fantastic and surprisingly simple front-end framework:
