From Redshift to Google Spreadsheet Using Python
A streamlined workflow to read, write and, format the spreadsheets
Recently I tried to extract data from Amazon Redshift database and push it to a Google spreadsheet using a Python script. The purpose of the script is to read the data from the Amazon redshift database, apply some business rules, and write it to the google spreadsheet.
I have used the gspread library in my python script, which is nothing but the python API for google sheets. Using this, we can read, write and, format the spread sheets very easily.
To interact with Google API, the first step is to create a project in Google Development Console and enable the APIs.
- Create a project (click here) :
2. Give a project name :
3. Go to project dashboard and click on +ENABLE APIS AND SERVICES :
4. Search for Google Drive API, click on it and enable the same. Do the same process to enable the Google spreadsheet API :
5. Click on create credentials :
6. Select the parameters and click on What credentials do I need?
7. Enter a Service Account Name and select Role :
8. Now click on KEYS and create a new key with json format :
A JSON file will be downloaded. We will need this JSON file in our script. So rename that file as client_secretkey.json.
Now go to your Google Drive and create an Google Sheet and name it redshift-to-Google-Sheet. Copy client_email value from the JSON file you have downloaded above and share that Google Sheet to this client_email with edit permission.
At this point we have everything set up, now let’s write our Python script. To interact with spreadsheet using python script, we have to install gspread Python library. So open up the terminal and run the following commands. oauth2client is a python library for accessing resources protected by OAuth 2.0.
pip install gspread
pip install oauth2client
Now create a Python file and name it redshift_to_spradsheet.py. Copy and paste the following code in it.
import gspread
from oauth2client.service_account import ServiceAccountCredentialsscope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/spreadsheets",
"https://www.googleapis.com/auth/drive.file", "https://www.googleapis.com/auth/drive"]credentials = ServiceAccountCredentials.from_json_keyfile_name('client_secretkey.json', scope)
client = gspread.authorize(credentials)
Now let’s add code to connect to redshift database. I have used sqlalchemy library which facilitates the communication between Python programs and databases.
To install sqlalchemy type the following :
pip install sqlalchemy
Now import sqlalchemy to your python script and make database connection using create_engine. After connection string, write select query(or any other DML) to fetch the data and store it into pandas data frame.
from sqlalchemy import create_engine
import pandas as pd# Connecting to postgresql
engine = create_engine('postgresql://user:password@host:5439/database')
data_frame = pd.read_sql('SELECT * FROM article_sales;', engine)
To write this data frame to google spreadsheet, we have to install df2gspread python library. To install df2gspread type the following :
pip install df2gspread
Now import df2gspread to your python script and create a function called write_googlesheet as below :
from df2gspread import df2gspread as d2gdef write_googlesheet(df, spreadsheet_key, sheet_title, starting_cell, overwrite):
d2g.upload(df, spreadsheet_key, sheet_title, credentials=credentials, col_names=overwrite, row_names=True, start_cell = starting_cell, clean=overwrite)
This d2g.upload function will upload given Pandas DataFrame to Google Drive and returns gspread Worksheet object. Here I have passed spreadsheet key from the url of spreadsheet, sheet title as spreadsheet name(ie. Sheet1, Sheet2). To have more knowledge of the function’s parameters please click here.
github link: redshift_spreadsheet.py
Now run the Python script and open redshift-to-Google-Sheet Google Sheet in your browser. You will see that your Google Sheet is updated with the content.
Conclusion
The purpose of this article to let people know about this type of implementation. There can be many different strategies but I find it less complex and easy to understand for naive developers.
Comments