Database Connectors
Imagine you’re a data scientist in charge of building a model, and you need to connect to a database — a pretty common task for you. How would you do it?
The easiest way is probably something like this:
import cx_Oracle
import pandas as pdcnn = cx_Oracle.connect(conn_str)
data = pd.read_sql(sql, cnn.connector)
# or do something else
(We’ll use an Oracle connection throughout the post as an example, but this blog applies to all other databases as well!)
This little code snippet seems like an obvious choice, right? This approach might get the job done, but isn’t the best way to go about it.
Why? Well, hopefully we were instructed to close any pointers or connectors after using them. Database connections are a limited and relatively expensive resource. If you’re not careful, you may accidentally hit the maximum number of connections with only one running project, but dozens of connections from historical projects.
A more careful way to deal with database connectors might be:
import cx_Oracle
import pandas as pdcnn = cx_Oracle.connect(conn_str)
data = pd.read_sql(sql, cnn.connector)
# or do something else
cnn.close()
Seems solid, right? Well, it’s better, but not quite there.
Not sure why? Consider the following: what happens if something in the middle interrupts the script? With this approach, you could accidentally reach the maximum number of allowable connections if, say, several testing scripts have database operations fail during execution. (Some other relevant questions here: what if you changed the name of the engine/connector and forgot to change it subsequent lines of code? Or if you have multiple connectors at the same time, but forget about one of them?)
If you’re really cautious and want to solve for each of these cases, you may come up with something like this:
import cx_Oracle
import loggingtry:
cnn = cx_Oracle.connect(conn_str)
cnn.do_something()
except Exception:
cnn.rollback()
logging.error("Database connection error")
raise
else:
cnn.commit()
finally:
cnn.close()
Now this looks good. We employ a rollback
if anything goes wrong, commit
if things go right, and close
our connection in the end. Though this is certainly a significant improvement over our first approach, we’re still short of best practices.
Why? Well, sometimes you have to deal with multiple database connectors (in the data exploration phase, for example), and you’ll have to read from each of them on a regular basis. This might introduce some redundancy, which suggests that a reusable function might be a good fit for this scenario.
Context Managers & Decorators
The best practices discussed above remind me of Python file objects: simply initialize a connection and safely close it when you’re done, regardless of what happens in the middle. With this syntax, the file is properly closed after the block finishes, even if an exception is raised on the way. It’d be great if we could handle database connections in a comparable manner.
Though we can’t treat connecting to a database in exactly the same way, it turns out that we can use something similar known as a context manager:
import cx_Oracleclass oracle_connection(object):
"""oracle db connection""" def __init__(self, connection_string=conn_str):
self.connection_string = connection_string
self.connector = None def __enter__(self):
self.connector = cx_Oracle.connect(self.connection_string)
return self def __exit__(self, exc_type, exc_val, exc_tb):
if exc_tb is None:
self.connector.commit()
else:
self.connector.rollback()
self.connector.close()
Context managers are used for managing resources. They allow you to allocate and release resources precisely when you wish. With this particular context manager, we can call database connections like so:
import pandas as pdwith oracle_connection() as conn:
data = pd.read_sql(sql, conn.connector)
(Read more about context managers here. You can also make use of contextlib
in the Python standard library for more context manager fun!)
If you’re ready to put your code into production, you may need a number of distinct database connectors, and consequently find that using a context manager is overly wordy. A great alternative is a decorator, which is a magical piece of code that modifies functions as opposed to objects. See below for our earlier example reconfigured as a decorator:
import cx_Oracle
import loggingdef db_connector(func):
def with_connection_(*args,**kwargs):
conn_str = conn_str
cnn = cx_Oracle.connect(conn_str)
try:
rv = func(cnn, *args,**kwargs)
except Exception:
cnn.rollback()
logging.error("Database connection error")
raise
else:
cnn.commit()
finally:
cnn.close() return rv
return with_connection_
Decorators allow you to change or extend a function’s behavior without really changing the function itself. You just decorate a function with one line of code above its definition to give it a whole new capability.
When designing a decorator, you need to have an inner function and a decorator function. Here, db_connector
is the decorator function, and with_connection_
is the inner function that’s only defined within the decorator. When calling this decorator, the decorated function func
ultimately points to the decorator’s inner function, with_connection_
. This inner function then wraps the decorated function func
by doing the following:
- Accepting an arbitrary number of positional and keyword arguments, allowing the decorated function to operate normally:
with_connection_(*args,**kwargs)
- Passing the database connector to the decorated function:
func(cnn, *args, **kwargs)
- Making sure the database connector is handled properly by Python’s
try/except
syntax, like we talked about earlier in the post.
With this in place, when you want to alter an existing function to use our database connector, you can drop in a decorator just above it like so:
@db_connector
def do_some_job(cnn, arg1, arg2=None):
cur = cnn.cursor()
cur.execute(SQL, (arg1, arg2)) # or something elsedo_some_job(arg1, arg2)
The @
symbol syntax basically (though not literally) equates to the following:
do_some_job(arg1, arg2) = db_connector(do_some_job(arg1, arg2))
This decorator creates a database connection and passes it along to the function below it. It basically does the job of a context manager, but in an even simpler way. There are other really interesting ways to write decorators — see more here.
Handling Credentials
So now you know how to set up a safe, easy database connection using context managers, decorators, and other best practices. But we’re not done yet!
Here’s a question: in any of the code above, how would you input your username and password to connect to the database? (Or rather, in this case, give the credentials to the aforementioned conn_str
?)
One idea might be to have a .py
file with all necessary credentials. This is certainly better than hard-coding your credentials into a conn_str
variable in each script that uses a database connection. But hard-coding credentials, even if they’re in a separate file, is never recommended.
Imagine that, in the beginning of your project, you used your personal password as credentials for a database connection. When it comes time to open-source your code, you might forget about the early days, but GitHub’s commit history certainly didn’t! Even absent any security concerns, if someone maintaining your project has to pore through all your code/data dictionaries to find the relevant snippets in which to update credentials, they will not be thrilled with your work.
In short, hard-coding credentials is unwieldy and potentially dangerous.
On the other hand, there’s another way to store your database credentials gracefully and safely: use environment variables. Whether you’re using Windows or a Unix-based system, you can always set these variables and hide them within your system, and they’re definitely more confidential than using scripts or configuration files.
In addition, when you’re deploying into a container (e.g. Docker), you can easily set your environment variables and not worry about changing them between deployments. For example, when switching from a testing environment to a production environment, all the database configurations could be different, but you can change the configuration of the container without altering the underlying code. Check here if you need a hint on how to change environment variables in your OS.
Here are our major previous code snippets with references to environment variables (see the os.environ
bits):
import cx_Oracle
import logging
import osclass oracle_connection(object):
"""oracle db connection""" def __init__(self, connection_string=os.environ["CONN"]):
self.connection_string = connection_string
self.connector = None def __enter__(self):
self.connector = cx_Oracle.connect(self.connection_string)
return self def __exit__(self, exc_type, exc_val, exc_tb):
if exc_tb is None:
self.connector.commit()
else:
self.connector.rollback()
self.connector.close()
def db_connector(func):
def with_connection_(*args, **kwargs):
conn_str = os.environ["CONN"]
cnn = cx_Oracle.connect(conn_str)
try:
rv = func(cnn, *args, **kwargs)
except Exception:
cnn.rollback()
logging.error("Database connection error")
raise
else:
cnn.commit()
finally:
cnn.close() return rv
return with_connection_
With these simple steps, you will be able to produce Python database connectors that are extensible, reusable, and secure.
_________________________________________________________________
If you liked this blog post, check out more of our work, follow us on social media (Twitter, LinkedIn, and Facebook), or join us for our free monthly Academy webinars.
No comments:
Post a Comment