Subrat's Technical Blog: Build your own pandas (like) library

Friday, November 22, 2019

Build your own pandas (like) library

Apr 2 · 4 min read

I learnt pandas mainly through Ted Petrou’s Cookbook. And now, Ted has come up with another interesting project: Building our own pandas like library. It contains a series of steps that one needs to complete in order to build a fully-functioning library similar to pandas, called pandas_cub. The library will have the most useful features of pandas.

Things I love about the project

Size of the project: The project offers an amazing opportunity for people familiar with Python to code something big.
Setting up environments: The project helps you learn how to create a separate environment for your project. This is done using conda package manager.
Test driven development: You also learn test driven development which means you first write tests and then you write code to pass them. You will learn the python library pytest in the process.

The prerequisites and setups can be found on the project’s Github page but I’d still provide a quick walk through for the same.

Step 1: Fork the repository

Git and GitHub introductory courses in case you are not familiar with them.

Step 2: Clone your local copy

Right. Now we can fill in our code and push it to our copy of the project. However, what if the creator of the project modifies the original project. We want those modifications in our forked version as well. For this, we can add a remote called upstream to the original repository and pull from there whenever we want to sync the two. However, this is not required for this project.

Step 3: Environment setup

This means downloading the specific set of libraries and tools required to build this project. The project has a file called environment.yml which lists all these libraries. A simple conda env create -f environment.yml will install them for us. I have already installed them.

We can now use conda activate pandas_cub and conda deactivate to activate and deactivate our environment.

Step 4: Checking tests

All the tests are included in a file called test_dataframe.py located in the tests directory.

Run all the tests:

$ pytest tests/test_dataframe.py

Run a particular class of tests:

$ pytest tests/test_dataframe.py::ClassName

Run a particular function of that class:

$ pytest tests/test_dataframe.py::ClassName::FuncName

Finally, you need to make sure Jupyter is running in the right environment.

Step 5: __init.py__

Once everything is set up, let’s start by inspecting __init.py__ which is the file we will be editing throughout the project. The first task requires us to check if the input provided by the user is in the correct format.

Let’s start by raising a TypeError if data is not a dictionary. We do this as follows:

To test this case we will open a new Jupyter notebook (you can also use the existing Test Notebook) and do the following:

Note: Make sure you have these magic lines of code on top of your notebook. They will ensure that whenever you edit the library code, it will reflect in your notebook without having to restart.

Now let’s pass a dictionary and see if it works.

And sure it does. Once we code all the cases we can run the tests from the test_dataframe.py to see if all of them pass. Overall, this seems like a really fun project and provides a lot to learn. Go through it and tell me if you like it. Also tell me other fun projects you’ve worked on.

~happy learning.

Subrat's Technical Blog

Friday, November 22, 2019

Build your own pandas (like) library

Build your own pandas (like) library

Things I love about the project

Towards Data Science

Sharing concepts, ideas, and codes.

No comments:

Must Watch YouTube Videos for Databricks Platform Administrators

Report Abuse

Friday, November 22, 2019

Build your own pandas (like) library

Build your own pandas (like) library

Things I love about the project

Towards Data Science

Sharing concepts, ideas, and codes.

53

53 claps

No comments:

Must Watch YouTube Videos for Databricks Platform Administrators