One of the key features of being a data analyst is to query the data from files, databases etc to perform some data manipulation or visualisation and sometimes it's much better if we can do it directly through code instead of looking into the database tables over and over again. Pandas is a python library which can store query results in variables called "dataframes" and it helps us to perform data manipulations, visualisation and can convert our results back in the databases or files as write operations.
This blog basically describes how one can use 10 basic SQL operations using the pandas library.
Dataset which we'll be using as an example :
We will use two datasets :
- Marks of various students in 3 subjects. (Primary keys : class and roll_no)
- Personal details of students. (Primary keys : class and roll_no)
1. Selecting the data
- In sql we can select the data of various columns using this querry :
SELECT name,roll_no,phone FROM student_details;
- This is how it's done in pandas :
2. Using aggregate functions
- Now let's find the maximum marks a student of class 7th got in maths
SELECT max(maths) FROM student_marks WHERE class=7;
- Pandas has several functions like min(),max(),mean() etc that works on a column, which one can use to achieve the above operation
3. Order By clause
This clause in SQL is used for sorting the values in ascending or descending order Query to sort names in ascending order :
select * from student_details order by name;
Query to sort names in descending order :
select * from student_details order by name desc;
In pandas we can do the same with df.sort_values() function This function takes in 'column_to_be_sorted', ascending = True for ascending order and ascending = False for descending order sorting.
4. Group By clause
- Let's count the number of students in different classes
select class,count(*) from student_details group by class;
- In pandas we can do the same with df.groupby() function
5. IN and NOT IN
Let's select only those rows where blood group is A or B
select * from student_details where blood_group in ('A', 'B') ;
In pandas we can do the same with isin() which works on a particular colum
- For NOT IN we can just add a negation symbol "~" in the pandas condition :
6. Joins
Join statement to get all the details and marks of students together
SELECT * from student_details a JOIN student_marks b ON (a.roll_no=b.roll_no AND a.class=b.class);
In pandas we can do the same using merge() function and we can specify the type of join like inner,outer,left,right as well.
7. Creating new column using existing ones
Now, let's add a new column total which adds marks in 3 subjects for each student
SELECT class,roll_no,science,maths,english,science+maths+english as total FROM student_marks;
Just add df['new_column'] and fill it's value using other columns or with their operations
8. Selecting data conditionally
Let's try filtering the data using multiple conditions.
select * from student_details where class=7 and gender='M' and blood_group='O';
We can provide multiple conditions in braces () followed by AND (&) or OR (|)
9. Insert and Update
Inserting a new row in student_details table :
insert into student_details(name,roll_no,class,gender,blood_group,phone) values ('Seth',38,7,'M','AB',223344);
In pandas we can do the same by adding the row at the end of the dataframe
Updating a row in student_details table :
UPDATE student_details SET name = 'Seth Andrews' WHERE class=7 AND roll_no=38;
In pandas, we can update the value conditionally like this :
10. Delete
Deleting a row in student_details table :
DELETE FROM student_details WHERE name = 'Seth Andrews' and class = 7;
In pandas we can do the same with the help of drop() function :
Conclusion
This is just introduction to how one can execute basic SQL operation using pandas and there are many more operations of SQL which can be easily done using pandas. The datasets and python file I have used can be found here : Python Notebook I'll be posting much more content of python and data science in my coming blogs.
Thanks for your time! :)
No comments:
Post a Comment