Wednesday, November 13, 2019

Parallelism and Concurrency in Python

Parallelism and Concurrency in Python (Concepts + Code)

Hi Folks! I hope all you programming geeks are doing well. In this post, we will discuss concurrency and parallelism in Python. We will look at multithreading, multiprocessing, and asynchronous programming, and at how these concepts can be used to speed up computational tasks in Python. So, without wasting time, let's get started.
[I am assuming that you already have a fair knowledge of Python. If not, I would recommend reading this post before moving forward: ]

Parallelism

Parallelism means performing multiple tasks at literally the same time, for example by running them simultaneously on different CPU cores.
1. Multiprocessing: distributing your tasks over several CPU cores, each task running in its own process (type lscpu in a Linux terminal, or call os.cpu_count() in Python, to check how many cores your computer has). For CPU-bound tasks (such as numerical computations), we can use Python's multiprocessing module. Because every process has its own Python interpreter and its own GIL (Global Interpreter Lock), the processes can genuinely run in parallel. The module's Pool object offers a convenient way to parallelize the execution of a function across multiple input values. Let's look at an example:
import multiprocessing
import time

import numpy as np


def DotProduct(A):
    # A is a pair [left, right] of matrices; multiply them.
    dot_product = np.dot(A[0], A[1])
    # Return only the shape, so the large result array is not
    # pickled and sent back from the worker to the parent process.
    return dot_product.shape


if __name__ == "__main__":
    # Each entry is a pair of matrices built from 1,000,000 integers.
    matrix_pairs = [[np.arange(1000000).reshape(5000, 200), np.arange(1000000).reshape(200, 5000)],
                    [np.arange(1000000).reshape(500, 2000), np.arange(1000000).reshape(2000, 500)],
                    [np.arange(1000000).reshape(5000, 200), np.arange(1000000).reshape(200, 5000)]]

    # Executing the code without multiprocessing, i.e. on a single core.
    start = time.time()
    B = list(map(DotProduct, matrix_pairs))
    end = time.time() - start
    print("Full time taken : ", end, "seconds")

    # Executing the same code with the multiprocessing module on multiple cores.
    start = time.time()
    n_cores = multiprocessing.cpu_count()
    with multiprocessing.Pool(n_cores) as p:
        C = p.map(DotProduct, matrix_pairs)
    end = time.time() - start
    print("Full time taken : ", end, "seconds")
## output //
Full time taken : 23.593358993530273 seconds
Full time taken : 14.405884027481079 seconds
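
The NumPy example spends almost all of its time inside np.dot. To see the effect of separate processes on plain Python code, here is a minimal sketch with a pure-Python CPU-bound function. The helpers sum_of_squares, run_serial and run_parallel are hypothetical names made up for illustration. The sketch sizes the pool with os.cpu_count() (a cross-platform alternative to lscpu) and checks that Pool.map returns the same results, in the same order, as a serial loop. Timings depend on your machine, so no numbers are shown here.

```python
import multiprocessing
import os
import time


def sum_of_squares(n):
    # Pure-Python loop: it holds the GIL the whole time, so threads would
    # not speed it up, but separate processes can run it on separate cores.
    return sum(i * i for i in range(n))


def run_serial(inputs):
    # Baseline: one call after another in the current process.
    return [sum_of_squares(n) for n in inputs]


def run_parallel(inputs):
    # One worker process per CPU core (os.cpu_count() may return None).
    n_workers = os.cpu_count() or 1
    with multiprocessing.Pool(n_workers) as pool:
        # Pool.map splits the inputs across the workers and returns
        # the results in the same order as the inputs.
        return pool.map(sum_of_squares, inputs)


if __name__ == "__main__":
    inputs = [2_000_000] * 4

    start = time.perf_counter()
    serial = run_serial(inputs)
    print(f"serial:   {time.perf_counter() - start:.2f} s")

    start = time.perf_counter()
    parallel = run_parallel(inputs)
    print(f"parallel: {time.perf_counter() - start:.2f} s")

    # Same answers either way; only the wall-clock time differs.
    assert parallel == serial
```

The `if __name__ == "__main__":` guard matters here: on platforms where worker processes are started with spawn or forkserver, each worker imports the main module, and the guard keeps the workers from creating pools of their own.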

Concurrency

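Concurrency means dealing with multiple tasks whose execution overlaps in time: they can start, run, and complete in overlapping periods, in no guaranteed order, without necessarily running at the same instant. Python's concurrency support has some limitations, but it does a pretty decent job.
1. Multithreading: running multiple threads inside a single process to perform tasks. Multithreading works well for I/O-bound tasks (such as sending multiple requests to servers concurrently). All threads of a process share the same PID (process ID); each thread has its own thread identifier instead (threading.get_ident()). A thread begins running when you call its start() method, and calling its join() method blocks until that thread has finished, which is useful when the code that follows depends on the thread's work. In standard CPython, the GIL allows only one thread to execute Python bytecode at a time, so threads generally do not speed up CPU-bound pure-Python code. And because the operating system decides when each thread runs, the order of their output can vary a lot from run to run.
2. Async IO: in Python, Async IO (the asyncio module) is a single-threaded, single-process design that achieves concurrency through cooperative multitasking: coroutines pause at await points, and an event loop switches to other coroutines while one of them is waiting on I/O.
Let's look at a multithreading example.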
import threading
import time

import numpy as np


def BasicOperation():
    # square of a number
    def square(number):
        return number * number

    # cube of a number
    def cube(number):
        return number ** 3

    # nth power of a number
    def nth_power(number, power):
        return number ** power

    # sum of the first n natural numbers
    def sum_of_n_numbers(number):
        return number * (number + 1) / 2

    # using the functions to drive the program
    print("square of 5 is", square(5))
    print("cube of 5 is", cube(5))
    print("5 raise to power 2 is", nth_power(5, 2))
    print("sum of first 5 numbers is", sum_of_n_numbers(5))


def DotProduct():
    # CPU-bound work; the result is discarded, only the time matters.
    A = np.arange(1000000).reshape(5000, 200)
    B = np.arange(1000000).reshape(200, 5000)
    Dot = np.dot(A, B)


if __name__ == "__main__":
    # without threading ...
    start = time.time()
    BasicOperation()
    Mid = time.time() - start
    print("Mid time taken : ", Mid, "seconds")
    DotProduct()
    end = time.time() - start
    print("Full time taken : ", end, "seconds")

    # with threading: almost all the time is spent in the CPU-bound
    # DotProduct, so expect little or no speed-up here.
    start = time.time()
    Thread_1 = threading.Thread(target=BasicOperation, name="Basic Operation Thread")
    Thread_2 = threading.Thread(target=DotProduct, name="Dot Product Thread")
    Thread_1.start()
    Thread_2.start()
    Thread_1.join()  # wait for the basic operations to finish
    Mid = time.time() - start
    print("Mid time taken : ", Mid, "seconds")
    Thread_2.join()  # wait for the dot product to finish
    end = time.time() - start
    print("Full time taken : ", end, "seconds")
## output //
square of 5 is 25
cube of 5 is 125
5 raise to power 2 is 25
sum of first 5 numbers is 15.0
Mid time taken : 0.0006113052368164062 seconds
Full time taken : square of 5 is 10.373110294342041 25seconds

cube of 5 is Mid time taken : 1250.0015938282012939453
5 raise to power 2 is seconds
25
sum of first 5 numbers is 15.0
Full time taken : 12.598262786865234 seconds
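
The dot-product workload is CPU-bound, which is where threads help least. Threads and Async IO pay off when tasks spend their time waiting. Below is a minimal sketch that uses time.sleep and asyncio.sleep as a stand-in for real I/O such as network requests; fake_request, fake_request_async, DELAY and N_TASKS are hypothetical names and values chosen for illustration. It also checks that every thread reports the same process ID, uses join() to wait for the threads, and wraps print() in a threading.Lock so that lines from different threads cannot interleave, as they can when several threads print at once.

```python
import asyncio
import os
import threading
import time

DELAY = 0.5    # simulated I/O latency in seconds (an assumption)
N_TASKS = 3
print_lock = threading.Lock()


def fake_request(task_id, results):
    # time.sleep blocks only this thread and releases the GIL,
    # so the other threads keep running while this one "waits for I/O".
    time.sleep(DELAY)
    results[task_id] = os.getpid()
    with print_lock:  # one thread prints at a time, so lines do not mix
        print(f"thread task {task_id} done (pid {os.getpid()})")


def run_with_threads():
    results = {}
    threads = [
        threading.Thread(target=fake_request, args=(i, results))
        for i in range(N_TASKS)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every thread has finished
    return results, time.perf_counter() - start


async def fake_request_async(task_id):
    # await hands control back to the event loop while this task waits.
    await asyncio.sleep(DELAY)
    return f"async task {task_id} done"


async def run_with_asyncio():
    start = time.perf_counter()
    # gather() runs all coroutines concurrently on a single thread and
    # returns their results in the order they were passed in.
    results = await asyncio.gather(
        *(fake_request_async(i) for i in range(N_TASKS))
    )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    pids, elapsed = run_with_threads()
    print(f"threads: {elapsed:.2f} s, distinct PIDs: {set(pids.values())}")

    messages, elapsed = asyncio.run(run_with_asyncio())
    for message in messages:
        print(message)
    print(f"asyncio: {elapsed:.2f} s")
```

Both runs should take roughly DELAY seconds rather than N_TASKS * DELAY, because the waits overlap; the set of PIDs should contain a single value, because all the threads live inside one process.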

Summary

We use Python's multiprocessing module to achieve parallelism, whereas concurrency in Python is achieved with the threading and asyncio modules. A program that runs in parallel is also concurrent, but the reverse is not true: a concurrent program does not necessarily run anything in parallel.
That's it. Thank you for taking the time to read my post. I hope you liked it.
