Parallelism and Concurrency in Python (Concepts + Code)
Hi folks! I hope all you programming geeks are doing well. In this post, we will discuss concurrency and parallelism in Python. We will look at multithreading, multiprocessing and asynchronous programming, and at how we can use these concepts to speed up computational tasks in Python. So, without wasting time, let's get started.
[ I am assuming that you already have a fair knowledge of Python. If not, I would recommend reading an introductory Python post before moving forward. ]
Parallelism
It means performing multiple tasks at literally the same instant, for example on different CPU cores.
1. Multiprocessing: It means distributing your tasks over CPU cores [ on Linux, type lscpu in a terminal to check the number of cores in your computer ]. For any CPU-bound task (like numerical computation), we can use Python's multiprocessing module. We simply create a Pool object, which offers a convenient means to parallelize the execution of a function across multiple input values. Let's look at it with the help of an example:
import multiprocessing
import time

import numpy as np


def DotProduct(A):
    # A is a pair [left, right] of matrices to multiply.
    dot_product = np.dot(A[0], A[1])
    # Return only the shape so that large results are not copied back
    # from the worker processes.
    return dot_product.shape


List = [[np.arange(1000000).reshape(5000, 200), np.arange(1000000).reshape(200, 5000)],
        [np.arange(1000000).reshape(500, 2000), np.arange(1000000).reshape(2000, 500)],
        [np.arange(1000000).reshape(5000, 200), np.arange(1000000).reshape(200, 5000)]]

if __name__ == "__main__":
    # Run the code without multiprocessing, i.e. on a single core.
    start = time.time()
    B = list(map(DotProduct, List))
    end = time.time() - start
    print("Full time taken : ", end, "seconds")

    # Run the same code with the multiprocessing module on multiple cores.
    start = time.time()
    pool = multiprocessing.cpu_count()  # number of worker processes
    with multiprocessing.Pool(pool) as p:
        B = p.map(DotProduct, List)
    end = time.time() - start
    print("Full time taken : ", end, "seconds")
## output //
Full time taken : 23.593358993530273 seconds
Full time taken : 14.405884027481079 seconds
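As a further rough sketch (separate from the timings above), here is the same Pool pattern applied to a pure-Python CPU-bound function. The count_primes helper and the limits are made up for illustration, and the speed-up you actually see will depend on your core count and on process start-up overhead.

```python
import multiprocessing
import time


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]  # illustrative workload

    start = time.perf_counter()
    serial = [count_primes(limit) for limit in limits]
    print("serial   :", round(time.perf_counter() - start, 2), "seconds")

    start = time.perf_counter()
    # Pool() with no argument picks a worker count based on the available CPUs.
    with multiprocessing.Pool() as pool:
        parallel = pool.map(count_primes, limits)
    print("parallel :", round(time.perf_counter() - start, 2), "seconds")

    print("same results:", serial == parallel)
```

Keeping the worker function at module level and the Pool under the __main__ guard matters, because worker processes may need to import the module to find the function.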
Concurrency
It means making progress on multiple tasks over overlapping periods of time; the tasks may be interleaved in any order and need not run at the same instant. Python is not great at handling concurrency, but it does a pretty decent job.
1. Multithreading: running multiple threads to perform tasks within a single process. Multithreading is really good for IO-bound tasks (like sending multiple requests to servers concurrently). Threads share the PID (process ID) of the process that created them, but every thread has its own identifier and a start() method. The join() method of a thread can be used if you want to run a line of code only after the thread finishes its job. Because of the GIL (Global Interpreter Lock), only one thread executes Python bytecode at a time in standard CPython builds, so threads give little speed-up for CPU-bound pure-Python code, and the order of the output can vary from run to run.
2. Async IO: In Python, Async IO is a single-threaded, single-process design paradigm that achieves concurrency through cooperative multitasking: an event loop switches between coroutines whenever one of them waits at an await.
Let's look at multithreading with the help of an example.
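To make the Async IO idea a little more concrete, here is a minimal sketch using the standard asyncio module. fake_download is a hypothetical stand-in for a network call, with asyncio.sleep simulating the wait; the names and delays are only for illustration.

```python
import asyncio
import time


async def fake_download(name, delay):
    """Hypothetical IO-bound task; asyncio.sleep stands in for a network wait."""
    await asyncio.sleep(delay)  # hands control back to the event loop while waiting
    return name


async def main():
    start = time.perf_counter()
    # gather() runs the coroutines concurrently and returns the results
    # in the order the coroutines were passed in.
    results = await asyncio.gather(
        fake_download("a", 0.3),
        fake_download("b", 0.2),
        fake_download("c", 0.1),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results)
    print("elapsed :", round(elapsed, 2), "seconds")
```

Everything here runs in one thread. The three waits overlap, so the total time should be close to the longest delay rather than the sum of all three.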
import threading
import time

import numpy as np


def BasicOperation():
    # square of a number
    def square(number):
        return number * number

    # cube of a number
    def cube(number):
        return number ** 3

    # nth power of a number
    def nth_power(number, power):
        return number ** power

    # sum of the first n numbers
    def sum_of_n_numbers(number):
        return number * (number + 1) / 2

    # using the functions to drive a program ...
    print("square of 5 is ", square(5))
    print("cube of 5 is ", cube(5))
    print("5 raise to power 2 is ", nth_power(5, 2))
    print("sum of first 5 numbers is", sum_of_n_numbers(5))


def DotProduct():
    A = np.arange(1000000).reshape(5000, 200)
    B = np.arange(1000000).reshape(200, 5000)
    Dot = np.dot(A, B)


if __name__ == "__main__":
    # without threading ...
    start = time.time()
    BasicOperation()
    Mid = time.time() - start
    print("Mid time taken : ", Mid, "seconds")
    DotProduct()
    end = time.time() - start
    print("Full time taken : ", end, "seconds")

    # with threading ...
    start = time.time()
    Thread_1 = threading.Thread(target=BasicOperation, name=' Basic Operation Thread ')
    Thread_2 = threading.Thread(target=DotProduct, name=' Dot Product Thread ')
    Thread_1.start()
    Thread_2.start()
    # join() blocks until the thread has finished its job.
    Thread_1.join()
    Mid = time.time() - start
    print("Mid time taken : ", Mid, "seconds")
    Thread_2.join()
    end = time.time() - start
    print("Full time taken : ", end, "seconds")
## output //
square of 5 is 25
cube of 5 is 125
5 raise to power 2 is 25
sum of first 5 numbers is 15.0
Mid time taken : 0.0006113052368164062 seconds
Full time taken : square of 5 is 10.373110294342041 25seconds
cube of 5 is Mid time taken : 1250.0015938282012939453
5 raise to power 2 is seconds
25
sum of first 5 numbers is 15.0
Full time taken : 12.598262786865234 seconds
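Threads pay off most when tasks spend their time waiting. Here is a minimal sketch of that IO-bound case, assuming a hypothetical fake_request function that sleeps in place of a real network call. It also records the process ID and thread identifier seen by each task, so you can check that threads live inside one process.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(delay):
    """Hypothetical IO-bound call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    # Report which process and which thread handled this task.
    return os.getpid(), threading.get_ident()


def run_concurrently(delays):
    """Run each fake request on a worker thread and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        results = list(executor.map(fake_request, delays))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_concurrently([0.2, 0.2, 0.2, 0.2])
    print("distinct process IDs :", len({pid for pid, _ in results}))
    print("elapsed :", round(elapsed, 2), "seconds")
```

On a typical run every task reports the same process ID, and the batch takes roughly as long as one request rather than four, because the waits overlap.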
Summary
We use Python's multiprocessing module to achieve parallelism, whereas concurrency in Python is achieved with the help of the threading and asyncio modules. A program that runs in parallel is also concurrent, but a concurrent program is not necessarily parallel.
That's it. Thank you for taking the time to read my post. I hope you liked it.