Tuesday, May 26, 2020

Tricky Python I : Memory Management for Mutable & Immutable Objects

It’s hard not to fall in love with Python: its clean syntax and close resemblance to English make coding in Python feel like composing elegant prose. However, just when you think you are beginning to speak Python, things start to get a little strange:
>>> x = 8
>>> y = x
>>> x = 100
>>> y
8
Wait, what just happened? Shouldn’t y hold the value of 100, since we just assigned y the value of x?
The mystery can be solved by understanding how Python handles memory management for mutable and immutable objects. To do this, we first need to discuss the object-oriented nature of Python.

OOP in Python

“Everything is an object”
What is an object? To understand this, we first need to discuss the concept of a class. Similar to a struct in C, a class is a bundle of data of different types and functions logically grouped together. An object is an instance of a particular class. We can think of a class as a blueprint, a template from which an object is created.
Guido van Rossum, who created Python, was very deliberate in ensuring that all objects in Python were “first class.” In other words, a list, string, integer, or function (“anything that could be named in the language”) is an object that belongs to the corresponding class, and all of them are treated uniformly (the discussion on this is beyond the scope of this post). For example, 3 is an integer object belonging to the integer class (int), "I'm a string!" is a string object belonging to the string class (str), and so on.
As you can see, the names of these classes describe the data type of an object. We can find out what class an object belongs to using the built-in type() function:
>>> L = [1, 2, 3]
>>> type(L)
<class 'list'>
Note that another built-in function, isinstance(), checks whether an object belongs to a given class and returns a boolean value. The difference is that isinstance() also returns True for instances of subclasses, while a comparison against type() matches only the exact class.
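To make the subclass distinction concrete, here is a small sketch using bool, which is a subclass of int in Python. The variable name flag is only for illustration.

```python
flag = True

# type() reports the exact class of the object
print(type(flag))               # <class 'bool'>
print(type(flag) is int)        # False: the exact class is bool, not int

# isinstance() also accepts instances of subclasses
print(isinstance(flag, int))    # True
print(issubclass(bool, int))    # True
```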

Mutable vs. Immutable Objects

“Not all objects are created equal”
There are two kinds of objects in Python: mutable objects and immutable objects. The value of a mutable object can be modified in place after its creation, while the value of an immutable object cannot be changed.
  • Immutable Object: int, float, long (Python 2 only), complex, string, tuple, bool
  • Mutable Object: list, dict, set, bytearray, user-defined classes (by default)
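As a quick illustration of the two categories, the sketch below modifies a list in place and then tries the same on a tuple and a string. The variable names are made up for this example.

```python
numbers = [1, 2, 3]
numbers[0] = 99          # lists are mutable: item assignment works in place
print(numbers)           # [99, 2, 3]

point = (1, 2, 3)
try:
    point[0] = 99        # tuples are immutable: item assignment is rejected
except TypeError as err:
    print("TypeError:", err)

text = "python"
upper = text.upper()     # string methods return a new string instead
print(text, upper)       # python PYTHON
```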

Is it the same object?

We can check the mutability of an object by attempting to modify it and seeing whether it is still the same object afterwards. There are two ways to do this:
  • Using the built-in function id(): this function returns the identity of an object, an integer that is unique among all objects that exist at the same time. In the CPython implementation, id() returns the memory address of the object. No two objects with overlapping lifetimes have the same identity.
  • Using the is and is not operators: these identity operators evaluate whether or not two objects have the same identity, in other words, whether they are the same object.
Let’s see what happens when we apply id() on an immutable object, an integer:
>>> a = 89
>>> id(a)
4434330504
>>> a = 89 + 1
>>> print(a)
90
>>> id(a)
4430689552  # this is different from before!
…and contrast the result with that of a mutable object, a list:
>>> L = [1, 2, 3]
>>> id(L)
4430688016
>>> L += [4]
>>> print(L)
[1, 2, 3, 4]
>>> id(L)
4430688016    # this is the same as before! 
We see that when we attempt to modify an immutable object (an integer in this case), Python simply gives us a different object instead. On the other hand, we are able to make changes to a mutable object (a list) and have it remain the same object throughout.
It’s important to distinguish the identity function id() and identity operator is from the comparison operator ==, which evaluates whether the values are equal. We’ll demonstrate the difference and use it to illustrate the different behaviors of mutable/immutable objects in the following section on Python Memory Management.

Python Memory Management

In C, when we assign a variable, we first declare it, thereby reserving a space in memory, and then store the value in the memory spot allocated. We can create another variable with the same value by repeating the process, ending up with two spots in memory, each with its own value that is equivalent to the other’s.
Python employs a different approach. Instead of storing values in the memory space reserved by the variable, Python has the variable refer to the value. Similar to pointers in C, variables in Python refer to values (or objects) stored somewhere in memory. In fact, all variable names in Python are said to be references to the values, some of which are front-loaded by Python and therefore exist before the name references occur (more on this later). Python keeps an internal counter of how many references each object has. Once the counter drops to zero (meaning that nothing refers to the object any longer), CPython reclaims the object and frees its memory; a separate cyclic garbage collector takes care of objects that refer to each other in a cycle.
One may like to take a look at this article written by Sreejith Kesavan, which provides a nice analogy and clear visuals illustrating the differences in how C and Python approach variable assignments.
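The reference counter described above can be observed with sys.getrefcount(). This is a minimal sketch that assumes CPython; the absolute numbers vary between versions (the call itself may hold a temporary reference), so only the differences are printed.

```python
import sys

data = [1, 2, 3]
before = sys.getrefcount(data)       # baseline count for the list

alias = data                         # a second name now refers to the same list
after_alias = sys.getrefcount(data)

del alias                            # removing the name drops that reference again
after_del = sys.getrefcount(data)

print(after_alias - before)          # 1
print(after_del - before)            # 0
```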

Making References to Values

Each time we build a value, for instance by writing a list literal, a new object is created, even if another object with the same value already exists.
For example:
>>> L1 = [1, 2, 3]
>>> L2 = [1, 2, 3]
>>> L1 == L2
True             # L1 and L2 have the same value
>>> L1 is L2
False            # L1 and L2 do not refer to the same object!
We can, however, have two variables refer to the same object through a process called “aliasing”: assigning one variable the value of the other variable. In other words, one variable now serves as an alias for the other, since both of them now refer to the same object.
Here is an example:
>>> L1 = [1, 2, 3]
>>> L2 = L1         # L2 now refers to the same object as L1
>>> L1 == L2
True
>>> L1 is L2
True
>>> L1.append(4)
>>> print(L2)   
[1, 2, 3, 4]
Since L1 and L2 both refer to the same object, modifying L1 results in the same change in L2.
The example at the beginning of this article should start to make sense:
>>> x = 8
>>> y = x         # y refers to the same object (number 8) as x
>>> x = 100       # x now refers to a different object (number 100); the object 8 is untouched
>>> y             # but y is still referring to the same object
8
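It may help to see that the same thing happens with a mutable object: assignment rebinds a name, whereas a method such as append() changes the object itself. The sketch below contrasts the two; the variable names are chosen only for illustration.

```python
x = [1, 2, 3]
y = x              # y is an alias for the same list
x = [100]          # rebinding: x now names a different list
print(y)           # [1, 2, 3]
print(x is y)      # False

p = [1, 2, 3]
q = p              # q is an alias for the same list
p.append(4)        # mutation: the shared list itself changes
print(q)           # [1, 2, 3, 4]
print(p is q)      # True
```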

Exceptions with Immutable Objects

While it is generally true that each newly written value gets its own object, there are a few notable exceptions:
  1. Some strings
  2. Integers between -5 and 256 (inclusive)
  3. Empty immutable containers (e.g. tuples)
These exceptions arise as a result of memory optimization in the CPython implementation. After all, if two variables refer to immutable objects with the same value, why waste memory creating a new object for the second variable? Why not simply have the second variable refer to the same object in memory?
Let’s look at some examples, shall we?
  1. String Interning
>>> a = "python is cool!"
>>> b = "python is cool!"
>>> a is b
False
This should not be surprising, since it obeys the “new objects are created each time” rule.
>>> a = "python"
>>> b = "python"
>>> a is b
True   # a and b refer to the same object!
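This is a result of string interning, which allows two variables to refer to the same string object. CPython does this automatically for some strings, typically short ones that look like identifiers, although the exact rules are an implementation detail and can differ between versions and between the interactive prompt and a script. One can also explicitly intern strings by calling the sys.intern() function (intern() in Python 2). Guillo’s article provides an in-depth look into string interning.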
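Explicit interning can be sketched as follows. The strings are assembled at run time with join() so that the compiler cannot merge them as identical constants; the result of the first identity check assumes CPython.

```python
import sys

# Build two equal strings at run time
a = "".join(["python is ", "cool!"])
b = "".join(["python is ", "cool!"])
print(a == b)    # True: equal values
print(a is b)    # False on CPython: two distinct objects

# sys.intern() returns the single canonical copy of the string
a = sys.intern(a)
b = sys.intern(b)
print(a is b)    # True: both names now refer to the interned object
```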
  2. Integer Caching
CPython front-loads an array of integers from -5 to 256 (inclusive). Hence, variables referring to an integer within this range point to the same object that already exists in memory:
>>> a = 256
>>> b = 256
>>> a is b
True
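This is not the case if the object referred to is outside the range (at the interactive prompt, that is; within a single script the compiler may reuse one constant for both names):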
>>> a = 257
>>> b = 257
>>> a is b
False
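To check the cache without interference from the compiler, the integers can be created at run time, for example by parsing strings. This is a sketch that assumes CPython; other implementations may behave differently.

```python
small_a = int("256")
small_b = int("256")
print(small_a is small_b)   # True on CPython: 256 comes from the cache

big_a = int("257")
big_b = int("257")
print(big_a == big_b)       # True: equal values
print(big_a is big_b)       # False on CPython: two separate objects
```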
  3. Empty Immutable Objects
Let’s take a look at empty tuples, which are immutable:
>>> a = ()
>>> b = ()
>>> a is b
True  # a and b both refer to the same object in memory
However, for non-empty tuples, new objects are created, even though both objects have the same value:
>>> a = (1, )
>>> b = (1, )
>>> a == b
True
>>> a is b
False
One last thing before we finish…

The Tricky Case with Operators

We have seen in the earlier example that given a list L, we can modify it in place using L += [x], which is equivalent to L.extend([x]) and has the same effect as L.append(x). But how about L = L + [x]?
>>> L = [1, 2, 3]
>>> id(L)
4431388424
>>> L = L + [4]
>>> id(L)
4434330504     # L now refers to a different object from before!
Why does this happen?
The answer lies in the subtle difference behind the operators. The + operator calls the __add__ magic method (magic methods are called automatically by Python rather than invoked explicitly), which does not modify either argument. Hence, the expression L + [4] creates a new object with the value [1, 2, 3, 4], which L on the left-hand side now refers to. On the other hand, the += operator calls __iadd__, which for a list modifies the left-hand operand in place.
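The dispatch described above can be traced with a small user-defined class. Bag is a hypothetical container written only for this illustration; it mimics how a list responds to + and +=.

```python
class Bag:
    """Hypothetical list-like container, used only to trace operator dispatch."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        # "+" builds and returns a new object; neither operand changes
        return Bag(self.items + other.items)

    def __iadd__(self, other):
        # "+=" mutates self and returns it, so the name keeps the same object
        self.items.extend(other.items)
        return self


bag = Bag([1, 2, 3])
alias = bag

bag += Bag([4])              # calls __iadd__
print(bag is alias)          # True
print(alias.items)           # [1, 2, 3, 4]

bag = bag + Bag([5])         # calls __add__, then rebinds bag
print(bag is alias)          # False
print(alias.items)           # [1, 2, 3, 4]
print(bag.items)             # [1, 2, 3, 4, 5]
```

The alias makes the difference visible: it follows the in-place change made by +=, but not the new object produced by +.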
Phew… that was a lot of information. In my next post, we will be discussing some surprising (and not-so-surprising) things about parameter passing in Python as a result of object mutability/immutability. Stay tuned!
