If you have written some Python code and used a for loop, you have already used iterators behind the scenes, probably without knowing it. Iterators are objects that hand out their items one at a time. They are practically everywhere in a Python codebase. Understanding how iterators work helps us write better, more memory-efficient code. In this post, we will discuss iterators and other related concepts.
How does iteration work?
Before we can dive into iterators, we first need to understand how iteration works in Python. When we write a for loop, how does Python fetch one item at a time? How does this process work?
Two functions come into play: iter and next. The iter function gets an iterator from an object. It actually calls the __iter__ special method on the object to get the iterator, so an object that wants to allow iteration has to implement the __iter__ method. Once Python has the iterator object, it keeps calling next on it. The next function in turn calls the __next__ method on the iterator object. Let’s see a quick example:
    >>> l = [1, 2, 3]
    >>> i = iter(l)
    >>> type(l)
    <class 'list'>
    >>> type(i)
    <class 'list_iterator'>
    >>> next(i)
    1
    >>> next(i)
    2
    >>> next(i)
    3
    >>> next(i)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>>
Let’s see. We first create a list named l with 3 elements. We then call iter() on it. The type of l is list, but look at the type of i: it’s list_iterator. Interesting! Now we keep calling next on i and it keeps giving us the values we saw in the list, one by one, until it raises a StopIteration exception.
Here the list is an iterable because we can get an iterator from it to iterate over the list. The list_iterator object we got is an iterator: it is the object that actually produces the items, one per call to next. When we loop over a list, this is what happens:
    l = [1, 2, 3]

    # Ask the iterable for an iterator
    iterator = iter(l)

    # Keep asking for the next item until the iterator is exhausted
    while True:
        try:
            item = next(iterator)
            print(item)
        except StopIteration:
            break
Make sense? The for loop gets the iterator and keeps calling next on it until a StopIteration exception is raised.
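To watch these calls happen, here is a minimal sketch that logs every special method the for loop triggers. The TracedList and TracedIterator classes are made up for this illustration; they are not part of Python or of the examples in this post.

```python
class TracedList:
    """Hypothetical iterable that records which special methods get called."""

    def __init__(self, items):
        self.items = items
        self.calls = []  # log of special-method calls, in order

    def __iter__(self):
        self.calls.append("__iter__")
        return TracedIterator(self)


class TracedIterator:
    """Hypothetical iterator that walks over the owner's items by index."""

    def __init__(self, owner):
        self.owner = owner
        self.index = 0

    def __next__(self):
        self.owner.calls.append("__next__")
        if self.index >= len(self.owner.items):
            raise StopIteration
        item = self.owner.items[self.index]
        self.index += 1
        return item


traced = TracedList(["a", "b"])
seen = []
for item in traced:
    seen.append(item)

print(seen)          # ['a', 'b']
print(traced.calls)  # ['__iter__', '__next__', '__next__', '__next__']
```

Note that __iter__ is called once, while __next__ is called three times for two items: the last call is the one that raises StopIteration and ends the loop.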
Iterator
An iterator is an object that implements the __next__ method, so we can call next on it repeatedly to get the items. Let’s write an iterator that keeps giving us the next integer, without ever stopping. Let’s name it InfiniteIterator.
    class InfiniteIterator:
        def __init__(self):
            self.__int = 0

        def __next__(self):
            self.__int += 1
            return self.__int
If we keep calling next on it, we will keep getting the integers, starting from one.
    >>> inf_iter = InfiniteIterator()
    >>> next(inf_iter)
    1
    >>> next(inf_iter)
    2
    >>> next(inf_iter)
    3
    >>> next(inf_iter)
    4
    >>>
Iterable
What if we wanted to create an InfiniteNumbers iterable? It would be such that when we use a for loop on it, it never stops: it produces the next integer on each pass through the loop. What would we do? Well, we already have an InfiniteIterator. All we need is to define an __iter__ method that returns a new instance of InfiniteIterator.
    class InfiniteNumbers:
        def __iter__(self):
            return InfiniteIterator()


    infinite_numbers = InfiniteNumbers()

    for x in infinite_numbers:
        print(x)

        if x > 99:
            break
If you remove the if block with its break statement, you will notice that the loop keeps running forever.
Using StopIteration
Instead of breaking out of the loop ourselves, we could raise the StopIteration exception in our iterator so that it stops after giving us 100 numbers.
    class HundredIterator:
        def __init__(self):
            self.__int = 0

        def __next__(self):
            # Signal the end of iteration once 100 numbers have been produced
            if self.__int > 99:
                raise StopIteration

            self.__int += 1
            return self.__int


    # Renamed from InfiniteNumbers: this iterable is no longer infinite
    class HundredNumbers:
        def __iter__(self):
            return HundredIterator()


    one_hundred = HundredNumbers()

    for x in one_hundred:
        print(x)
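The limit of 100 is hard-coded in the iterator above. As a variation, here is a minimal sketch in which the limit is passed in when the iterable is created. The names CountingIterator, Numbers and limit are made up for this illustration.

```python
class CountingIterator:
    """Hypothetical iterator that yields 1, 2, ..., limit and then stops."""

    def __init__(self, limit):
        self._current = 0
        self._limit = limit

    def __next__(self):
        if self._current >= self._limit:
            raise StopIteration
        self._current += 1
        return self._current


class Numbers:
    """Hypothetical iterable that hands out a new CountingIterator each time."""

    def __init__(self, limit):
        self._limit = limit

    def __iter__(self):
        return CountingIterator(self._limit)


values = [x for x in Numbers(5)]
print(values)  # [1, 2, 3, 4, 5]
```

The loop body never has to know where the sequence ends; the iterator decides that by raising StopIteration.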
Iterators must also implement __iter__
We saw that the __next__ method does its work just fine. But we also need to implement the __iter__ method on an iterator (just like we did on the iterable). Why is this required? Let me quote from the official docs:
“Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted.”
If we tried to use the for loop over our iterator, it would fail:
    class HundredIterator:
        def __init__(self):
            self.__int = 0

        def __next__(self):
            if self.__int > 99:
                raise StopIteration

            self.__int += 1
            return self.__int


    one_hundred = HundredIterator()

    for x in one_hundred:
        print(x)
We will get the following exception:
    Traceback (most recent call last):
      File "iter.py", line 15, in <module>
        for x in one_hundred:
    TypeError: 'HundredIterator' object is not iterable
That makes sense, because we saw that the for loop calls the iter function on an object to get an iterator from it, and then calls next on that iterator. That’s the problem: our iterator doesn’t have an __iter__ method, so the call to iter fails. The official documentation says that every iterator should be a proper iterable too. That is, it should implement the __iter__ method and simply return itself (self). Let’s do that:
    class HundredIterator:
        def __init__(self):
            self.__int = 0

        def __iter__(self):
            # An iterator is its own iterator
            return self

        def __next__(self):
            if self.__int > 99:
                raise StopIteration

            self.__int += 1
            return self.__int


    one_hundred = HundredIterator()

    for x in one_hundred:
        print(x)
Now the code works fine 🙂
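An iterator that has an __iter__ method should also work with other built-ins that accept iterables, not just with the for loop. Here is a small sketch using a hypothetical Countdown iterator (a made-up class, not one of the examples above):

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self._current = start

    def __iter__(self):
        # Return the iterator itself, as the protocol requires
        return self

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


print(list(Countdown(3)))  # [3, 2, 1]
print(sum(Countdown(4)))   # 10

first, second = Countdown(2)  # tuple unpacking also calls iter() and next()
print(first, second)       # 2 1
```

Each call creates a new Countdown because, as with any iterator, a single instance can only be consumed once.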
The Iterator Protocol
The iterator protocol defines the special methods that an object must implement to allow iteration. We can summarize the protocol in this way:
- Any object that can be iterated over needs to implement the __iter__ method, which should return an iterator object. Any object whose __iter__ method returns an iterator is an iterable.
- An iterator must implement the __next__ method, which returns the next item when called. When all items are exhausted (that is, already retrieved), it must raise the StopIteration exception.
- An iterator must also implement the __iter__ method to behave like an iterable.
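One way to check these rules on real objects is with the abstract base classes in the standard library's collections.abc module. The sketch below is only an illustration: the OnlyNext class is made up, and the isinstance checks only look at which methods exist, not at whether they behave correctly.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
numbers_iter = iter(numbers)

# A list has __iter__ but no __next__: iterable, not an iterator
print(isinstance(numbers, Iterable))       # True
print(isinstance(numbers, Iterator))       # False

# A list_iterator has both methods: iterable and an iterator
print(isinstance(numbers_iter, Iterable))  # True
print(isinstance(numbers_iter, Iterator))  # True


class OnlyNext:
    """Hypothetical class with __next__ but no __iter__."""

    def __next__(self):
        raise StopIteration


# Without __iter__, the object does not satisfy the full iterator protocol
print(isinstance(OnlyNext(), Iterator))    # False
```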
Why do we need Iterables?
In our last example, we saw that it’s possible for an object to implement a __next__ method and an __iter__ method that returns self. In this way, an iterator behaves just like an iterable. Then why do we need iterables? Why can’t we just keep using iterators that return themselves?
Let’s get back to our HundredIterator example. Once you have iterated over the items once, try to iterate again. What happens? No numbers are printed. Why? Because iterator objects store state. Once an iterator has raised StopIteration, it has reached the end of the line and is exhausted. Every time you call iter on it, it returns the same instance (self), which has nothing more to output.
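The same behaviour is easy to see with a built-in iterator. This short sketch uses a list_iterator in place of HundredIterator, to keep the output small:

```python
numbers_iter = iter([1, 2, 3])

first_pass = list(numbers_iter)   # consumes every item
second_pass = list(numbers_iter)  # nothing left to consume

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []

# Calling iter() on an iterator returns the very same object
print(iter(numbers_iter) is numbers_iter)  # True
```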
This is why iterables are useful. An iterable can return a fresh iterator instance every time it is looped over. This is actually what many built-in types like list do.
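Nested loops are a place where this difference shows up clearly. In the sketch below (the variable names are only for illustration), looping over a list twice works because each loop gets its own iterator, while sharing a single iterator between both loops loses items:

```python
numbers = [1, 2]

# Each "for" asks the list for a fresh iterator, so the loops are independent
pairs = [(a, b) for a in numbers for b in numbers]
print(pairs)   # [(1, 1), (1, 2), (2, 1), (2, 2)]

# Both loops now pull from the same iterator and share its state
shared = iter([1, 2])
broken = [(a, b) for a in shared for b in shared]
print(broken)  # [(1, 2)]
```

In the second case the outer loop takes 1, the inner loop takes 2 and exhausts the iterator, and the outer loop then finds nothing left.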
Why are Iterators so important?
Iterators allow us to consume data one item at a time. Just imagine: if there’s a 1 GB file and we tried to load it all into memory, it would need a large amount of memory. But what if we used an iterator that reads the file one line at a time? We would then only keep that one line in memory and do the necessary processing before moving on to the next one. This allows us to write really memory-efficient programs 🙂
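Python's file objects already work this way: looping over an open file yields one line at a time. Here is a minimal sketch, assuming a small temporary text file stands in for the large one (the file contents and variable names are made up for the example):

```python
import os
import tempfile

# Create a small stand-in file; a real use case would open an existing large file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\nthird line\n")
    path = tmp.name

line_count = 0
longest = ""

with open(path) as f:
    # Only the current line is held in memory on each pass through the loop
    for line in f:
        line_count += 1
        stripped = line.rstrip("\n")
        if len(stripped) > len(longest):
            longest = stripped

os.remove(path)  # clean up the temporary file

print(line_count)  # 3
print(longest)     # second line
```

The loop never builds a list of all lines, so memory use stays about the same no matter how many lines the file has.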
This all seems very confusing
If you find these concepts confusing and hard to grasp, don’t worry. Give it a few tries, write the code by hand and see the output. Tinker with the examples. Inspect the code and see what happens when you modify part of it. Everything becomes easier with practice. Try writing your own iterables and iterators. Perhaps try to clone the functionality of the built-in containers? Maybe write your own list implementation? Don’t worry, it will come to you in time.