Python asyncio: Future, Task and the Event Loop

Event Loop

On any platform, when we want to do something asynchronously, it usually involves an event loop. An event loop is a loop that can register tasks to be executed, execute them, delay or even cancel them, and handle different events related to these operations. Generally, we schedule multiple async functions on the event loop. The loop runs one function; while that function waits for IO, the loop pauses it and runs another. When the first function’s IO completes, it is resumed. Thus two or more functions can co-operatively run together. This is the main goal of an event loop.

The event loop can also pass resource-intensive functions to a thread pool for processing. The internals of the event loop are quite complex and we don’t need to worry much about them right away. We just need to remember that the event loop is the mechanism through which we can schedule our async functions and get them executed.

Futures / Tasks

If you are into JavaScript too, you probably know about Promise. In Python we have similar concepts – the Future and the Task. A Future is an object that is supposed to have a result in the future. A Task is a subclass of Future that wraps a coroutine; when the coroutine finishes, the result of the Task is realized.


We discussed coroutines in our last blog post. A coroutine is a way of pausing a function and returning a series of values periodically. A coroutine can pause the execution of the function by using the yield, yield from or await (Python 3.5+) keywords in an expression. The function stays paused until the yield expression actually gets a value.

Fitting Event Loop and Future/Task Together

It’s simple. We need an event loop and we need to register our future/task objects with the event loop. The loop will schedule and run them. We can add callbacks to our future/task objects so that we are notified when a future has its result.

Very often we choose to use coroutines for our work. We wrap a coroutine in a Future and get a Task object. When a coroutine yields, it is paused. When it has a value, it is resumed. When it returns, the Task has completed and gets a value. Any associated callback is run. If the coroutine raises an exception instead, the Task fails and is not resolved.

So let’s move ahead and see some example code.
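
The original snippet isn’t preserved here, but it was along the lines of the classic slow_operation example from the asyncio docs. A minimal sketch in the Python 3.4 style this post uses (where @asyncio.coroutine and yield from are current):

    import asyncio

    @asyncio.coroutine
    def slow_operation():
        # pretend to wait on some slow IO
        yield from asyncio.sleep(1)
        return "Future is done!"

    def got_result(future):
        # runs once the task is realized
        print(future.result())

    loop = asyncio.get_event_loop()            # the default event loop
    task = loop.create_task(slow_operation())  # wrap the coroutine in a Task
    task.add_done_callback(got_result)         # get notified on completion
    loop.run_until_complete(task)              # run until the task has a value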

As you can see already:

  • asyncio.get_event_loop() gets us the default event loop (while @asyncio.coroutine marks slow_operation() as a coroutine)
  • loop.create_task(slow_operation()) creates a task from the coroutine returned by slow_operation()
  • task.add_done_callback(got_result) adds a callback to our task
  • loop.run_until_complete(task) runs the event loop until the task is realized. As soon as it has a value, the loop terminates

The run_until_complete function is a nice way to manage the loop. Of course we could do this:
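
A sketch of that variant, reusing slow_operation from above:

    import asyncio

    @asyncio.coroutine
    def slow_operation():
        yield from asyncio.sleep(1)
        return "Future is done!"

    loop = asyncio.get_event_loop()

    def got_result(future):
        print(future.result())
        loop.stop()  # shut the loop down from the callback

    task = loop.create_task(slow_operation())
    task.add_done_callback(got_result)

    try:
        loop.run_forever()  # keeps running until got_result() stops it
    finally:
        loop.close()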

Here we make the loop run forever and, from our callback, we explicitly shut it down when the future has resolved.

Python: Generators, Coroutines, Native Coroutines and async/await

NOTE: This post discusses features which were mostly introduced in Python 3.4 and later. Native coroutines and the async/await syntax came in Python 3.5. So I recommend using Python 3.5 to try the code.


Generators are functions that generate values. A normal function returns a value and then the underlying scope is destroyed; when we call it again, it starts from scratch – it’s one-time execution. A generator, however, can yield a value and pause the execution of the function. Control returns to the calling scope, and we can resume execution whenever we want and get another value (if any). Let’s look at this example:
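
The original example isn’t preserved; here’s a minimal generator in the same spirit:

    def simple_gen():
        yield "Hello"
        yield "World"

    gen = simple_gen()
    print(next(gen))  # Hello
    print(next(gen))  # World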

Please notice, calling a generator function doesn’t directly return any values; it returns a generator object, which is like an iterable. So we can call next() on the generator object to iterate over the values, or run a for loop over it.

So how are generators useful? Let’s say your boss has asked you to write a function to generate a sequence of numbers up to 100 (a super secret, simplified version of range()). You wrote it: you took an empty list, kept appending the numbers to it and then returned it. But then the requirement changes and it needs to generate up to 10 million numbers. If you store these numbers in a list, you will run out of memory. So generators come to our aid. You can generate these numbers without storing them in a list, just like this:
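
A sketch of such a generator – the original code isn’t preserved, and up_to is a made-up name:

    def up_to(n):
        # generate numbers from 1 to n without building a list in memory
        num = 1
        while num <= n:
            yield num
            num += 1

    for number in up_to(10000000):
        if number > 10:
            break
        print(number)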

We didn’t dare run past the number 10. But if you try it in a console, you will see how it keeps generating numbers one after another. And it does so by pausing and resuming execution – moving back and forth into the function context.

Summary: A generator is a function that can pause execution and generate multiple values instead of returning just one value. A generator function returns a generator object, which acts like an iterable that we can use to get the values.


In the last section we saw that using generators we can pull data out of a function context (and pause execution). What if we wanted to push some data in too? That’s where coroutines come into play. The yield keyword we use to pull values can also be used as an expression (on the right side of “=”) inside the function, and we can use the send() method on a generator to pass values back in. These are called “generator based coroutines”. Here’s an example:
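
A minimal sketch matching the walkthrough below (the original snippet isn’t preserved):

    def coro():
        hello = yield "Hello"  # yield used as an expression
        yield hello

    c = coro()
    print(next(c))          # prints "Hello"
    print(c.send("World"))  # prints "World"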

OK, so what’s happening here? We first take a value as usual, using the next() function. Execution reaches yield "Hello" and we get “Hello”. Then we send in a value using the send() method. This resumes the function, assigns the value we sent to hello, and moves on to the next line, executing the statement. So we get “World” as the return value of the send() method.

When we’re using generator based coroutines, by “generator” and “coroutine” we usually mean the same thing. They are not exactly the same thing, but the terms are very often used interchangeably in this context. However, with Python 3.5 we have the async/await keywords along with native coroutines. We will get to that below.

Async I/O and the asyncio module

From Python 3.4, we have the new asyncio module which provides nice APIs for general async programming. We can use coroutines with the asyncio module to easily do async IO. Here’s an example from the official docs:
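
That example is (approximately) the display_date snippet from the asyncio docs of the Python 3.4 era:

    import asyncio
    import datetime
    import random

    @asyncio.coroutine
    def display_date(num, loop):
        end_time = loop.time() + 50.0
        while True:
            print("Loop: {} Time: {}".format(num, datetime.datetime.now()))
            if (loop.time() + 1.0) >= end_time:
                break
            # wait for a random number of seconds
            yield from asyncio.sleep(random.randint(0, 5))

    loop = asyncio.get_event_loop()

    asyncio.ensure_future(display_date(1, loop))
    asyncio.ensure_future(display_date(2, loop))

    loop.run_forever()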

The code is pretty self-explanatory. We create a coroutine display_date(num, loop) which takes an identifier (number) and an event loop and keeps printing the current time. It uses the yield from keyword to await the result of the asyncio.sleep() function call. asyncio.sleep() is a coroutine which completes after the given number of seconds, so we pass a random number of seconds to it. Then we use asyncio.ensure_future() to schedule the execution of the coroutine on the default event loop. Finally we ask the loop to keep running.

If we look at the output, we shall see that the two coroutines run concurrently. When we use yield from, the event loop knows that the coroutine is going to be waiting for a while, so it pauses that coroutine and runs another. Thus the two coroutines run concurrently (but not in parallel, since the event loop is single threaded).

Just so you know, yield from is nice syntactic sugar for for x in asyncio.sleep(random.randint(0, 5)): yield x, making async code cleaner.

Native Coroutines and async/await

Remember how we’re still using generator based coroutines? In Python 3.5 we got native coroutines, which use the async/await syntax. The previous function can be written this way:
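
The rewritten snippet isn’t preserved either; a native coroutine version of display_date would look like this:

    import asyncio
    import datetime
    import random

    async def display_date(num, loop):
        end_time = loop.time() + 50.0
        while True:
            print("Loop: {} Time: {}".format(num, datetime.datetime.now()))
            if (loop.time() + 1.0) >= end_time:
                break
            await asyncio.sleep(random.randint(0, 5))  # await instead of yield from

    loop = asyncio.get_event_loop()

    asyncio.ensure_future(display_date(1, loop))
    asyncio.ensure_future(display_date(2, loop))

    loop.run_forever()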

Take a look at the async def and await lines. We must define a coroutine by putting the async keyword before the def keyword. Inside a native coroutine, we cannot use yield or yield from – we await things instead.

Native vs Generator Based Coroutines: Interoperability

Now that we have two types of coroutines, there’s an easy way to interoperate between them. We just need to add the @types.coroutine decorator to the old generator based ones. Then we can await them from native coroutines, and we can also yield from native coroutines inside the decorated ones. Here’s an example:
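
A minimal sketch of that interoperability (the function names are made up):

    import asyncio
    import types

    @types.coroutine
    def generator_based():
        # an old style, generator based coroutine
        yield from asyncio.sleep(1)
        return "generator based"

    async def native():
        # a native coroutine can await the decorated generator based one
        result = await generator_based()
        return result + " -> native"

    @types.coroutine
    def outer():
        # and a decorated generator based coroutine can
        # yield from a native coroutine
        result = yield from native()
        print(result)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(outer())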

Using ES7 async/await today with Babel

Let’s take a code snippet that demonstrates async/await – our objective is to transpile this piece of code to ES5 (current day JavaScript) so we can run it with today’s version of NodeJS.
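
The snippet itself isn’t preserved; here’s a small github.es6 in the same spirit – an async function fetching a GitHub user. The request-promise dependency and the getUser name are my assumptions:

    // github.es6 -- a hypothetical reconstruction of the demo snippet
    import request from "request-promise";

    async function getUser(username) {
      const options = {
        url: "https://api.github.com/users/" + username,
        headers: { "User-Agent": "async-await-demo" },
        json: true
      };
      const user = await request(options);
      return user;
    }

    getUser("torvalds")
      .then(user => console.log(user.name))
      .catch(err => console.error(err));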

The command originally shown on top of the snippet no longer works because Babel has changed. I am going to describe how we can do it with the latest version of Babel as of this writing (6.1.4 (babel-core 6.1.4)).

Install Babel and Plugins

The new Babel depends on individual plugins to parse and transform code. To transform async functions, we shall use the transform-regenerator plugin. We also need the syntax plugin (syntax-async-functions) so Babel recognizes the async/await syntax – otherwise it won’t parse. Apart from that, we also install the es2015 preset, which includes a sane set of plugins for transforming ES6 to ES5. We will keep that so we can use other ES6 goodies.

First we install babel-cli globally:
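
For Babel 6 that is:

    $ npm install -g babel-cli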

Here’s our package.json file so you can just do npm install:
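
Something like this – exact version numbers may differ, and request-promise is only needed for the demo snippet above:

    {
      "name": "async-await-demo",
      "version": "1.0.0",
      "dependencies": {
        "babel-polyfill": "^6.1.4",
        "request-promise": "^1.0.2"
      },
      "devDependencies": {
        "babel-plugin-syntax-async-functions": "^6.1.4",
        "babel-plugin-transform-regenerator": "^6.1.4",
        "babel-preset-es2015": "^6.1.2"
      }
    }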

Configuring Babel

Here’s the .babelrc file I put in the same directory:
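
Based on the preset and plugins we just installed, it looks like this:

    {
      "presets": ["es2015"],
      "plugins": ["syntax-async-functions", "transform-regenerator"]
    }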

The file tells babel how to transform your code.

Transpile & Run

Once we have installed the dependencies, we can start transpiling the code to ES5:
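
With the Babel 6 CLI, that’s something like:

    $ babel github.es6 --out-file github.js
    $ node github.js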

You might run into a problem like this:
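
Typically it’s the well-known missing regenerator runtime error:

    ReferenceError: regeneratorRuntime is not defined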

That’s because we need to include the regenerator runtime. The runtime is packed in the babel-polyfill package we have already installed; we just need to include it in our source code. So the final github.es6 file would look like this:
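
That is, the same snippet as before with babel-polyfill imported at the top (the rest is unchanged from the hypothetical reconstruction above):

    // github.es6
    import "babel-polyfill"; // pulls in the regenerator runtime
    import request from "request-promise";

    async function getUser(username) {
      const options = {
        url: "https://api.github.com/users/" + username,
        headers: { "User-Agent": "async-await-demo" },
        json: true
      };
      const user = await request(options);
      return user;
    }

    getUser("torvalds")
      .then(user => console.log(user.name))
      .catch(err => console.error(err));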

Now if we transpile and run again, it should work fine.

Correlation, Causation and a Facebook Story

In Statistics 101 or Business Research 101 courses, one question is very common: “Is correlation the same as causation? Explain with examples.” Correlation tells us whether there is a “co-relationship” between two variables.

Suppose num_friends is the number of Facebook friends a person has, and time_spent is how much time that person spends on Facebook each day. One day I survey the students of the Business Administration discipline, collect this data and compute the correlation. It turns out the two variables are positively correlated. That is, people with more Facebook friends spend more time on Facebook.

The problem is, there can be three kinds of explanations behind this:

(1) People with more friends spend more time on Facebook reading their friends’ posts, photos and comments (“num_friends” causes “time_spent”).

(2) People who spend more time on Facebook share their opinions in all sorts of groups – the photography group, the foodies group, their class group, the discipline group – and as a result end up befriending many like-minded people (“time_spent” causes “num_friends”).

(3) These people spend their extra time on Facebook chatting with their boyfriend or girlfriend, and add as friends all the seniors and juniors of the discipline (neither causes the other).

Since we can’t be sure which of these three is actually happening, we have to reject the idea that correlation necessarily means causation. We explain this nicely in the exam paper, collect our marks, and rarely think about it beyond that.

Facebook’s salaried data scientists, on the other hand, don’t get off so easily. Their goal is to keep our students on Facebook longer, so this question is a real headache for them. They want to know what is actually behind this correlation in the BA discipline. So they take a random subset of the discipline’s students, run an experiment on them – customizing those students’ news feeds – and carry out two kinds of tests:

(1) For some people, they reduce how many posts are shown from the discipline’s network; for others, they start showing more posts.

(2) For some people, they keep suggesting more and more friend requests from the discipline’s seniors and juniors; for others, they cut such suggestions down.

Since this is an imaginary scenario, I don’t know the outcome of these tests. But the first and second tests can give us some “confidence”. If the first test shows that people spend more time when shown more posts and less time when posts are cut down, we get some indication that num_friends really does cause time_spent; the second test, likewise, tells us about the opposite direction. And if both tests come back negative, the Facebook research team’s work has just grown: now they have to analyze in fine detail what third kind of explanation might be at play.

In short, correlation does not necessarily imply causation, but by running a few tests on a random subset of the total population, we can get an idea of whether causation exists alongside the correlation.

P.S. Although my example is entirely made up, Facebook has experimented with its news feed, and there has been plenty of discussion and criticism of the ethics of that.


Django REST Framework: Custom Exception Handler

While using DRF, do you need to handle exceptions yourself? Here’s how to do it. First, we need to create an exception handler function like this:
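
The original handler isn’t preserved; here’s a sketch that does what the next paragraph describes (the message key is my choice of name):

    from rest_framework.views import exception_handler


    def custom_exception_handler(exc, context):
        # let DRF's default handler build the standard error response first
        response = exception_handler(exc, context)

        if response is not None:
            # join the error details into one flat string
            errors = []
            for field, value in response.data.items():
                if isinstance(value, (list, tuple)):
                    value = " ".join(str(item) for item in value)
                errors.append("{}: {}".format(field, value))
            response.data["message"] = ", ".join(errors)
            # also add the exception itself to the response
            response.data["exception"] = str(exc)

        return response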

In the exception handler, we get exc, which is the exception raised, and context, which contains the context of the request. However, we didn’t do anything fancy with these. We just called the default exception handler and joined the errors into a flat string. We also added the exception to our response.

Now, we need to tell DRF to call our custom exception handler when it comes across an exception. That is easy:

Open up your settings.py and look for the REST_FRAMEWORK dictionary. Here you need to add the path to the handler under the EXCEPTION_HANDLER key. So it should look something like this:
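
Assuming the handler above lives in a (hypothetical) myproject/utils.py:

    # settings.py
    REST_FRAMEWORK = {
        # ... other DRF settings ...
        "EXCEPTION_HANDLER": "myproject.utils.custom_exception_handler",
    }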

Django: Handling broken migrations

Often, for one reason or another, migrations don’t apply properly and we have to fix them manually. If a faulty migration throws errors when run, we first have to identify what went wrong. Very often on MySQL we see half-applied migrations (probably because MySQL doesn’t support transactional DDL, so Django can’t roll back schema changes). In that case we need to do the other half ourselves.

We can easily connect to the database prompt by using the dbshell management command:
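
For a typical Django project:

    $ python manage.py dbshell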

The command opens the shell for the respective database engine using the configuration provided in the settings file. That is, you don’t have to remember the username or password for the database connection.

Now you have to fix the issues, and then you can try running the migration again. In case it fails again, you should alter the database to match the state the migration would have created. For example, if the migration was supposed to alter a column from store_type to store_type_id (from a char field to a foreign key), you have to run the query manually, something like:
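
The table and constraint names below are made up – adjust them to your schema:

    -- drop the old char column and add the foreign key column
    ALTER TABLE myapp_store DROP COLUMN store_type;
    ALTER TABLE myapp_store ADD COLUMN store_type_id INT;
    ALTER TABLE myapp_store
        ADD CONSTRAINT myapp_store_type_fk
        FOREIGN KEY (store_type_id) REFERENCES myapp_storetype (id);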

Then you have to fake the migration. When a migration is run, Django stores the name of the migration in a table. This helps track which migrations have already run and which need to be run. When we fake a migration, Django stores the faked migration name in that table without actually running it. If we don’t do this, the next time we run the migrate command, Django will try to run this migration again and fail.

This is how we fake it:
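
Assuming the broken migration belongs to an app named myapp:

    $ python manage.py migrate --fake myapp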

You can also specify one particular migration when you have multiple migrations pending.
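
For example, with a hypothetical migration name:

    $ python manage.py migrate --fake myapp 0002_store_type_to_fk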

Python: Writing custom log handler and formatter

Log handlers dictate how the log entries are handled. For example, FileHandler lets us pipe our logs to a file, and HTTPHandler makes it possible to send the logs over HTTP to a remote server. We can write our own log handlers if we need to customize the way our logs are processed. Writing a custom handler is pretty simple: we have to subclass logging.Handler and must define the emit method, which is called with each log record so we can process it.

Here’s an example of a custom log handler which POSTs the logs to a remote server using the popular requests library:
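
A minimal sketch – the endpoint URL is a placeholder:

    import logging

    import requests

    # placeholder endpoint; point this at your real log collector
    LOG_ENDPOINT = "http://example.com/logs"


    class RequestsHandler(logging.Handler):
        def emit(self, record):
            # render the record using the attached formatter
            log_entry = self.format(record)
            try:
                requests.post(LOG_ENDPOINT, data=log_entry,
                              headers={"Content-Type": "application/json"},
                              timeout=2)
            except requests.RequestException:
                # never let logging crash the application
                self.handleError(record)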

Now we have a custom handler that will POST the logs to another server. What if we want to send the message in a specific format? We write a custom formatter. A formatter has a format method which gets the record; we can take the record and return a message formatted according to our needs. Here’s an example for the logstash format:
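
A sketch that emits a logstash friendly JSON document – the exact field set is up to you:

    import json
    import logging
    from datetime import datetime


    class LogstashFormatter(logging.Formatter):
        def format(self, record):
            data = {
                "@timestamp": datetime.utcnow().isoformat() + "Z",
                "@version": 1,
                "message": record.getMessage(),
                "level": record.levelname,
                "logger": record.name,
                "path": record.pathname,
                "line": record.lineno,
            }
            return json.dumps(data)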

So we have got ourselves a custom log handler and a formatter. But how do we use them? Here’s a short code snippet:
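
Wiring the two together, using the RequestsHandler and LogstashFormatter sketched above:

    import logging

    logger = logging.getLogger("my_app")
    logger.setLevel(logging.INFO)

    handler = RequestsHandler()
    handler.setFormatter(LogstashFormatter())
    logger.addHandler(handler)

    logger.info("Hello from the custom handler!")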

Django REST Framework: Displaying full URL for ImageField or FileField

If you have an ImageField or FileField in your model, you can easily display the full URL (including the hostname/domain) for the file/image. Django REST Framework’s model serializers will do it for you. However, to get the hostname/domain name, the serializer needs a “request context” so it can infer the necessary parts and build a full URL.

So if you’re manually invoking a serializer, please pass a request context like this:
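
ProfileSerializer and profile here are placeholder names – the part that matters is the context argument:

    # inside a view, where `request` is in scope
    serializer = ProfileSerializer(profile, context={"request": request})
    data = serializer.data  # file/image fields now render absolute URLs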

If you use ModelViewSet, DRF will automatically pass the request context while initializing the serializer. So in that case you don’t need to do anything. You need to pass the context only when you’re manually creating a serializer instance.

Django REST Framework: Dynamic Fields in Serializers

Here’s the use case: I need to add some extra fields to my serializer depending on the request. For example, if the query string contains “extra=true”, then add the extra fields. Luckily, serializers get a “context” argument when they are initialized. We can use this to customize our serializers as needed.

The fields defined on a serializer are available in an instance variable named “fields”. So we can add/delete/edit fields in it. Let’s see an example:
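
A sketch – the serializers and field names are made up for illustration:

    from rest_framework import serializers


    class UserLocationSerializer(serializers.Serializer):
        city = serializers.CharField()
        country = serializers.CharField()


    class UserSerializer(serializers.Serializer):
        id = serializers.IntegerField()
        username = serializers.CharField()

        def __init__(self, *args, **kwargs):
            super(UserSerializer, self).__init__(*args, **kwargs)
            request = self.context.get("request")
            if request and request.query_params.get("extra") == "true":
                # add the extra field only when ?extra=true is passed;
                # we initialize the nested serializer ourselves, so we
                # pass our own context down explicitly
                self.fields["location"] = UserLocationSerializer(
                    read_only=True, context=self.context
                )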

Please note, here we have used another serializer (UserLocationSerializer) from inside our main serializer. The second one is being initialized by us, so it would not get the context automatically. If we need the context down there as well, we have to pass it ourselves.

Now the second serializer will get the request too and we can use the same way to customize it!

Docker Workflow of Python Workers

If you’re planning to use IronWorker with Python, their new Docker workflow is pretty cool. In this blog post, I shall walk you through the setup.

Setup Docker

The first step would be to install Docker. Here’s a pretty nice installation guide from Digital Ocean. :)

If you’re not using Ubuntu (e.g. Windows or OS X), you might consider running Ubuntu in a VirtualBox VM. I usually use Vagrant to do that.

If you’re using another Linux distro, please go through the official docker docs, I’m sure it’s quite easy :)

Once the setup is complete, run the following command to confirm that Docker has installed successfully and is currently running:
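
The exact command isn’t preserved; docker version (or docker info) will do it:

    $ sudo docker version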

You should see the Docker client and server version details, confirming the daemon is up and running.

Installing PIP Dependencies

Now it’s time to install the dependencies for our project. Change directory to the project root and use the requirements.txt file to install the packages. Please note we shall install them into a separate directory instead of the usual system path:
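
Reconstructed along the lines the next paragraph describes – we mount the current directory into the container and run pip there:

    $ docker run --rm -v "$PWD":/worker -w /worker iron/images:python-2.7 \
        pip install -t packages -r requirements.txt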

Here we have asked Docker to run a new container based on the “iron/images:python-2.7” image. Inside the container we run the pip command, and the “-t” flag installs the packages into a separate target directory.

Since this is the first time we’re using this image, docker will need to pull it first. Then it will continue to install the dependencies. Once it’s done, feel free to examine the packages directory.

Locally Running Workers

The command is similar, except that the packages are now installed in the “packages” directory, so we need to set the PYTHONPATH environment variable:
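
Assuming the worker’s entry point is a (hypothetical) my_worker.py:

    $ docker run --rm -v "$PWD":/worker -w /worker -e "PYTHONPATH=packages" \
        iron/images:python-2.7 python my_worker.py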

That would work :)

Bundling workers

To bundle the workers, we need to first zip everything:
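
For example (my_worker.zip is a name of your choosing):

    $ zip -r my_worker.zip .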

You must remember to zip everything in, including the packages directory.

Now, we can start using the iron cli. In case you don’t have it installed, it’s easy:
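
Iron.io shipped a curl based installer at the time – the URL here is from memory, so double check it against their docs:

    $ curl -sSL https://cli.iron.io/install | sh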

Now let’s upload the zip file:
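
Roughly like this – the worker name, zip name and env name are placeholders, and the flags are worth double checking against the iron cli help:

    $ iron --env development worker upload --zip my_worker.zip --name my_worker \
        iron/images:python-2.7 sh -c 'PYTHONPATH="packages" python my_worker.py'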

I think the command is pretty self-explanatory. However, for the env argument, you need a file named “iron.json” which contains the project id and tokens for your different environments. Here’s a sample:
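
With placeholder values:

    {
      "development": {
        "project_id": "your-dev-project-id",
        "token": "your-dev-token"
      },
      "production": {
        "project_id": "your-prod-project-id",
        "token": "your-prod-token"
      }
    }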

Also note, we’re using PYTHONPATH=”packages” before invoking Python.

If everything goes right, your worker will be packaged and uploaded. Try running it to see how it goes.