Polyglot.Ninja() – My new initiative!

So here it goes again, I have started another new blog / site – http://polyglot.ninja/ – I am planning to create programming related contents there. Mostly learning guides, tutorials and who knows, may be some video tutorials too?

Why did I start another site? Well, for one I love that domain. Also being a Polyglot developer by heart, I feel the need to make the term “polyglot” more familiar among developers. I would also like to convey the idea that learning multiple programming languages is very important. It does make us better developers if we learn new ideas and concepts from other languages.

I hope to continue writing there. If you are interested, you can subscribe to the mailing list (I have the same list, so subscribing here would also get you updates from that site). I will keep you posted of the new contents I create there 🙂

Python: Using the `requests` module to download large files efficiently

If you use Python regularly, you might have come across the wonderful requests library. I use it almost everyday to read urls or make POST requests. In this post, we shall see how we can download a large file using the requests module with low memory consumption.

To Stream or Not to Stream

When downloading large files/data, we probably would prefer the streaming mode while making the get call. If we use the stream parameter and set it to True, the download will not immediately start. The file download will start when we try to access the content property or try to iterate over the content using iter_content / iter_lines.

If we set stream to False, all the content is downloaded immediately and put into memory. If the file size is large, this can soon cause issues with higher memory consumption. On the other hand – if we set stream to True, the content is not downloaded, but the headers are downloaded and the connection is kept open. We can now choose to proceed downloading the file or simply cancel it.

But we must also remember that if we decide to stream the file, the connection will remain open and can not go back to the connection pool. If we’re working with many large files, these might lead to some efficiency. So we should carefully choose where we should stream. And we should take proper care to close the connections and dispose any unused resources in such scenarios.

Iterating The Content

By setting the stream parameter, we have delayed the download and avoided taking up large chunks of memory. The headers have been downloaded but the body of the file still awaits retrieval. We can now get the data by accessing the content property or choosing to iterate over the content. Accessing the content directly would read the entire response data to memory at once. That is a scenario we want to avoid when our target file is quite large.

So we are left with the choice to iterate over the content. We can use iter_content where the content would be read chunk by chunk. Or we can use iter_lines where the content would be read line by line. Either way, the entire file will not be loaded into memory and keep the memory usage down.

Code Example

The code should be self explanatory. We are opening the url with stream set to True. And then we are opening a file handle to the target_path (where we want to save our file). Then we iterate over the content, chunk by chunk and write the data to the file.

That’s it!

A Brief Introduction to Django Channels

There’s a new updated version of this article here: http://masnun.rocks/2016/09/25/introduction-to-django-channels/

Django has long been an excellent web framework. It has helped many developers and numerous businesses succeed over the years. But before the introduction of Channels, Django only supported the http protocol well. With the gradual evolution of the web technologies, standing here in 2016, supporting http only is simply not enough. Today, we are using websockets for real time communications, WebRTC is getting popular for real time collaboration or video calling, HTTP/2 is also being adapted by many. In the current state of the web, any modern web framework needs to be able to support more and more protocols. This is where Django Channels come into play. Channels aim at adding new capabilities to Django, including the support for modern web technologies like websockets or http2.

How does “Channels” work?

The idea behind Channels is quite simple. To understand the concept, let’s first walk through an example scenario, let’s see how Channels would process a request.

A http/websocket request hits the reverse proxy (ie, nginx). This step is not compulsory but we’re conscious developers and always make sure our requests first go through a hardened, battle proven reverse proxy before it hits our application server

Nginx passes the request to an application server. Since we’re dealing with multiple protocols now, instead of application server, let’s call it “Interface Server”. This interface server knows how to handle requests using different protocols. The interface server accepts the request and transforms into a message. It then passes the message on to a channel.

We have to write consumers which will listen on to specific channels. When new messages arrive on those channels, the consumers would process them and if needed, send a response back to a reply/response channel. The interface server listens on to these response channels and when we write back to these channels, the interface server reads the message and transmits it to the outside world (in this case, our user). The consumers are run in background worker processes. We can spawn as many workers as we like to scale up.

So as you can see, the concept is really simple – an interface server accepts requests and queues them as messages on channels. Consumers process these queues and write back responses on response channels. The interface server sends back the responses. Plain, simple yet effective!

There are channels which are already available for us. For example – http.request channel can be listened on if we want to handle incoming http messages. Or websocket.receive can be used to process incoming websocket messages. In reality, we would probably be less interested in handling http.request ourselves and rather let Django handle it. We would be more interested in adding our custom logic for websocket connections or other protocols. Besides the channels which are already available, we can also create our own custom channels for different purposes. Since the project works by passing messages to channels and handling them with background workers, we can actually use it for managing our background tasks too. For example, instead of generating thumbnails on the fly, we can pass the image information as a message to a channel and the worker does the thumbnailing in the background. By default Channels ship with a management command – runworker which can run background workers to listen to the channels. However, till now, there is no retry mechanism if the message delivery somehow fails. In this regard, Celery can be an excellent choice for writing / running / managing the background workers which would process these channels.

Daphne is now the de-facto interface server that works well with Channels. The channels and message passing work through a “channel layer” which support multiple backends. The popular ones are – In Memory, Redis, IPC. As you can guess, these backends and the channel layer is used to abstract away the process of maintaining different channels/queues and allowing workers to listen to those. In Memory backend maintains the channels in memory and is a good fit for local development. While a Redis cluster would be more suitable in a production environment for scaling up.

Let’s Build a WebSocket Echo Server

Enough talk. Let’s build a simple echo server. But before we can do that, we first have to install the package.

That should install Django (as it’s a dependency of channels) and channels along with the necessary packages. Start a Django project with django-admin and create an app.

Now add channels to the INSTALLED_APPS list in your settings.py. For local development, we are fine with the in memory channel layer, so we need to put these lines in settings.py to define the default channel layer:

In the above code, please note the ROUTING key. As the value of this key, we have to pass the path to our channel routing. In my case, I have an app named realtime and there’s a module named routing.py which has the channel routing.

In the channel routing list, we define our routes which looks very similar to Django’s url patterns. When we receive a message through a websocket connection, the message is passed on to the websocket.receive channel. So we defined a consumer to consume messages from that channel. We also defined a path to indicate that websocket connections to /chat/ should be handled by this particular route. If we omit the path, the clients can connect to any url on the host and we can catch them all! But if we define a path, it helps us namespace things and in another cause which we will see later in this article.

And here’s the consumers.py:

The consumer is very basic. It retrieves the text we received via websocket and replies back. Note that the websocket content is available on the content attribute of the message. And the reply_channel is the response channel here (the interface server is listening on to this channel). Whatever we send to this channel is passed back to the websocket connection.

We have defined our channel layer, created our consumer and mapped a route to it. Now we just need to launch the interface server and the background workers (which run the consumers). In local environment, we can just run – python manage.py runserver as usual. Channels will make sure the interface server and the workers are running in the background. (But this should not be used in production, in production we must use Daphne separately and launch the workers individually. See here).

Once our dev server starts up, let’s open up the web app. If you haven’t added any django views, no worries, you should still see the “It Worked!” welcome page of Django and that should be fine for now. We need to test our websocket and we are smart enough to do that from the dev console. Open up your Chrome Devtools (or Firefox | Safari | any other browser’s dev tools) and navigate to the JS console. Paste the following JS code:

If everything worked, you should get an alert with the message we sent. Since we defined a path, the websocket connection works only on /chat/. Try modifying the JS code and send a message to some other url to see how they don’t work. Also remove the path from our route and see how you can catch all websocket messages from all the websocket connections regardless of which url they were connected to. Cool, no?

Our websocket example was very short and we just tried to demonstrate how things work in general. But Django Channels provide some really cool features to work with websockets. It integrates with the Django Auth system and authenticates the websocket users for you. Using the Group concept, it is very easy to create group chats or live blogs or any sort of real time communication in groups. Love Django’s generic views? We have generic consumers to help you get started fast. The channels docs is quite nice, I suggest you read through the docs and try the concepts.

Using our own channels

We can create our own channels and add consumers to them. Then we can simply add some messages to those channels by using the channel name. Like this:


Since Daphne and ASGI is still new, some people still prefer to handle their http requests via WSGI. In such cases, we can configure nginx to route the requests to different servers (wsgi / asgi) based on url, domain or upgrade header. In such cases, having the real time end points under particular namespace can help us easily configure nginx to send the requests under that namespace to Daphne while sending all others to wsgi.

NodeJS: Generator Based Workflow

This post assumes you already know about Promises, Generators and aims at focusing more on the co library and the generator based workflow it supports. If you are not very familiar with Promises or Generators, I would recommend you study them first.

Promises are nice

Promises can save us from the callback hell and allow us to write easy to understand codes. May be something like this:

Here we have a promisedOne function that returns a Promise. The promise is resolved after 3 seconds and the value is set to 1. For keeping it simple, we used setTimeout. We can imagine other use cases like network operations where promises can be handy instead of callbacks.

Generator Based Workflow

With the new ES2015 generators and a nice generator based workflow library like co (https://github.com/tj/co) , we can do better. Something like this:

What’s happening here? The co function takes a generator function and executes it. Whenever the generator function yields something, co checks if the object is one of the yieldables that it supports. In our case, it’s a Promise which is supported. So co takes the yielded promise, processes it and returns the results back into the generator. So we can grab the value from the yield expression. If the promise is rejected or any error occurs, it’s thrown back to the generator function as well. So we could catch the error.

co returns a Promise. So if the generator function returns any values for us, we can retrieve those using the promise APIs.

So basically, we yield from the yieldables, catch any errors and return the values. We don’t have to worry about how the generator is working behind the scene.

The co library also has an useful function – co.wrap. This function takes a generator function but instead of executing it directly, it returns a general function which returns a promise.

This can often come handy when we want to use co with other libraries/frameworks which don’t support generator based workflow. For example, here’s a Gist that demonstrates how to use generators with the HapiJS web framework – https://gist.github.com/grabbou/ead3e217a5e445929f14. The route handlers are written using generator function and then adapted using co.wrap for Hapi.

Building a Facebook Messenger Bot with Python

Facebook now has the Messenger Platform which allows us to build bots which can accept messages from users and respond to them. In this tutorial, we shall see how we can build a bot and add it to one of our pages so that the users can interact with the bot by sending messages to the page.

To get started, we have three requirements to fulfill:

  • We need a Facebook Page
  • We need a Facebook App
  • We need a webhook / callback URL to accept incoming messages

I am assuming you already have a Facebook Page. If you don’t, go ahead and create one. It’s very simple.

Creating and Configuring The Facebook App

(1) First, we create a generic facebook app. We need to provide the name, namespace, category, contact email. Simple and straightforward. This is how it looks for me:

Create a New FB App.

(2) Now we have to browse the “Add Product” section and add “Messenger”.

Add Messenger

(3) Generate access token for a Page you manage. A popup will open asking you for permissions. Grant the permission and you will soon see the access token for that page. Please take a note of this token. We shall use it later send messages to the users on behalf of the page.

Next, click the “Webhooks” section.

(4) Before we can setup a webhook, we need to setup an URL which is publicly accessible on the internet. The URL must have SSL (that is it needs to be https). To meet this requirement and set up a local dev environment, we setup a quick flask app on our local machine.

Install Flask from PyPi using pip:

Facebook will send a GET request to the callback URL we provide. The request will contain a custom secret we can add (while setting up the webhook) and a challenge code from Facebook. They expect us to output the challenge code to verify ourselves. To do so, we write a quick GET handler using Flask.

We run the local server using python server.py. The app will launch at port 5000 by default. Next we use ngrok to expose the server to the internet. ngrok is a fantastic tool and you should seriously give it a try for running and debugging webhooks/callback urls on your local machine.

With that command, we will get an address like https://ac433506.ngrok.io. Copy that url and paste it in the Webhook setup popup. Checkmark the events we’re interested in. I check them all. Then we input a secret, which our code doesn’t care about much. So just add anything you like. The popup now looks like this:

Click “Verify and Save”. If the verification succeeds, the popup will close and you will be back to the previous screen.

Select a Page again and click “Subscribe”. Now our app should be added to the page we selected. Please note, if we haven’t generated an access token for that page in the earlier step, the subscription will fail. So make sure we have an access token generated for that page.

Handling Messages

Now every time someone sends a message to the “Masnun” page, Facebook will make a POST request to our callback url. So we need to write a POST handler for that url. We also need respond back to the user using the Graph API. For that we would need to use the awesome requests module.

Here’s the code for accepting incoming messages and sending them a reply:

The code here accepts a message, retrieves the user id and the message content. It reverses the message and sends back to the user. For this we use the ACCESS_TOKEN we generated before hand. The incoming request must be responded with a status code 200 to acknowledge the message. Otherwise Facebook will try the message a few more times and then disable the webhook. So sending a http status code 200 is important. We just output “ok” to do so.

You can now send a message to your page and see if it responds correctly. Check out Flask’s and ngrok’s logs to debug any issues you might face.

You can download the sample code from here: https://github.com/masnun/fb-bot

Python: Metaclass explained

One of the key features of the Python language is that everything is an object. These objects are instances of classes.

But hey, classes are objects too, no? Yes, they are.

So the classes we define are of the type type. So meta! But how are the classes constructed from the type class? And also a moment ago, we saw that type is a function that returns the type of an object?

Yes, when we pass *just* an object to type, it returns us the type of that object. But if we pass it more details, it creates the class for us. Like this:

We can pass name, bases as tuple and the attributes as a dictionary to type and we get back a class. The class extends the provided bases and has the attributes we provided.

When we define a class like this:

It’s internally equivalent of MyClass = type("MyClass", (int,), {"name": "MyClass"}). Here type is the metaclass of MyClass.

Objects are instances of Classes and the Classes are instances of Metaclasses.

So basically, that’s the basic – we create objects from classes and then we create classes out of metaclasses. In Python, type is the default metaclass for all the classes but this can be customized as we need.

Metaclass Hook

So what if we do not want to use type as the metaclass of our classes? We want to customize the way our classes are created and we don’t have any good way of modifying how the type metaclass works. So how do we roll our own metaclass and use them?

The pretty obvious way is to use the MyClass = MyMetaClass(name, bases, attrs) approach. But there’s another way to hook in a custom metaclass for a class. In Python 2, classes could define a __metaclass__ method which would be responsible for creating the class. In Python 3, we pass the metaclass callable as a keyword based argument in the base class list:

This metaclass argument has to be a callable which takes the name, bases and attributes as it’s arguments and returns a class object instance. Please note, the metaclass argument itself does not need to be a metaclass as long as it is a factory like callable that creates classes out of metaclasses.

That is a very simple example of a function being used as a metaclass callable. Now let’s use classes.

Here, MetaClass is called with the arguments, which are in effect passed to it’s __new__ method and we get a class. We subclassed from type, so we didn’t need to provide our own implementation for the __new__ method. After __new__ is called, the __init__ method is called for initialization purposes. We added an extra attribute to the class in our overridden __init__ method.

In our function example, we directly used the type metaclass. So all the classes generated from that function would be of the type type. On the other hand, we extended type in our class based example. So the type of the generated classes would be our metaclass. So it’s beneficial to use the class based approach.

Use Cases

Let’s keep track of subclasses:

Or make a class final:

Django: Running management commands inside a Docker container

Okay, so we have dockerized our django app and we need to run a manage.py command for some task. How do we do that? Simple, we have to locate the container that runs the django app, login and then run the command.

Locate The Container

It’s very likely that our app uses multiple containers to compose the entire system. For exmaple, I have one container running MySQL, one container running Redis and another running the actual Django app. If we want to run manage.py commands, we have to login to the one that runs Django.

While our app is running, we can find the running docker containers using the docker ps command like this:

In my case, I am using Docker Compose and I know my Django app runs using the crawler_web image. So we note the name of the container. In the above example, that is – crawler_web_1.

Nice, now we know which container we have to login to.

Logging Into The Container

We use the name of the container to login to it, like this:

The command above will connect us to the container and land us on a bash shell. Now we’re ready to run our command.

Running the command

We cd into the directory if necessary and then run the management command.


  • docker ps to list running containers and locate the one
  • docker exec -it [container_name] bash to login to the bash shell on that container
  • cd to the django project and run python manage.py [command]

Django REST Framework: Remember to disable Web Browsable API in Production

So this is what happened – I built an url shortening service at work for internal use. It’s a very basic app – shortens urls and tracks clicks. Two models – URL and URLVisit. URL model contains the full url, slug for the short url, created time etc. URLVisit has information related to the click, like user IP, browser data, click time etc and a ForeignKey to URL as expected.

Two different apps were using this service, one from me, another from a different team. I kept the Web Browsable API so the developers from other teams can try it out easily and they were very happy about it. The only job of this app was url shortening so I didn’t bother building a different home page. When people requested the / page on the domain, I would redirect them directly to /api/.

Things were going really great initially. There was not very heavy load on the service. Roughly 50-100 requests per second. I would call that minimal load. The server also had decent hardware and was running on an EC2 instance from AWS. nginx was on the front while the app was run with uwsgi. Everything was so smooth until it happened. After a month and half, we started noticing very poor performance of the server. Sometimes it was taking up to 40 seconds to respond. I started investigating.

It took me some time to find out what actually happened. By the time it happened, we have shortened more than a million urls. So when someone was visiting /api/url-visit/ – the web browsable api was trying to render the html form. The form allows the user to choose one of the entries from the URL model inside a select (dropdown). Rendering that page was causing usages of 100% cpu and blocking / slowing down other requests. It’s not really DRF’s fault. If I tried to load a million of entries into a select like that, it would crash the app too.

Even worse – remember I added a redirect from the home page, directly to the /api/ url? Search engines (bots) started crawling the urls. As a result the app became extremely slow and often unavailable to nginx. I initially thought, I could stop the search engine crawls by adding some robots.txt or simply by adding authentication to the API. But developers from other teams would still time to time visit the API to try out things and then make the app non responsive. So I did what I had to – I disabled the web browsable API and added a separate documentation demonstrating the use of the API with curl, PHP and Python.

I added the following snippet in my production settings file to only enable JSONRenderer for the API:

Things have become pretty smooth afterwards. I can still enjoy the nice HTML interface locally where there are much fewer items. While on my production servers, there is no web browsable APIs to cause any bottlenecks.

Composition over Inheritance


If you know basic OOP, you know what Inheritance is. When one class extends another, the child class inherits the parent class and thus the child class has access to all the variables and methods on the parent class.

Here the MallardDuck extends Duck and inherits the speed class variable along with the fly method. We override the speedin the child class to suit our needs. When we call fly on the mallard duck, it uses the fly method inherited from the parent. If we run the above code, we will see the following output:

This is inheritance in a nutshell.


Let’s first see an example:

Here we’re not implementing the email sending functionality directly inside the EmailClient. Rather, we’re storing a type of email provider in the email_provider variable and delegating the responsibility of sending the email to this provider. When we have to send_email, we call the send method on the email_provider. Thus we’re composing the functionality of the EmailClient by sticking composable objects together. We can also swap out the email provider any time we want, by passing it a new provider to the set_provider method.

Composition over Inheritance

Let’s implement the above EmailClient using inheritance.

Here, we created a base class EmailClient which has the setup method. Then we extended the class to create GmailClient and YahooMailClient. Things got interesting when we wanted to start sending emails using Yahoo instead of Gmail. We had to create a new instance of YahooMailClient for that purpose. The initially created client was no longer useful for us since it only knows how to send emails through Gmail.

This is why composition is often favoured over inheritance. By delegating the responsibility to the different composable parts, we form loose coupling. We can swap out those components easily when needed. We can also inject them as dependencies using dependency injection. But with inheritance, things get tightly coupled and not easily swappable.