Categories
Django Python

A Brief Introduction to Django Channels

There’s a new updated version of this article here: http://masnun.rocks/2016/09/25/introduction-to-django-channels/


Django has long been an excellent web framework. It has helped many developers and numerous businesses succeed over the years. But before the introduction of Channels, Django only supported the HTTP protocol well. With the gradual evolution of web technologies, standing here in 2016, supporting HTTP alone is simply not enough. Today we use websockets for real time communication, WebRTC is getting popular for real time collaboration and video calling, and HTTP/2 is being adopted by many. In the current state of the web, any modern web framework needs to be able to support more and more protocols. This is where Django Channels comes into play. Channels aims to add new capabilities to Django, including support for modern web technologies like websockets and HTTP/2.

How does “Channels” work?

The idea behind Channels is quite simple. To understand the concept, let’s walk through an example scenario and see how Channels would process a request.

An http/websocket request hits the reverse proxy (e.g. nginx). This step is not compulsory, but we’re conscious developers and always make sure our requests first go through a hardened, battle-proven reverse proxy before they hit our application server.

Nginx passes the request to an application server. Since we’re dealing with multiple protocols now, instead of “application server”, let’s call it an “Interface Server”. This interface server knows how to handle requests using different protocols. It accepts the request, transforms it into a `message` and then passes the message on to a `channel`.

We have to write consumers which listen on specific channels. When new messages arrive on those channels, the consumers process them and, if needed, send a response back to a `reply/response channel`. The interface server listens on these `response channels`, and when we write back to them, it reads the message and transmits it to the outside world (in this case, our user). The consumers run in background worker processes; we can spawn as many workers as we like to scale up.

So as you can see, the concept is really simple – an interface server accepts requests and queues them as `message`s on `channel`s. Consumers process these queues and write responses back to `response channel`s. The interface server then sends the responses back. Plain and simple, yet effective!
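The flow above can be sketched in a few lines of plain Python. This is a toy model, not actual Channels code – the function names are made up for illustration and a channel is simulated as a named queue:

```python
from queue import Queue

# Toy model: a channel is just a named queue of messages.
channels = {"http.request": Queue(), "reply": Queue()}

def interface_server(raw_request):
    # The interface server turns an incoming request into a message,
    # noting which channel the reply should go to.
    channels["http.request"].put({"reply_channel": "reply", "text": raw_request})

def worker():
    # A consumer picks up a message from the channel and writes
    # its response back to the reply channel.
    message = channels["http.request"].get()
    channels[message["reply_channel"]].put("You said: " + message["text"])

interface_server("hello")
worker()
response = channels["reply"].get()  # the interface server would send this back
```

The real system adds protocols, serialization and multiple worker processes on top, but the queue-in, queue-out shape is the same.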

There are channels which are already available for us. For example, the `http.request` channel can be listened on if we want to handle incoming http messages, or `websocket.receive` can be used to process incoming websocket messages. In reality, we would probably be less interested in handling `http.request` ourselves and would rather let Django handle it; we would be more interested in adding our custom logic for websocket connections or other protocols. Besides the channels which are already available, we can also create our own custom channels for different purposes.

Since the project works by passing messages to channels and handling them with background workers, we can actually use it for managing our background tasks too. For example, instead of generating thumbnails on the fly, we can pass the image information as a message to a channel and a worker does the thumbnailing in the background. Channels ships with a management command – `runworker` – which runs background workers listening on the channels. However, as of now there is no retry mechanism if message delivery somehow fails. In that regard, Celery can be an excellent choice for writing / running / managing the background workers which process these channels.

Daphne is now the de-facto interface server that works well with Channels. The channels and message passing work through a “channel layer”, which supports multiple backends. The popular ones are In Memory, Redis and IPC. As you can guess, these backends and the channel layer abstract away the process of maintaining the different channels/queues and letting workers listen on them. The In Memory backend maintains the channels in memory and is a good fit for local development, while a Redis cluster is more suitable for scaling up in a production environment.
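For reference, a Redis-backed channel layer configuration for production might look something like this. Treat it as a sketch – the `asgi_redis` backend package and the `realtime.routing` path are assumptions based on the setup used later in this article, and the Redis host will differ in your environment:

```python
# settings.py – hypothetical production channel layer using the Redis backend
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "asgi_redis.RedisChannelLayer",
        "CONFIG": {
            # point this at your Redis instance / cluster
            "hosts": [("localhost", 6379)],
        },
        "ROUTING": "realtime.routing.channel_routing",
    },
}
```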

Let’s Build a WebSocket Echo Server

Enough talk. Let’s build a simple echo server. But before we can do that, we first have to install the package.

pip install channels

That should install Django (as it’s a dependency of channels) and channels along with the necessary packages. Start a Django project with `django-admin` and create an app.

Now add `channels` to the `INSTALLED_APPS` list in your `settings.py`. For local development, we are fine with the in memory channel layer, so we need to put these lines in `settings.py` to define the default channel layer:

CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "asgiref.inmemory.ChannelLayer",
        "ROUTING": "realtime.routing.channel_routing",
    },
}

In the above code, please note the `ROUTING` key. As its value, we have to pass the path to our channel routing. In my case, I have an app named `realtime` and there’s a module named `routing.py` in it which contains the channel routing.

from channels.routing import route
from .consumers import websocket_receive

channel_routing = [
    route("websocket.receive", websocket_receive, path=r"^/chat/"),
]

In the channel routing list, we define our `route`s, which look very similar to Django’s url patterns. When we receive a message through a websocket connection, the message is passed on to the `websocket.receive` channel, so we defined a `consumer` to consume messages from that channel. We also defined a `path` to indicate that websocket connections to `/chat/` should be handled by this particular route. If we omit the path, clients can connect to any url on the host and we catch them all! Defining a path helps us namespace things, and it also serves another purpose which we will see later in this article.

And here’s the `consumers.py`:

def websocket_receive(message):
    text = message.content.get('text')
    if text:
        message.reply_channel.send({"text": "You said: {}".format(text)})

The consumer is very basic. It retrieves the text we received via the websocket and replies back. Note that the websocket content is available on the `content` attribute of the `message`, and `reply_channel` is the response channel here (the interface server is listening on this channel). Whatever we send to this channel is passed back over the websocket connection.
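For completeness, we could also handle the connection lifecycle with companion consumers. Here’s a sketch – note that the `{"accept": True}` handshake reply is how newer Channels versions accept a connection, while earlier versions accepted automatically, so treat the exact keys as version-dependent:

```python
def websocket_connect(message):
    # Accept the incoming websocket connection
    # (required in Channels 1.x; earlier versions accepted automatically).
    message.reply_channel.send({"accept": True})

def websocket_disconnect(message):
    # Clean-up hook; nothing to do for a simple echo server.
    pass
```

These would be wired up with additional entries in the routing list, e.g. `route("websocket.connect", websocket_connect)` and `route("websocket.disconnect", websocket_disconnect)`.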

We have defined our channel layer, created our consumer and mapped a route to it. Now we just need to launch the interface server and the background workers (which run the consumers). In a local environment, we can just run `python manage.py runserver` as usual, and Channels will make sure the interface server and the workers are running in the background. (This should not be used in production though – there we must run Daphne and the workers separately; see the Channels deployment docs.)

Once our dev server starts up, let’s open the web app. If you haven’t added any Django views, no worries – you should still see the “It worked!” welcome page of Django, and that’s fine for now. We need to test our websocket, and we’re smart enough to do that from the dev console. Open up your Chrome DevTools (or Firefox / Safari / any other browser’s dev tools), navigate to the JS console and paste the following JS code:

socket = new WebSocket("ws://" + window.location.host + "/chat/");
socket.onmessage = function(e) {
    alert(e.data);
}
socket.onopen = function() {
    socket.send("hello world");
}

If everything worked, you should get an alert with the message we sent. Since we defined a path, the websocket connection works only on `/chat/`. Try modifying the JS code to send a message to some other url and see how it doesn’t work. Also, remove the `path` from our route and see how you can catch websocket messages from all websocket connections regardless of which url they were connected to. Cool, no?

Our websocket example was very short and we just tried to demonstrate how things work in general. But Django Channels provides some really cool features for working with websockets. It integrates with the Django auth system and authenticates websocket users for you. Using the `Group` concept, it is very easy to build group chats, live blogs or any sort of real time group communication. Love Django’s generic views? There are generic consumers to help you get started fast. The Channels docs are quite nice; I suggest you read through them and try out the concepts.

Using our own channels

We can create our own channels and add consumers to them. Then we can send messages to those channels using the channel name, like this:

from channels import Channel

Channel("thumbnailer").send({
    "image_id": image.id
})

WSGI or ASGI?

Since Daphne and ASGI are still new, some people prefer to handle their http requests via WSGI. In such cases, we can configure nginx to route requests to different servers (wsgi / asgi) based on the url, the domain or the Upgrade header. Having the real time endpoints under a particular namespace makes it easy to configure nginx to send requests under that namespace to Daphne while sending all others to the wsgi server.


Django: Running management commands inside a Docker container

Okay, so we have dockerized our Django app and we need to run a `manage.py` command for some task. How do we do that? Simple – we have to locate the container that runs the Django app, log in to it and then run the command.

Locate The Container

It’s very likely that our app uses multiple containers to compose the entire system. For example, I have one container running MySQL, one running Redis and another running the actual Django app. If we want to run `manage.py` commands, we have to log in to the one that runs Django.

While our app is running, we can find the running docker containers using the `docker ps` command like this:

$ docker ps
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS              PORTS                    NAMES
308f40bba888        crawler_testscript   "/sbin/my_init"          31 hours ago        Up 3 seconds        5000/tcp                 crawler_testscript_1
3a5ccc872215        crawler_web          "bash run_web.sh"        31 hours ago        Up 4 seconds        0.0.0.0:8000->8000/tcp   crawler_web_1
14f0e260fb2c        redis:latest         "/entrypoint.sh redis"   31 hours ago        Up 4 seconds        0.0.0.0:6379->6379/tcp   crawler_redis_1
252a7092870d        mysql:latest         "/entrypoint.sh mysql"   31 hours ago        Up 4 seconds        0.0.0.0:3306->3306/tcp   crawler_mysql_1

In my case, I am using Docker Compose and I know my Django app runs using the `crawler_web` image. So we note the name of that container – in the above example, it is `crawler_web_1`.

Nice, now we know which container we have to login to.

Logging Into The Container

We use the name of the container to log in to it, like this:

docker exec -it crawler_web_1 bash

The command above will connect us to the container and land us in a bash shell. Now we’re ready to run our command.

Running the command

We `cd` into the directory if necessary and then run the management command.

cd /project
python manage.py <command>

Summary

  • `docker ps` to list the running containers and locate the one running Django
  • `docker exec -it [container_name] bash` to get a bash shell inside that container
  • `cd` to the Django project directory and run `python manage.py [command]`

Django REST Framework: Remember to disable Web Browsable API in Production

So this is what happened – I built a url shortening service at work for internal use. It’s a very basic app – it shortens urls and tracks clicks. Two models – `URL` and `URLVisit`. The `URL` model contains the full url, the slug for the short url, created time etc. `URLVisit` has information related to the click – user IP, browser data, click time etc – and a `ForeignKey` to `URL` as expected.

Two different apps were using this service – one from me, another from a different team. I kept the Web Browsable API on so that developers from other teams could try it out easily, and they were very happy about it. The only job of this app was url shortening, so I didn’t bother building a separate home page. When people requested the `/` page on the domain, I would redirect them directly to `/api/`.

Things were going really great initially. There was no heavy load on the service – roughly 50-100 requests per second, which I would call minimal. The server had decent hardware and was running on an EC2 instance on AWS. nginx was in front while the app was run with uwsgi. Everything was smooth until it happened. After a month and a half, we started noticing very poor performance from the server. Sometimes it was taking up to 40 seconds to respond. I started investigating.

It took me some time to find out what was actually happening. By that time, we had shortened more than a million urls. So when someone visited `/api/url-visit/`, the web browsable api tried to render an html form. That form lets the user choose one of the entries from the `URL` model in a select (dropdown). Rendering that page was causing 100% cpu usage and blocking / slowing down other requests. It’s not really DRF’s fault – any app that tried to load a million entries into a select like that would crash too.

Even worse – remember I added a redirect from the home page directly to the `/api/` url? Search engine bots started crawling those urls. As a result, the app became extremely slow and often unavailable to nginx. I initially thought I could stop the crawls by adding a robots.txt, or simply by adding authentication to the API. But developers from other teams would still visit the API from time to time to try things out and make the app unresponsive again. So I did what I had to – I disabled the web browsable API and added separate documentation demonstrating the use of the API with `curl`, PHP and Python.

I added the following snippet in my production settings file to only enable `JSONRenderer` for the API:

REST_FRAMEWORK = {
    'DEFAULT_RENDERER_CLASSES': (
        'rest_framework.renderers.JSONRenderer',
    )
}
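If you’d rather not maintain this in a separate production settings file, one option is to toggle the renderer list on `DEBUG`. This is a sketch – adapt it to however your settings are organized (here `DEBUG` is hardcoded only to keep the snippet self-contained):

```python
DEBUG = False  # normally set per environment; hardcoded here for the sketch

REST_FRAMEWORK = {
    'DEFAULT_RENDERER_CLASSES': (
        'rest_framework.renderers.JSONRenderer',
    ),
}

# Re-enable the HTML browsable API only for local development
if DEBUG:
    REST_FRAMEWORK['DEFAULT_RENDERER_CLASSES'] += (
        'rest_framework.renderers.BrowsableAPIRenderer',
    )
```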

Things have been pretty smooth ever since. I can still enjoy the nice HTML interface locally, where there are far fewer items, while on my production servers there is no web browsable API to cause any bottlenecks.