Building a Facebook Messenger Bot with Python

Facebook now has the Messenger Platform which allows us to build bots which can accept messages from users and respond to them. In this tutorial, we shall see how we can build a bot and add it to one of our pages so that the users can interact with the bot by sending messages to the page.

To get started, we have three requirements to fulfill:

  • We need a Facebook Page
  • We need a Facebook App
  • We need a webhook / callback URL to accept incoming messages

I am assuming you already have a Facebook Page. If you don’t, go ahead and create one. It’s very simple.

Creating and Configuring The Facebook App

(1) First, we create a generic facebook app. We need to provide the name, namespace, category, contact email. Simple and straightforward. This is how it looks for me:

Create a New FB App.

(2) Now we have to browse the “Add Product” section and add “Messenger”.

Add Messenger

(3) Generate an access token for a Page you manage. A popup will open asking you for permissions. Grant the permission and you will soon see the access token for that page. Please make a note of this token. We shall use it later to send messages to the users on behalf of the page.

Next, click the “Webhooks” section.

(4) Before we can set up a webhook, we need a URL which is publicly accessible on the internet. The URL must use SSL (that is, it needs to be https). To meet this requirement and set up a local dev environment, we set up a quick Flask app on our local machine.

Install Flask from PyPI using pip: pip install Flask

Facebook will send a GET request to the callback URL we provide. The request will contain a custom secret we can add (while setting up the webhook) and a challenge code from Facebook. They expect us to output the challenge code to verify ourselves. To do so, we write a quick GET handler using Flask.
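As a rough sketch of that verification step, here is the core logic written as a plain function, independent of Flask. The function name verify_webhook and the optional token check are assumptions made for illustration; hub.challenge and hub.verify_token are the query parameters Facebook sends. In a Flask view we would pass request.args to it and return the body with the status code.

```python
def verify_webhook(params, expected_token=None):
    """Return (body, status) for Facebook's verification GET request.

    params is a dict-like object holding the query parameters.
    expected_token is optional: when given, the secret is checked too.
    """
    if expected_token is not None and params.get("hub.verify_token") != expected_token:
        return "Invalid verification token", 403
    # Echo the challenge back so Facebook accepts the callback URL.
    return params.get("hub.challenge", ""), 200


# Made-up query parameters, only for demonstration.
query = {"hub.mode": "subscribe", "hub.verify_token": "anything", "hub.challenge": "12345"}
print(verify_webhook(query))
print(verify_webhook(query, expected_token="something-else"))
```

Leaving expected_token as None mirrors a handler that does not care about the secret and simply echoes the challenge.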

We run the local server using python server.py. The app will launch at port 5000 by default. Next we use ngrok to expose the server to the internet, for example with ngrok http 5000. ngrok is a fantastic tool and you should seriously give it a try for running and debugging webhooks/callback URLs on your local machine.

With that command, we will get an address like https://ac433506.ngrok.io. Copy that URL and paste it in the Webhook setup popup. Check the events we’re interested in. I check them all. Then we input a secret, which our code doesn’t care about much, so just add anything you like. The popup now looks like this:

Click “Verify and Save”. If the verification succeeds, the popup will close and you will be back to the previous screen.

Select a Page again and click “Subscribe”. Now our app should be added to the page we selected. Please note, if we haven’t generated an access token for that page in the earlier step, the subscription will fail. So make sure we have an access token generated for that page.

Handling Messages

Now every time someone sends a message to the “Masnun” page, Facebook will make a POST request to our callback URL. So we need to write a POST handler for that URL. We also need to respond to the user using the Graph API. For that we will use the awesome requests module.

Here’s the code for accepting incoming messages and sending them a reply:

The code here accepts a message and retrieves the user id and the message content. It reverses the message and sends it back to the user. For this we use the ACCESS_TOKEN we generated beforehand. The incoming request must be answered with a status code 200 to acknowledge the message. Otherwise Facebook will retry the message a few more times and then disable the webhook. So sending an HTTP status code 200 is important. We just output “ok” to do so.
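The message handling described above can be sketched without any web framework. The payload shape used here (entry, messaging, sender, message) follows the Messenger webhook format as I understand it, and the helper name build_replies is made up for this example. Each returned dictionary is what we would then POST to the Graph API with requests and our ACCESS_TOKEN.

```python
def build_replies(payload):
    """Collect one reversed-text reply per incoming text message."""
    replies = []
    for entry in payload.get("entry", []):
        for event in entry.get("messaging", []):
            text = event.get("message", {}).get("text")
            if text is None:
                continue  # events without text (receipts, postbacks) are skipped
            replies.append({
                "recipient": {"id": event["sender"]["id"]},
                "message": {"text": text[::-1]},  # reverse the message
            })
    return replies


# A hand-written sample payload with a single text message.
sample = {"entry": [{"messaging": [{"sender": {"id": "42"}, "message": {"text": "hello"}}]}]}
print(build_replies(sample))
```

Keeping this part free of network calls makes it easy to test; the POST handler itself only has to call it, send the replies and return “ok”.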

You can now send a message to your page and see if it responds correctly. Check out Flask’s and ngrok’s logs to debug any issues you might face.

You can download the sample code from here: https://github.com/masnun/fb-bot

Python: Metaclass explained

One of the key features of the Python language is that everything is an object. These objects are instances of classes.

But hey, classes are objects too, no? Yes, they are.

So the classes we define are of the type type. So meta! But how are the classes constructed from the type class? And also a moment ago, we saw that type is a function that returns the type of an object?

Yes, when we pass *just* an object to type, it returns us the type of that object. But if we pass it more details, it creates the class for us. Like this:

We can pass the name, the bases as a tuple and the attributes as a dictionary to type and we get back a class. The class extends the provided bases and has the attributes we provided.
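Both uses of type can be seen side by side in a small sketch. The Point class and its attributes are hypothetical names chosen for the example.

```python
# One argument: type() reports the type of an object.
print(type(42))        # <class 'int'>
print(type("hello"))   # <class 'str'>

# Three arguments: type() builds a new class from name, bases and attributes.
Point = type("Point", (object,), {"x": 0, "y": 0})
p = Point()
print(Point.__name__, p.x, p.y)   # Point 0 0
print(type(Point))                # <class 'type'>
```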

When we define a class like this:

It’s internally equivalent to MyClass = type("MyClass", (int,), {"name": "MyClass"}). Here type is the metaclass of MyClass.
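To make that equivalence concrete, here is a minimal sketch that builds the class both ways and compares them. MyDynamicClass is just a second variable name introduced for the comparison.

```python
class MyClass(int):
    name = "MyClass"


# The same class built by calling the metaclass directly.
MyDynamicClass = type("MyClass", (int,), {"name": "MyClass"})

for cls in (MyClass, MyDynamicClass):
    # Both report the same name, bases, attribute and metaclass.
    print(cls.__name__, cls.__bases__, cls.name, type(cls))
```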

Objects are instances of Classes and the Classes are instances of Metaclasses.

So that’s the basic idea – we create objects from classes and we create classes from metaclasses. In Python, type is the default metaclass for all classes but this can be customized as we need.

Metaclass Hook

So what if we do not want to use type as the metaclass of our classes? We want to customize the way our classes are created and we don’t have any good way of modifying how the type metaclass works. So how do we roll our own metaclass and use them?

The pretty obvious way is to use the MyClass = MyMetaClass(name, bases, attrs) approach. But there’s another way to hook in a custom metaclass for a class. In Python 2, classes could define a __metaclass__ attribute which would be responsible for creating the class. In Python 3, we pass the metaclass callable as a keyword argument in the base class list:

This metaclass argument has to be a callable which takes the name, bases and attributes as its arguments and returns a class object. Please note, the metaclass argument itself does not need to be a metaclass as long as it is a factory-like callable that creates classes out of metaclasses.
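A minimal sketch of such a callable, assuming a hypothetical factory function named make_class that adds one attribute before delegating to type:

```python
def make_class(name, bases, attrs):
    """A plain function used as the metaclass callable."""
    attrs["created_by"] = "make_class"   # customize the class before it exists
    return type(name, bases, attrs)      # type does the actual construction


class MyClass(metaclass=make_class):
    pass


print(MyClass.created_by)   # make_class
print(type(MyClass))        # <class 'type'>
```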

That is a very simple example of a function being used as a metaclass callable. Now let’s use classes.

Here, MetaClass is called with the arguments, which are in effect passed to its __new__ method and we get a class. We subclassed from type, so we didn’t need to provide our own implementation for the __new__ method. After __new__ is called, the __init__ method is called for initialization purposes. We added an extra attribute to the class in our overridden __init__ method.
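The class based approach described above could look roughly like this. The attribute name extra and its value are assumptions made for the sketch.

```python
class MetaClass(type):
    def __init__(cls, name, bases, attrs):
        # __new__ is inherited from type; we only customize initialization.
        super().__init__(name, bases, attrs)
        cls.extra = "added by MetaClass.__init__"


class MyClass(metaclass=MetaClass):
    pass


print(MyClass.extra)
print(type(MyClass) is MetaClass)   # True
```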

In our function example, we directly used the type metaclass. So all the classes generated from that function would be of the type type. On the other hand, we extended type in our class based example. So the type of the generated classes would be our metaclass. So it’s beneficial to use the class based approach.

Use Cases

Let’s keep track of subclasses:
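One possible way to do this, sketched with hypothetical names (SubclassTracker, Base, A, B): the metaclass gives every class its own list and registers each new class with its direct bases.

```python
class SubclassTracker(type):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        cls.subclasses = []                      # each class gets its own list
        for base in bases:
            if isinstance(base, SubclassTracker):
                base.subclasses.append(cls)      # register with the direct parent


class Base(metaclass=SubclassTracker):
    pass


class A(Base):
    pass


class B(Base):
    pass


print([c.__name__ for c in Base.subclasses])   # ['A', 'B']
```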

Or make a class final:
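A sketch of a final class, again with made-up names (Final, Config, CustomConfig): the metaclass refuses to build any class that lists a final class among its bases.

```python
class Final(type):
    def __new__(mcs, name, bases, attrs):
        for base in bases:
            if isinstance(base, Final):
                raise TypeError(f"{base.__name__} is final and cannot be subclassed")
        return super().__new__(mcs, name, bases, attrs)


class Config(metaclass=Final):
    pass


try:
    class CustomConfig(Config):   # this class statement raises TypeError
        pass
except TypeError as exc:
    error = str(exc)
    print(error)
```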

Django: Running management commands inside a Docker container

Okay, so we have dockerized our Django app and we need to run a manage.py command for some task. How do we do that? Simple, we have to locate the container that runs the Django app, log in and then run the command.

Locate The Container

It’s very likely that our app uses multiple containers to compose the entire system. For example, I have one container running MySQL, one container running Redis and another running the actual Django app. If we want to run manage.py commands, we have to log in to the one that runs Django.

While our app is running, we can find the running docker containers using the docker ps command like this:

In my case, I am using Docker Compose and I know my Django app runs using the crawler_web image. So we note the name of the container. In the above example, that is – crawler_web_1.

Nice, now we know which container we have to log in to.

Logging Into The Container

We use the name of the container to log in to it, like this: docker exec -it crawler_web_1 bash

The command above will connect us to the container and land us on a bash shell. Now we’re ready to run our command.

Running the command

We cd into the directory if necessary and then run the management command.

Summary

  • docker ps to list running containers and locate the one running Django
  • docker exec -it [container_name] bash to log in to the bash shell on that container
  • cd to the django project and run python manage.py [command]

Django REST Framework: Remember to disable Web Browsable API in Production

So this is what happened – I built an url shortening service at work for internal use. It’s a very basic app – shortens urls and tracks clicks. Two models – URL and URLVisit. URL model contains the full url, slug for the short url, created time etc. URLVisit has information related to the click, like user IP, browser data, click time etc and a ForeignKey to URL as expected.

Two different apps were using this service, one from me, another from a different team. I kept the Web Browsable API so the developers from other teams can try it out easily and they were very happy about it. The only job of this app was url shortening so I didn’t bother building a different home page. When people requested the / page on the domain, I would redirect them directly to /api/.

Things were going really great initially. There was not very heavy load on the service. Roughly 50-100 requests per second. I would call that minimal load. The server also had decent hardware and was running on an EC2 instance from AWS. nginx was on the front while the app was run with uwsgi. Everything was so smooth until it happened. After a month and a half, we started noticing very poor performance of the server. Sometimes it was taking up to 40 seconds to respond. I started investigating.

It took me some time to find out what actually happened. By the time it happened, we had shortened more than a million urls. So when someone was visiting /api/url-visit/ – the web browsable api was trying to render the html form. The form allows the user to choose one of the entries from the URL model inside a select (dropdown). Rendering that page was causing 100% CPU usage and blocking / slowing down other requests. It’s not really DRF’s fault. If I tried to load a million entries into a select like that, it would crash the app too.

Even worse – remember I added a redirect from the home page, directly to the /api/ url? Search engines (bots) started crawling the urls. As a result the app became extremely slow and often unavailable to nginx. I initially thought I could stop the search engine crawls by adding a robots.txt or simply by adding authentication to the API. But developers from other teams would still visit the API from time to time to try out things and make the app unresponsive. So I did what I had to – I disabled the web browsable API and added separate documentation demonstrating the use of the API with curl, PHP and Python.

I added the following snippet in my production settings file to only enable JSONRenderer for the API:
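A settings fragment along these lines restricts DRF to JSON output. DEFAULT_RENDERER_CLASSES is the documented REST framework setting; where exactly it lives in the production settings module is an assumption here.

```python
# Production settings fragment: render API responses as JSON only,
# which leaves out the BrowsableAPIRenderer and its HTML forms.
REST_FRAMEWORK = {
    "DEFAULT_RENDERER_CLASSES": (
        "rest_framework.renderers.JSONRenderer",
    ),
}
```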

Things have become pretty smooth afterwards. I can still enjoy the nice HTML interface locally where there are much fewer items. On my production servers, there is no web browsable API to cause any bottlenecks.

Composition over Inheritance

Inheritance

If you know basic OOP, you know what Inheritance is. When one class extends another, the child class inherits the parent class and thus the child class has access to all the variables and methods on the parent class.
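A small sketch of this, with hypothetical speed values and message text:

```python
class Duck:
    speed = 30  # class variable shared by all ducks (made-up value)

    def fly(self):
        return f"Flying at {self.speed} kmph"


class MallardDuck(Duck):
    speed = 20  # override the inherited class variable


duck = MallardDuck()
print(duck.fly())   # Flying at 20 kmph
```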

Here the MallardDuck extends Duck and inherits the speed class variable along with the fly method. We override the speed in the child class to suit our needs. When we call fly on the mallard duck, it uses the fly method inherited from the parent. If we run the above code, we will see the following output:

This is inheritance in a nutshell.

Composition

Let’s first see an example:

Here we’re not implementing the email sending functionality directly inside the EmailClient. Rather, we’re storing a type of email provider in the email_provider variable and delegating the responsibility of sending the email to this provider. When we have to send_email, we call the send method on the email_provider. Thus we’re composing the functionality of the EmailClient by sticking composable objects together. We can also swap out the email provider any time we want, by passing it a new provider to the set_provider method.
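The design described above can be sketched as follows. The provider classes Gmail and YahooMail and the returned strings are placeholders; a real provider would talk to a mail service.

```python
class Gmail:
    def send(self, to, message):
        return f"Gmail -> {to}: {message}"


class YahooMail:
    def send(self, to, message):
        return f"Yahoo -> {to}: {message}"


class EmailClient:
    def __init__(self, email_provider):
        self.email_provider = email_provider

    def set_provider(self, email_provider):
        # Swap the collaborator at runtime; the client itself stays the same.
        self.email_provider = email_provider

    def send_email(self, to, message):
        # Delegate the actual work to whichever provider is plugged in.
        return self.email_provider.send(to, message)


client = EmailClient(Gmail())
print(client.send_email("bob@example.com", "Hi"))
client.set_provider(YahooMail())
print(client.send_email("bob@example.com", "Hi"))
```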

Composition over Inheritance

Let’s implement the above EmailClient using inheritance.

Here, we created a base class EmailClient which has the setup method. Then we extended the class to create GmailClient and YahooMailClient. Things got interesting when we wanted to start sending emails using Yahoo instead of Gmail. We had to create a new instance of YahooMailClient for that purpose. The initially created client was no longer useful for us since it only knows how to send emails through Gmail.
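For comparison, the inheritance version could be sketched like this. The body of setup and the returned strings are placeholders for illustration.

```python
class EmailClient:
    def setup(self):
        self.ready = True  # placeholder for shared configuration


class GmailClient(EmailClient):
    def send_email(self, to, message):
        return f"Gmail -> {to}: {message}"


class YahooMailClient(EmailClient):
    def send_email(self, to, message):
        return f"Yahoo -> {to}: {message}"


client = GmailClient()
client.setup()
print(client.send_email("bob@example.com", "Hi"))

# Switching providers means building and setting up a brand new object.
client = YahooMailClient()
client.setup()
print(client.send_email("bob@example.com", "Hi"))
```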

This is why composition is often favoured over inheritance. By delegating the responsibility to the different composable parts, we form loose coupling. We can swap out those components easily when needed. We can also inject them as dependencies using dependency injection. But with inheritance, things get tightly coupled and not easily swappable.

Python: pyenv, pyvenv, virtualenv – What’s the difference?

So I see questions around these terms very often in our growing Python Bangladesh community. Most of the time beginners are confused about what is what. I hope I can refer to this blog post to explain the similarities and differences.

pyenv

Have you ever wanted to test your code against multiple versions of Python? Or just wanted to install a newer version of Python without affecting your existing version? Maybe you heard about PyPy a lot and want to install it on your machine?

If you did, then pyenv is the perfect tool for you. It allows you to easily install multiple copies and multiple flavors of the Python interpreter. So you can not only install different versions of CPython, you can also install PyPy, Jython, Stackless Python and their different versions.

The tool provides a nice command line interface to easily swap out the global Python interpreter. It also allows us to define a per-application Python version. You can use its local command or directly mention a Python version in a file named .python-version under a directory, and for that directory and its children, the mentioned version will be used.

Trust me, this project is awesome. I use it to switch between Python 2 and 3 on my local machine. I also use it often on servers to quickly install any flavor/version of Python. Do check out their docs, you will love it.

pyvenv & virtualenv

pyvenv and virtualenv allow us to create virtual environments so we can isolate our project dependencies. Why are they helpful? Say, for example, you have one project which still uses Django 1.6 while your newer projects start with 1.9. When you install one version of Django, it replaces the other one, right? Virtual environments can rescue us from such situations. From the official docs:

A virtual environment (also called a venv) is a Python environment such that the Python interpreter, libraries and scripts installed into it are isolated from those installed in other virtual environments, and (by default) any libraries installed in a “system” Python, i.e. one which is installed as part of your operating system.

When we create a new virtual environment, it creates an isolated environment with its own local interpreter linked to its own libraries/scripts paths. So when we use this local interpreter, it loads third-party libraries from the local environment. The standard library still comes from the base installation, but packages installed in the parent/system environment are not visible unless we opt in when creating the environment (for example with the --system-site-packages option).

Please note, these tools do not compile/install new Python interpreters. They simply create “virtual environments” on top of an installed Python version. Say, I have Python 3.5 installed on my machine and created virtual environments for this version. Then these environments would also have local copies of Python 3.5, except their environment paths would point to different locations. It’s like we’re copying the main interpreter to a new location and then making it use a different path to load libraries and packages.

virtualenv is the most popular choice for creating virtual environments. It has been around for a long time and supports Python versions from 2.6 up to the latest 3.5. But it’s not built into the standard Python distribution. You have to install it from PyPI.

pyvenv comes with the standard Python distribution from version 3.3 (from 3.4, new environments also get pip installed by default). There is also a venv module in the standard library which allows us to access this functionality programmatically. We can find more details here: https://docs.python.org/3/library/venv.html.

Summary

pyenv – A Python version manager. Installs different versions and flavors of Python interpreters.

pyvenv – A tool to create isolated virtual environments from a Python interpreter. Ships with Python from 3.3.

virtualenv – Creates virtual environments, available on PyPI.

So pyvenv is comparable to virtualenv while pyenv is a totally different kind of tool.

Understanding Decorators in Python

Many beginners seem to take the concept of decorators as a fairly advanced and complex topic. It’s advanced alright but it probably is much simpler than you think.

Decorators Explained

Let’s say, we have a function which returns a message. But we want to also return the time with the message. So what can we do? We can modify the function’s source code to add the time with the message. But what if we can’t or don’t want to modify the source code but still want to extend/transform the functionality?

In that case, we can wrap it within another function, something like this:

Here, greet was our original function, which only returns a message but no time with it. So we get clever and write a wrapper – time_wrapper. This wrapper function takes a function as its argument and returns the new_function instead. This new function, when invoked, can access the original function we passed, get the message out and then add the time to it.

The interesting bit is here – greet = time_wrapper(greet). We’re passing greet to time_wrapper. The time_wrapper function returns the new_function. So greet now points to the new_function. When we call greet, we actually call that function.
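The wrapping described above can be sketched like this. The greeting text and the time format are assumptions; the names greet, time_wrapper and new_function come from the explanation.

```python
from datetime import datetime


def greet():
    return "Hello!"


def time_wrapper(fn):
    def new_function():
        # Call the original function and append the current time to its message.
        return f"{fn()} (at {datetime.now():%H:%M:%S})"
    return new_function


greet = time_wrapper(greet)   # greet now points to new_function
print(greet())
print(greet.__name__)         # new_function
```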

By definition, a decorator is a callable which takes a callable and returns a callable. A callable can be a few things but let’s not worry about that right now. In most cases, a decorator just takes a function, wraps it and returns the wrapped function. The wrapped function can access a reference to our original function and call it as necessary. In our case time_wrapper is the decorator function which takes the greet function and returns the new_function.

The @ decorator syntax

But you might be wondering – “I see a lot of @ symbols while reading on decorators, how can there be a decorator without the @?”. Well, before PEP 0318, we used to write decorators like that. But soon the wise people of the Python community realized that it would be a good idea to have a nicer syntax for decorators. So we got the @. So how does the @ work?

So when we add a callable name prepended with a @ on top of a function, that function is passed to that callable. The return value from that callable becomes the new value of that function.
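A short sketch showing that the @ form and the manual reassignment do the same thing. The decorator shout and the two sample functions are hypothetical.

```python
def shout(fn):
    def wrapper():
        return fn().upper()
    return wrapper


@shout
def hello():
    return "hello"


def hi():
    return "hi"


hi = shout(hi)   # what the @ syntax does for us behind the scenes

print(hello())   # HELLO
print(hi())      # HI
```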

Writing our own decorators

Let’s say we want to write a decorator which will take a function and print the current time every time the function is executed. Let’s call our function timed. This function will accept a parameter fn which is the function we wrap. Since we need to return a function from the timed function, we need to define that function too.

In this example, the timed function takes the fn function and returns the wrapped function. So by definition it’s a decorator. Within the wrapped function, we’re first printing out the current time. And then we’re invoking the fn() function. After the decorator is applied, this wrapped function becomes the new fn. So when we call fn, we’re actually calling wrapped.

Let’s see an example of this decorator:

With the @timed decorator applied to hello, this happens: hello = timed(hello), so hello now points to the wrapped function returned by timed. Inside the for loop, every time we call hello, it’s no longer the original hello function but the wrapped function. The wrapped function calls the original hello, which it can still reach through its parent scope.
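Putting the decorator and its usage together, a minimal sketch could look like this. The printed text and the loop count are assumptions.

```python
from datetime import datetime


def timed(fn):
    def wrapped():
        print("Current time:", datetime.now())   # runs before every call
        return fn()
    return wrapped


@timed
def hello():
    print("Hello world!")


for _ in range(3):
    hello()   # each call goes through wrapped first
```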

Two things you might have noticed – it is possible to nest functions and when we nest a function within a function, the inner function can access the parent scope too. You can learn more about the scope by reading on closure.

Decorator Parameters

Decorators can take parameters too. Like this:

When a decorator takes a parameter, it’s executed like:

As we can see, it gets a level deeper. Here sleeper has to take the parameter and return the actual decorator function which will transform our say_hello function.

In this case, sleeper(4) returns the decorator function. We pass say_hello to the decorator. The decorator wraps it inside the wrapped function and returns wrapped. So finally, say_hello is actually the wrapped function which gets fn and secs from the closure.
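The three levels can be sketched as follows. The example sleeps for a fraction of a second instead of 4 seconds so that it runs quickly; the inner function name decorator is an assumption.

```python
import time


def sleeper(secs):
    def decorator(fn):                 # the actual decorator
        def wrapped(*args, **kwargs):
            time.sleep(secs)           # secs comes from the closure
            return fn(*args, **kwargs)
        return wrapped
    return decorator


@sleeper(0.1)
def say_hello():
    return "Hello"


# Same effect without the @ syntax: say_hello = sleeper(0.1)(say_hello)
print(say_hello())   # Hello
```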

Chaining Decorators

We can chain multiple decorators. Like this:

The bottom most one gets executed first, then the returned function is passed to the decorator on top of that one. This way the chain of execution goes from bottom to top.
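A quick sketch with two hypothetical decorators, bold and italic, makes the order visible:

```python
def bold(fn):
    def wrapped():
        return "<b>" + fn() + "</b>"
    return wrapped


def italic(fn):
    def wrapped():
        return "<i>" + fn() + "</i>"
    return wrapped


@bold      # applied second, so it ends up outermost
@italic    # applied first, closest to the function
def text():
    return "hi"


print(text())   # <b><i>hi</i></b>
```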

Using Classes as Decorators

In our previous examples, we have only focused on functions, but in Python, any callable can be used as a decorator. That means we can use classes too. Let’s first see an example:

When we use the Sleeper decorator, the parameter 5 is passed to the constructor, which stores it in an instance variable. The constructor returns an object instance. When we call that instance, it gets the function and returns a decorated, wrapped function.

This is just like before, say_hello = Sleeper(5)(say_hello). The first call is the constructor. The second call is made to the __call__ magic method.
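A sketch of such a class based decorator. It uses a short delay rather than 5 seconds, and the attribute name secs is an assumption.

```python
import time


class Sleeper:
    def __init__(self, secs):
        self.secs = secs               # first call: the constructor stores the parameter

    def __call__(self, fn):            # second call: receives the function to decorate
        def wrapped(*args, **kwargs):
            time.sleep(self.secs)
            return fn(*args, **kwargs)
        return wrapped


@Sleeper(0.1)
def say_hello():
    return "Hello"


print(say_hello())   # Hello
```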

Decorating Class and Class Methods

We can decorate any callable, so here’s an example where we’re decorating a class to forcefully convert the age argument to int.
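One way to do this, sketched with hypothetical names (int_age, Person), is to replace the class’s __init__ with a version that converts age before delegating to the original one:

```python
def int_age(cls):
    original_init = cls.__init__

    def __init__(self, name, age):
        original_init(self, name, int(age))   # force age to int

    cls.__init__ = __init__
    return cls                                # the decorated class is the same class


@int_age
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


p = Person("Alice", "30")
print(p.age, type(p.age))   # 30 <class 'int'>
```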

We can decorate the methods as well. If you know Python’s OOP model well, you probably have already come across the @property decorator. Or the @classmethod and @staticmethod decorators. These decorate methods.

Python: A quick introduction to the concurrent.futures module

The concurrent.futures module is part of the standard library which provides a high level API for launching async tasks. We will discuss and go through code samples for the common usages of this module.

Executors

This module features the Executor class which is an abstract class and it can not be used directly. However it has two very useful concrete subclasses – ThreadPoolExecutor and ProcessPoolExecutor. As their names suggest, one uses multi-threading and the other one uses multi-processing. In both cases, we get a pool of threads or processes and we can submit tasks to this pool. The pool would assign tasks to the available resources (threads or processes) and schedule them to run.

ThreadPoolExecutor

Let’s first see some code:

I hope the code is pretty self explanatory. We first construct a ThreadPoolExecutor with the number of threads we want in the pool. If we leave it out, the default depends on the Python version and the number of CPUs on the machine, but we chose to use 3 just because we can ;-). Then we submitted a task to the thread pool executor which waits 5 seconds before returning the message it gets as its first argument. When we submit() a task, we get back a Future. As we can see in the docs, the Future object has a method – done() which tells us if the future has resolved, that is a value has been set for that particular future object. When a task finishes (returns a value or is interrupted by an exception), the thread pool executor sets the value to the future object.

In our example, the task doesn’t complete until 5 seconds, so the first call to done() will return False. We take a really short nap for 5 secs and then it’s done. We can get the result of the future by calling the result() method on it.
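The flow described above can be sketched as follows. To keep the example quick it uses a delay of half a second instead of 5 seconds, and the function name return_message is made up.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep


def return_message(message, delay=0.5):
    sleep(delay)          # pretend to do some slow work
    return message


pool = ThreadPoolExecutor(3)
future = pool.submit(return_message, "hello")
print(future.done())      # the task is still sleeping, so this is expected to be False
sleep(1)
print(future.done())      # True
print(future.result())    # hello
pool.shutdown()
```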

A good understanding of the Future object and knowing its methods would be really beneficial for understanding and doing async programming in Python. So I highly recommend taking the time to read through the docs.

ProcessPoolExecutor

The process pool executor has a very similar API. So let’s modify our previous example and use ProcessPool instead:

It works perfectly! But of course, we would want to use the ProcessPoolExecutor for CPU intensive tasks. The ThreadPoolExecutor is better suited for network operations or I/O.

While the API is similar, we must remember that the ProcessPoolExecutor uses the multiprocessing module and is not affected by the Global Interpreter Lock. However, we can not use any object that is not picklable. So we need to carefully choose what we use/return inside the callable passed to the process pool executor.
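A quick way to get a feel for the picklability constraint is to try pickling candidate objects ourselves. The helper is_picklable below is a hypothetical convenience function, not part of concurrent.futures.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(is_picklable([1, 2, 3]))          # True: plain data is fine
print(is_picklable(lambda n: n * n))    # False: lambdas can not be pickled
print(is_picklable(threading.Lock()))   # False: neither can locks
```

Arguments and return values that fail this check are the ones to avoid with a process pool.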

Executor.map()

Both executors have a common method – map(). Like the built in function, the map method allows multiple calls to a provided function, passing each of the items in an iterable to that function. Except, in this case, the functions are called concurrently. For multiprocessing, this iterable is broken into chunks and each of these chunks is passed to the function in separate processes. We can control the chunk size by passing the chunksize keyword argument. By default the chunk size is 1.
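A minimal sketch of map() with a thread pool, using a made-up square function. The chunksize argument only matters for ProcessPoolExecutor, so it is left out here.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


with ThreadPoolExecutor(max_workers=3) as pool:
    # Results come back in the order of the inputs.
    results = list(pool.map(square, [1, 2, 3, 4]))

print(results)   # [1, 4, 9, 16]
```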

Here’s the ThreadPoolExecutor example from the official docs:

And the ProcessPoolExecutor example:

as_completed() & wait()

The concurrent.futures module has two functions for dealing with the futures returned by the executors. One is as_completed() and the other one is wait().

The as_completed() function takes an iterable of Future objects and starts yielding the futures as soon as they complete. The main difference between the aforementioned map method and as_completed is that map returns the results in the order in which we pass the items. That is, the first result from the map method is the result for the first item. On the other hand, the first future from the as_completed function is whichever one completed first.

Let’s see an example:
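A minimal sketch contrasting the two, assuming a hypothetical helper wait_and_return and made-up delays:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep


def wait_and_return(delay):
    sleep(delay)
    return delay


delays = (0.3, 0.1, 0.2)

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(wait_and_return, d) for d in delays]
    # Futures are yielded as they finish, so shorter delays tend to come first.
    finished = [f.result() for f in as_completed(futures)]
    # map keeps the order of the inputs regardless of which task finishes first.
    mapped = list(pool.map(wait_and_return, delays))

print(finished)
print(mapped)   # [0.3, 0.1, 0.2]
```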

The wait() function returns a named tuple which contains two sets – one set contains the futures which completed (either got a result or an exception) and the other set contains the ones which didn’t complete.

We can see an example here:

We can control the behavior of the wait function by defining when it should return. We can pass one of these values to the return_when param of the function: FIRST_COMPLETED, FIRST_EXCEPTION and ALL_COMPLETED. By default, it’s set to ALL_COMPLETED, so the wait function returns only when all futures complete. But using that parameter, we can choose to return when the first future completes or when the first exception is raised.
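Here is a small sketch of wait() with and without return_when. The helper wait_and_return and the delays are assumptions chosen so that one task clearly finishes before the other.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from time import sleep


def wait_and_return(delay):
    sleep(delay)
    return delay


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(wait_and_return, d) for d in (0.1, 0.6)]

    # Returns as soon as at least one future is finished.
    first = wait(futures, return_when=FIRST_COMPLETED)
    print(len(first.done), len(first.not_done))

    # Default is ALL_COMPLETED: block until every future is finished.
    everything = wait(futures)
    print(len(everything.done), len(everything.not_done))   # 2 0
```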

Parsing Upwork Job Feed to Monitor Clojure Jobs

I was checking Upwork to assess the job market for Clojure and it hit me – I can parse the Upwork Job Feed for Clojure and monitor it programmatically. So I fired up the REPL and started coding.

Before I began, I had to choose a Clojure library to parse RSS feeds. I went for https://github.com/scsibug/feedparser-clj. So I added this dependency ([org.clojars.scsibug/feedparser-clj "0.4.0"]) to my project.clj:

Now we can start writing some code. First, we would fetch the content of the RSS feed and parse it. The parse-feed function from the above mentioned library would do that for us.

Next, we need a function to extract the data we need. We will run this function (map) over the collection of items.

Here we’re simply getting the values of the :title key and the :uri key and putting them in another hashmap. We’re naming our key :url instead of their :uri.

We can grab the collection of items in the :entries key of the feed variable we declared before. So here’s our main function:

We’re mapping the function we wrote over the entries and getting a collection of hashmaps. Then we’re using doseq to iterate over them and print the data out.

The final code looks like this:

Here, we have extracted only two fields and printed them out. We extracted the data into a new hashmap as an example. As a matter of fact, we could just print them out from the original feed variable. Then the code would have been shorter:

Web Scraping with Clojure

I have recently started learning Clojure and I must say I am totally hooked. Clojure is a sane Lisp on the JVM. So I can express myself better while being able to take advantage of the huge JVM eco system. This is just wonderful!

After finishing the popular book Clojure for the Brave and True, I wanted to try something out myself. I decided to try web scraping with clojure. In this post, I am going to walk you through a very simple web scraping task.

We’re going to scrape KAT (Kick Ass Torrents) for TV series. To keep things simple, we would scrape the first page of the TV section and print out the titles. The reason I like KAT is they serve the response gzipped – if your http client can’t handle their response, you probably want to switch.

We will use the following libraries for the task:

  • http-kit
  • Enlive

Most tutorials for Clojure would use Java’s built in URL class with Enlive’s html-resource function but in our case it would not work, because it can’t handle compressed responses well. So we will use http-kit instead.

To begin with, we would add these libraries to our project.clj file (assuming we’re using Leiningen).

Now we’re ready to start writing the code. Let’s first :require our libraries.

First we have to fetch the HTML response. We can use http-kit’s get function to grab the HTML. This function would return a promise. So we would have to dereference it using deref or the shorthand syntax @. When the promise is resolved, we would get a hashmap which would have a :body key along with :status and few other keys related to the request. We can pass this HTML response to Enlive’s html-snippet function to get an iterable DOM like object from which we can select the elements using select function.

We are using the {:insecure? true} part to ignore issues with SSL. So far, we have a function get-dom which would give us a DOM like object on which we can do select. We will now write another function which will extract the titles from this DOM like object.

Each Torrent title (which is a link, aka anchor tag) has the CSS class cellMainLink so we can select a.cellMainLink to get all the title links. Each title link would have their text part in the :content key. Each text part in the :content key is a vector. So we would need to use first on it to grab the actual text. Here’s what I wrote:

I simply could not resist using comp to do some magic here. comp allows us to combine two functions to compose one which allowed us to first grab the content and then get the first element in our case.

Finally, we can run our functions like this:

Here’s the complete file:

The code is under 20 lines! 😀