Histogram or Bar Chart?

When visualizing data, we frequently come across histograms and bar charts. The two look quite similar: in both, we draw tall bars, put the independent variable on the x-axis and the dependent variable on the y-axis, and end up with a nice histogram or bar chart.

Despite the similar looks, there is a fundamental difference between the two. A histogram is used for ranges or distributions. For example, consider the histogram below:

Here we see, among people of various ages who go on vacation, how many stay in hotels. Looking closely, we see that among those under 20, 5 stay in hotels; among those aged 21-30, 15 do; among those aged 31-40, 10 do; and among those aged 41-50, fewer than 5 do.

Notice that our data here is numerical and we are working with ranges, specifically age ranges. We have distributed people of various ages into a fixed set of ranges. There are no gaps between these ranges, and there is no way to rearrange the parts (the bars) of the histogram into any other order.

Now let's look at a bar chart:

In this chart we see the number of cars sold in January by different car brands. The key thing to notice is that this time our data is not numerical but categorical. We can use this bar chart to compare the different brands. With a bar chart, rearranging the bars causes no problem, and since the bars are not connected to one another, there are gaps between them.


TL;DR: a histogram is used to represent the distribution of a single variable, while a bar chart is used to compare values across different categories.
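
To make the difference concrete, here is a minimal matplotlib sketch (the numbers are made up to roughly match the examples above):

    import matplotlib.pyplot as plt

    # Made-up data, roughly matching the examples above
    ages = [18, 19, 22, 24, 27, 28, 29, 33, 35, 38, 42, 45]   # numerical
    brands = ["Toyota", "Honda", "BMW"]                        # categorical
    cars_sold = [120, 95, 40]

    fig, (left, right) = plt.subplots(1, 2)

    # Histogram: the distribution of one numerical variable over bins
    # (ranges), so the bars touch and their order cannot change
    left.hist(ages, bins=[10, 20, 30, 40, 50])
    left.set_title("Histogram")

    # Bar chart: a comparison across categories, so the bars have gaps
    # and can be rearranged freely
    right.bar(brands, cars_sold)
    right.set_title("Bar chart")

    plt.show()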

Embedding IPython in your application

If you work with Python regularly, you probably know about IPython already. IPython has web-based notebooks, Qt-based GUI consoles and the plain old simple terminal-based REPL, which is simply fantastic. But that's not all: we can also embed IPython in our own applications, which opens up a number of potential use cases.

Use Cases

A common use case is to drop into an IPython shell for quick interactive debugging. This can come in very handy during prototyping.

Let’s see an example:
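
A minimal sketch using IPython's standard embed function (the variable's value is made up):

    from IPython import embed

    name = "masnun"

    # Drops us into an interactive IPython shell right here, with
    # access to the local variable `name`
    embed()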

When we run this code, we will get a nice IPython REPL where we can try things out. In our case, we haven't done much except define a variable named name. We can print it out.

I use Iron.io workers/queues/caches at my day-to-day job. So I often need to check the status of the workers, get the size of a queue or even queue a few workers. I also need to check a few records on MongoDB. An interactive prompt can be really helpful for these tasks.
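
A rough sketch of such helpers, with the Iron.io and MongoDB calls stubbed out:

    from IPython import embed

    def launch_workers(name, count):
        # Stub: the real body would queue `count` instances of the
        # named worker through the Iron.io client
        print("Launching {} instances of {}".format(count, name))

    def top_buyers(min_purchases=100):
        # Stub: the real body would query MongoDB for buyers with
        # more than `min_purchases` purchases
        return []

    # Drop into a shell with these helpers in scope
    embed()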

Now I can just do launch_workers("send_emails", 3) to launch 3 worker instances for the "send_emails" worker, or get the number of buyers with more than 100 purchases with the top_buyers() function.

Customizing The Prompt

When we embed IPython, it displays its usual banner when starting.

We can easily disable that. To do so, we need to pass an empty string as the banner1 parameter of the embed method:
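
    from IPython import embed

    # An empty banner1 suppresses the startup banner
    embed(banner1="")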

Or we can further customize the second banner or the exit message like this:
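
    from IPython import embed

    # banner2 is printed after banner1; exit_msg is printed on exit
    # (the messages here are made up)
    embed(banner1="", banner2="Welcome to the debug shell!", exit_msg="Goodbye!")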

Deploying a NodeJS 5 app with nginx on Ubuntu

In this blog post, we will walk through the steps to deploy a NodeJS app using the latest version of Node with nginx on Ubuntu.

Installing Node.js 5

The official Ubuntu repositories don't ship the latest version of NodeJS yet, so we will install it from a third party source:
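
At the time, the usual way was the NodeSource setup script for the 5.x branch:

    curl -sL https://deb.nodesource.com/setup_5.x | sudo -E bash -
    sudo apt-get install -y nodejs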

Using PM2

I love PM2 for keeping my Node apps alive. If you didn't know, PM2 is an awesome tool that launches Node processes and monitors them; if one crashes, it can restart it. PM2 is very easy to set up and use, and it's also quite feature packed.

We will set up PM2, launch our app with it, and then generate a startup script so PM2 itself is started on system reboot.
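
A sketch of those steps (assuming the app's entry point is app.js):

    sudo npm install -g pm2

    # Launch the app and keep it monitored
    pm2 start app.js --name myapp

    # Generate and install a startup script, then save the process list
    pm2 startup ubuntu
    pm2 save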

Nginx Configuration

Now that the app is running, it's time to set up nginx as our reverse proxy. Here's the default configuration I use:
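
A typical reverse proxy block for a Node app looks like this (the server name and the app port 3000 are assumptions):

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://127.0.0.1:3000;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            proxy_set_header Host $host;
            proxy_cache_bypass $http_upgrade;
        }
    }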

Live Debugging Webhooks with Ngrok

ngrok is an awesome service: it creates secure tunnels to localhost. With ngrok, you get a URL like http://459387bb.ngrok.com which is actually a tunnel to a port on your local machine. Any request you make to that URL is served by the app you run on that port.

I know there are many cool services for debugging webhooks, like Requestbin, but the main benefit of ngrok is that the app keeps running on your machine, serving live traffic. So you can debug it in real time.

In this blog post, we will use a Node.js server with ngrok to serve Mandrill webhook requests.

Installing ngrok

Downloading and installing ngrok is pretty easy, as described here: https://ngrok.com/download. However, if you're on OS X and use Homebrew, you can install it with just one command:
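
    brew install ngrok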

Creating a Node.js App

Here's a sample Node app that listens on port 3000 and parses the Mandrill payload using the body-parser package:
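
A minimal sketch using Express (the /webhook route is an assumption; Mandrill POSTs its events as a JSON string in the mandrill_events form field):

    var express = require('express');
    var bodyParser = require('body-parser');

    var app = express();

    // Mandrill webhooks arrive as form-encoded POST requests
    app.use(bodyParser.urlencoded({ extended: true }));

    app.post('/webhook', function (req, res) {
        // The events come as a JSON string in `mandrill_events`
        var events = JSON.parse(req.body.mandrill_events || '[]');
        console.log(events);
        res.sendStatus(200);
    });

    app.listen(3000, function () {
        console.log('Listening on port 3000');
    });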

Tunneling Traffic

Once we have the app running on port 3000, we can ask ngrok to create a tunnel for us. For this we just need to pass the port number to the ngrok command:
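
With ngrok 2.x, that is:

    ngrok http 3000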

We will get a URL shortly afterwards, and we can POST requests to it. In our case, go to your Mandrill account and create a webhook. Mandrill will send events to this URL, and they will be served by your app running locally on your machine. You can make changes to the code and restart anytime.

Awesome, no?

Homebrew & Pyenv: Installing PyQT5 with Python3 on OSX

There was a time, back in 2014 and earlier, when PyQT5 installation was not straightforward and needed manual compilation. Those posts still come up among the top results on Google. But there's nothing to worry about; things have changed and it's now quite simple.

Installation

If you are not already using Homebrew, you should start using it. Once Homebrew is installed, let’s install PyQT5 with this single command:
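
    brew install pyqt5

(pyqt5 was the formula name at the time; on newer Homebrew it lives under pyqt@5.)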

Verifying Installation

Let's take a sample PyQT5 snippet as an example and run it. For examples, I usually pick one from the excellent PyQT tutorials on zetcode.com. Here's one:
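
This one is adapted from the first example in zetcode's PyQt5 tutorial:

    import sys

    from PyQt5.QtWidgets import QApplication, QWidget

    if __name__ == '__main__':
        app = QApplication(sys.argv)

        # A small, empty window
        w = QWidget()
        w.resize(250, 150)
        w.move(300, 300)
        w.setWindowTitle('Simple')
        w.show()

        sys.exit(app.exec_())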

Run it using:
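
Assuming the snippet was saved as simple.py:

    python3 simple.py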

If you get a nice looking small window – it worked!

Integrating with Pyenv

I am a big fan of pyenv and use it for running different versions and flavours of Python. If you use pyenv too, chances are you have your own version of Python installed through it. However, the brew formula that installs PyQT5 depends on another formula, python3, which is Homebrew's own Python 3 installation. When we install PyQT5, that formula is used to build the bindings, so the bindings are only available to this particular Python 3 installation and unavailable to our pyenv versions.

We will discuss two potential solutions to this issue.

Switching to system

One simple workaround is to use the Python 3 version installed by Homebrew. We can ask pyenv to switch to the system version whenever we're doing PyQT5 development:
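
    pyenv global system

(pyenv shell system does the same for the current shell session only.)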

We can create an alias to quickly switch between Python versions. I have this in my .zshrc:
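
Something along these lines (the alias names and the pyenv version are made up):

    # Switch to the Homebrew/system Python for PyQT5 work
    alias syspy="pyenv global system"

    # Switch back to my usual pyenv Python
    alias py3="pyenv global 3.5.0"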

This way is very quick and simple but we miss the benefits of using pyenv.

Adding Site Packages

Alternatively, we can add the site-packages of the Homebrew-installed Python 3 to our pyenv installation of Python 3. Since both installations were built on the same machine and OS, the bindings should work correctly. We will be using .pth files to do this.

Let’s first find out the site-packages for the homebrew installation:
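
The python3 formula prints it in its caveats:

    brew info python3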

We will notice a message like:
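
    They will install into the site-package directory
      /usr/local/lib/python3.5/site-packages

(The exact path depends on your Python version; 3.5 is an assumption.)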

That is the site-packages for this version.

Now let’s find our pyenv python3’s local site directory:
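
With the pyenv version active, Python can report it itself (the version name 3.5.0 is an assumption):

    pyenv shell 3.5.0
    python -m site    # prints sys.path, including the site-packages directory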

Now we create a homebrew.pth file in that directory and put the previously found site-packages path in it.

Let’s create the file:
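
Assuming pyenv's Python 3.5.0, the path would be something like:

    nano ~/.pyenv/versions/3.5.0/lib/python3.5/site-packages/homebrew.pth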

And put these contents:
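
    /usr/local/lib/python3.5/site-packages

(Use whatever path brew info printed for your installation.)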

Save and exit. Now you should be able to just use:
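
    python simple.py

(simple.py being the sample from earlier.) The pyenv Python now sees the Homebrew site-packages through the .pth file, so the PyQt5 bindings import correctly.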

Dockerizing a Django Application

I assume you are already familiar with Docker and its use cases. If you haven't started using Docker yet, I strongly recommend you do soon.

I have a Django application that I want to dockerize for local development. I am also new to Docker, so everything I do in this post might not be suitable for your production environment; please do check Docker best practices for production apps. This tutorial is meant to be a basic introduction to Docker. In this post, I am going to use Docker Machine and Docker Compose. You can get them by installing the awesome Docker Toolbox.

Components Breakdown

Before we start, we need to break down our requirements so we can individually build the required components. For my particular application, we need these:

  1. Django App Server
  2. MySQL Database Server
  3. Redis Server

We will build images for these separately so we can create individual containers and link them together to compose our ultimate application. We shall build our Django App server and use pre-built images for MySQL and Redis.

Building the Django App Server

Before we begin, let's talk Dockerfiles. Dockerfiles are scripts that customize our Docker builds. They allow us control and flexibility over how we build the images for our applications. We will use a custom Dockerfile to build the Django app server.

To build an image for a Django application we need to go through these following steps:

  • Select a Linux image (we choose Ubuntu)
  • Install the required packages for the distro
  • Install the Python packages the app requires
  • Provide a default command to run and ports to expose

Here’s the Dockerfile we shall use:
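
A sketch reconstructed from the description below (the maintainer name and the package list are assumptions):

    FROM phusion/baseimage:latest
    MAINTAINER masnun

    # Don't show interactive prompts during the build
    ENV DEBIAN_FRONTEND noninteractive

    # Packages we shall need (assumed list)
    RUN apt-get update && apt-get install -y \
        python3 python3-pip python3-dev libmysqlclient-dev

    # Install the Python dependencies
    ADD requirements.txt /app/src/requirements.txt
    WORKDIR /app/src
    RUN pip3 install -r requirements.txt

    # Run the custom `runall` management command
    # (usually this would just be `runserver`)
    WORKDIR /app/src/lisp
    CMD ["python3", "manage.py", "runall"]

    EXPOSE 8000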

So what are we doing here:

  • We're choosing phusion/baseimage as our base image. It's a bare-bones image based on Ubuntu. Ubuntu by default comes with many packages which we don't need inside Docker; this base image gets rid of those and provides a very lean and clean image to start with.
  • We just provide a maintainer name.
  • We set DEBIAN_FRONTEND to noninteractive, which suppresses interactive prompts during the build process. Since the Docker build process is automated, we really don't have any way to interact during it, so we disable interaction. And as you might have guessed already, ENV sets an environment variable.
  • We install some packages we shall need.
  • We copy our requirements.txt file to /app/src/requirements.txt, change the work directory and install the packages using pip. ADD is used to copy any files or directories into the container while it builds. You might wonder why we didn't copy over our entire project: that's because we want to use Docker for development. We will use a nice feature of Docker that allows us to mount local directories directly inside the container, so we won't need to copy files every time they change. More on this later.
  • We change directory to /app/src/lisp and run the runall management command. This command runs the default Django server along with some other services my application needs. But usually we would want to just do runserver.
  • We EXPOSE port 8000.

If you go through the Dockerfile reference, you will notice that we can do a lot more with Dockerfiles.

Docker Compose and Linking Services

As we mentioned earlier, we shall use pre-built images for MySQL and Redis. We could build them ourselves too, but why not take advantage of the well-maintained images from the generous folks in the Docker community?

We can link multiple Docker containers to compose a final application. We could do that manually with the docker command, but Docker Compose is a very nice tool that lets us define the services we need in an easy to read syntax. With Docker Compose, we don't need to run them manually; we can use simple commands to do complex Docker magic! Here's our docker-compose.yml file:
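
A sketch of such a compose file (image tags, the password and the volume paths are assumptions):

    web:
        build: .
        restart: always
        volumes:
            - .:/app/src
        ports:
            - "8000:8000"
        links:
            - mysql
            - redis

    mysql:
        image: mysql:5.7
        environment:
            MYSQL_ROOT_PASSWORD: secret
        volumes:
            - /opt/mysql_data:/var/lib/mysql

    redis:
        image: redis:latest
        volumes:
            - /opt/redis_data:/data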

In our docker-compose file, we define 3 components:

  • For the web service, we pass the path to the Dockerfile to the build key. We ask it to restart always and define the volumes to mount. .:/app/src means: mount the current directory on my OS X machine as /app/src/ inside the container. We also define which ports to expose and which containers should be linked with it.
  • We also define the mysql and redis services with their respective configurations. Note that we put the pre-built image name in the image key. Please make sure the volume paths exist and are accessible.

You can consult the Compose File Reference for more details.

Running The Services

To run the application, we can do:
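
    docker-compose up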

Please note, the Django server might throw errors if the MySQL / Redis servers take time to initialize. So I usually run them separately:
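
    docker-compose up -d mysql redis
    # give them a moment to initialize, then
    docker-compose up web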

Database Configuration for Django

Our MySQL server is running on the IP of the Docker Machine. You need to use this IP address in your Django settings file. To get the IP of a docker machine, type in:
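
Assuming the machine is named default:

    docker-machine ip default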

Creating Initial Databases

We can pass a MYSQL_DATABASE environment value to the mysql image so the database is created when creating the service. Or we can also connect to the docker machine manually and create our databases.
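
For example, creating a database by hand with the mysql client (the database name and credentials are assumptions):

    mysql -h $(docker-machine ip default) -u root -p \
        -e "CREATE DATABASE myapp CHARACTER SET utf8;"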

Extracting links and their page title from your Twitter Archive

Twitter allows us to download our tweets from the account settings page. Once we request our archive, Twitter takes some time to prepare it and sends us an email when it is ready. We get a download link in the email. After unpacking the archive, we find a CSV file that contains our tweets: tweets.csv. The archive also contains an HTML page (index.html) that displays our tweets in a nice UI. While this is nice to look at, our primary objective is to extract the links from our tweets.

If we look at the CSV file closely, we find a field named expanded_urls which generally contains the URLs we used in our tweets. We will work with the values in this field. Along with each URL, we also want to fetch the page title. For this we will use Python 3 (I am using 3.5), and we need the requests and beautifulsoup4 packages to download and parse the pages. Let's install them:
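
    pip install requests beautifulsoup4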

We will follow these steps to extract links and their page titles from the tweets:

  • Open the CSV file and read it row by row
  • Each row contains a tweet; we take the expanded_urls field
  • This field can contain multiple URLs, separated by a comma; we need to iterate over them all
  • We skip some domains; for example, we don't want to visit links to Twitter status updates
  • We fetch the HTML content using the requests library; if the page doesn't return an HTTP 200, we ignore the response
  • We extract the title using Beautiful Soup and display it

Now let's turn these steps into code. Here's the final script I came up with:
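
(The original isn't reproduced here; this sketch follows the steps above, with the skip list and the timeout as assumptions.)

    import csv
    from urllib.parse import urlparse

    import requests
    from bs4 import BeautifulSoup

    # Domains we don't want to visit, e.g. twitter status updates
    SKIP_DOMAINS = {"twitter.com", "www.twitter.com"}

    with open("tweets.csv") as f:
        for row in csv.DictReader(f):
            # The field can contain multiple urls, separated by a comma
            for url in row.get("expanded_urls", "").split(","):
                url = url.strip()
                if not url or urlparse(url).netloc in SKIP_DOMAINS:
                    continue
                try:
                    response = requests.get(url, timeout=10)
                except requests.RequestException:
                    continue
                # Ignore anything but a plain HTTP 200
                if response.status_code != 200:
                    continue
                soup = BeautifulSoup(response.text, "html.parser")
                if soup.title and soup.title.string:
                    print(url, "-", soup.title.string.strip())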

I am actually using this for a personal project of mine: https://github.com/masnun/bookmarks. It's basically a bare-bones Django admin app where I intend to store the links I visit/share. I come across a lot of interesting projects, articles and videos and then later lose track of them; hopefully this app will remedy that. This piece of code is part of the Twitter import functionality of that app.

Top 500 StackOverflow contributors from Bangladesh

Update: The result returned from the StackExchange Data Explorer is slightly outdated, so it might not reflect the latest reputation or other profile changes, which slightly affects the ranking.

Update: Because the large list was affecting the site performance, I have moved it to Github Gist.


This post uses the StackExchange Data Explorer to query StackOverflow users and grab their data. The Python script used to query and parse the data is attached below the ranking. So without further delay, let's meet the top 500 people on SO from Bangladesh:

Script:
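
(The attached script isn't shown here; this is a rough sketch of the approach: run a Data Explorer query that selects stackoverflow.com users by location, then download and rank its CSV export. The URL and the column names are assumptions.)

    import csv
    import io

    import requests

    # Placeholder: the CSV export link of a query already run on
    # data.stackexchange.com, selecting users located in Bangladesh
    CSV_URL = "https://data.stackexchange.com/stackoverflow/csv/XXXXX"

    response = requests.get(CSV_URL)
    response.raise_for_status()

    rows = csv.DictReader(io.StringIO(response.text))
    users = sorted(rows, key=lambda r: int(r["Reputation"]), reverse=True)

    for rank, user in enumerate(users[:500], 1):
        print("{}. {} ({})".format(rank, user["DisplayName"], user["Reputation"]))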

Python 3: Using blocking functions or codes with asyncio

We know we can do a lot of async stuff with asyncio, but have you ever wondered how to execute blocking code with it? It's pretty simple actually: asyncio allows us to run blocking code using the BaseEventLoop.run_in_executor method. It runs our functions in an executor, in parallel, and provides us with Future objects which we can await or yield from.

Let’s see an example with the popular requests library:
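
A minimal sketch (the URLs are placeholders):

    import asyncio

    import requests

    async def fetch(url):
        loop = asyncio.get_event_loop()
        # requests.get blocks, so we hand it off to the default
        # (thread pool) executor and await the resulting future
        response = await loop.run_in_executor(None, requests.get, url)
        print(url, "->", response.status_code)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.gather(
        fetch("http://httpbin.org/get"),
        fetch("http://example.com/"),
    ))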

If you run the code snippet, you can see how the two responses are fetched asynchronously :-)

Creating a Twitter Retweet Bot in Python

We want to create a bot that will track specific topics and retweet them. We shall use the Twitter Streaming API to track topics, and the popular tweepy package to interact with Twitter.

Let's first install Tweepy:
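
    pip install tweepy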

We need to create a Twitter app and get the tokens. We can do that from: https://apps.twitter.com/.

Now let's see the code:
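
(The original code isn't shown here; this sketch matches the walkthrough below. The tokens, the tracked topics and the avoid list are placeholders.)

    import json

    import tweepy

    # Placeholders: use the tokens from your own Twitter app
    CONSUMER_KEY = "..."
    CONSUMER_SECRET = "..."
    ACCESS_TOKEN = "..."
    ACCESS_TOKEN_SECRET = "..."

    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)

    # Common words we want to avoid retweeting (assumed list)
    WORDS_TO_AVOID = ["spam", "nsfw"]

    class RetweetListener(tweepy.StreamListener):
        def on_data(self, data):
            tweet = json.loads(data)
            text = tweet.get("text", "").lower()
            # Skip tweets containing words we want to avoid
            if any(word in text for word in WORDS_TO_AVOID):
                return True
            # Only retweet English tweets
            if tweet.get("lang") != "en":
                return True
            try:
                api.retweet(tweet["id"])
            except tweepy.TweepError as error:
                print("Error:", error)
            return True

        def on_error(self, status_code):
            print("Stream error:", status_code)
            return True

    listener = RetweetListener()
    stream = tweepy.Stream(auth=api.auth, listener=listener)

    # Track the topics we are interested in
    stream.filter(track=["python", "django"])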

The code is pretty much self explanatory:

  • We create a Twitter API client using the OAuth details we got earlier
  • We subclass StreamListener to implement our own on_data method
  • We create an instance of this class, then create a new Stream by passing the auth handler and the listener
  • We call filter with the track parameter to follow a number of topics we are interested in
  • When we start tracking the topics, the stream passes incoming data to the on_data method, where we parse the tweet, check for some common words we want to avoid, check the language and then retweet it.