Gearman, PHP and Supervisor: Processing background jobs with sanity!

On a Leevio project, I was entrusted with building an engine that processes huge amounts of data with heavy network IO. The engine needs to integrate seamlessly with the web front end. Since much of our business logic was already written in PHP, I preferred to go with PHP. It’s not that we didn’t have options. I could have gone with Python, but then I would have had to port my existing ZF models to Python or build some APIs to access the data through another layer of abstraction. I wanted to keep it simple and short. And I had a feeling: if the pros (Yahoo!, Digg, MailChimp, GrooveShark, Xing and who not?) can do it with PHP, why can’t we?

I built the core engine on Gearman. It was simple, stupid and very easy to get things done. My workers ran fine in the test phase, but it didn’t turn out that easy when the workers grew complex. PHP was not built for long-running processes. Its internal caches and circular object references were draining memory like hell. The performance optimizations I made improved things over time, but the workers can still leak memory now and then. Besides, the awesome garbage collector for circular references was introduced only in PHP 5.3, and we didn’t dare upgrade (because of some of the legacy technologies used in the project). The engine was working just fine, but I was having a hard time keeping it alive. Sometimes the processes ended up as zombie processes.

I tried using register_shutdown_function to quit and restart the worker after every job. It worked well but was still not enough: the zombie processes were not being restarted. 🙁 I didn’t want to impose a memory limit either, since that might have terminated the process in the middle of a job (and all hell would break loose!) 🙁

I then started googling for a mechanism to manage PHP workers with Gearman, and I found “Supervisor”. Grooveshark and many others use it with PHP and Gearman. It’s a very simple tool for handling your workers (not just Gearman workers, literally any command). It starts the processes as its own subprocesses, so it can detect when any of them dies. The most important part: it can automatically restart the processes based on custom configuration (depending on exit codes). The configuration is simple. You define a command, configure it and then run supervisor. With supervisor, I’m running the PHP processes in a fashion similar to web page handling. Each process waits for a new workload, does the work when it arrives, and then quits; supervisor restarts it, and the cycle repeats indefinitely.
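To make the "one job, then quit" idea concrete, here is a minimal sketch of such a worker. It is not the actual engine code: the file name worker.php, the `reverse` function name, the string-reversing job and the gearmand address 127.0.0.1:4730 are all assumptions made for illustration. It avoids closures so that it should also run on PHP versions older than 5.3.

```php
<?php
// worker.php: hypothetical one-job-per-process Gearman worker (illustration only).

// Stand-in for the real job logic: simply reverses the payload.
function reverse_payload($payload)
{
    return strrev($payload);
}

// Gearman callback: unwraps the workload and hands the result back to the client.
function reverse_job($job)
{
    return reverse_payload($job->workload());
}

// Wait for a single job, process it, and translate the outcome into an exit code.
function run_single_job()
{
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);           // assumed gearmand host and port
    $worker->addFunction('reverse', 'reverse_job');  // 'reverse' is a made-up job name

    // work() blocks until a job arrives, runs the callback once, then returns.
    $worker->work();

    return $worker->returnCode() === GEARMAN_SUCCESS ? 0 : 1;
}

// Start the worker only when the gearman extension is available.
if (extension_loaded('gearman')) {
    // Exiting here releases all memory; the process manager starts a fresh worker.
    exit(run_single_job());
}
fwrite(STDERR, "gearman extension not loaded; worker not started\n");
```

Because the process ends after every job, whatever memory PHP leaked while working is returned to the system, and a job is never cut off halfway by a memory limit.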

The engine is now more stable and working fine. So, if you have to manage long-running workers with PHP and Gearman, do consider Supervisor. It will pay off!

Here’s a quick How To on getting started with Supervisor:

Installation
Supervisor is a Python daemon and is available from PyPI, so you can “easy_install” it. First, we install Python setuptools (to enable easy_install):

Now we install supervisor:

Then I created “/etc/supervisord.conf” with the sample config by typing:

Then I removed the unnecessary parts and kept what I need. Check out the Configuration Manual for better understanding and full coverage on the available options.
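As a rough idea of what a worker entry in that file can look like, here is a sketch of a program section. It is not my actual configuration: the program name, the PHP binary location, the worker path, the log path and the process count are placeholders you would replace with your own.

```ini
; Hypothetical program section for a Gearman worker; adjust every path and name.
[program:gearman-worker]
command=/usr/bin/php /path/to/worker.php
; Run four copies; process_name must include process_num when numprocs > 1.
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
; Restart the worker no matter which exit code it quit with.
autorestart=true
; Do not require a minimum uptime, since a worker may finish a queued job quickly.
startsecs=0
redirect_stderr=true
stdout_logfile=/var/log/gearman-worker.log
```

The `autorestart=true` and `startsecs=0` pair is what suits workers that quit after every job: with the default `autorestart=unexpected`, a clean exit would not trigger a restart, and with a non-zero `startsecs`, a worker that finishes too quickly is counted as a failed start.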

After saving the configuration, run Supervisor in debug mode to check that the configuration is okay. Debug mode doesn’t send supervisord to the background and prints errors on the console. Use it like this:

When debugging is done, simply run:

It’ll run supervisor, spawn the subprocesses, detach from the terminal and go into daemon mode.

What’s Next?

Well, this does my job perfectly. But if you want to do some more tricks or advanced stuff, you should look at PHP’s Process Control Extensions to take advantage of the native Unix environment!

Happy supervising!


Comments

  1. Hasin Hayder

    Masnun, this post is really good. i was hoping to start with supervisor and was looking for a good article. But there’s nothing better than a step by step article with all the detailed information like this one you wrote. Very good one. Keep up the good work.

    You just helped me to learn supervisor in a painless way 🙂
