On a Leevio project, I was entrusted with building an engine that processes huge amounts of data with heavy network IO. The engine needed to integrate seamlessly with the web front end. Since much of the business logic was already written in PHP, I preferred to stick with PHP. It's not that we didn't have options: I could have gone with Python, but then I would have had to port my existing ZF models to Python or build APIs to access the data through another layer of abstraction. I wanted to keep it simple and short. And I had a feeling that if the pros (Yahoo!, Digg, MailChimp, GrooveShark, Xing and who not?) can do it with PHP, why can't we?
I built the core engine on Gearman. It was simple, stupid and very easy to get things done with. My workers ran fine in the test phase, but it didn't stay that easy once the workers grew complex. PHP was not built for long-running processes: its internal caches and circular object references were draining memory like hell. With the various performance optimizations I made, it improved over time, but it could still leak memory like anything. Besides, garbage collection for circular references was introduced only in PHP 5.3, and we didn't dare upgrade (because of some legacy technologies used in the project). The engine was working just fine, but I was having a hard time keeping it alive. Sometimes the processes even turned into zombie processes.
I tried register_shutdown_function to quit and restart at the completion of every job. It worked well but was still not enough – the zombie processes were not being restarted. 🙁 I didn't want to impose a memory limit either – that might have terminated a process in the middle of a job (and all hell would break loose!) 🙁
I then started googling for a mechanism to manage PHP workers with Gearman, and found Supervisor. Grooveshark and many others use it with PHP and Gearman. It's a very simple tool for handling your workers (not just Gearman workers – literally any command). It starts the processes as its own subprocesses and can detect when a process running under it dies. The most important part: it can auto-restart the processes based on custom configuration (depending on exit codes). The configuration is simple – you define a command, configure it, and run Supervisor. With Supervisor, I'm running the PHP processes in a fashion similar to web page handling: the process waits in a loop for new workload, does the work when it arrives, and then quits; Supervisor restarts it, and the cycle repeats.
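A minimal sketch of that pattern looks like this – the job name `process_job` and the limits here are illustrative, not the actual engine's values:

```php
<?php
// Worker sketch: process Gearman jobs in a loop, then exit cleanly so
// Supervisor (autorestart=true) spawns a fresh process with fresh memory.
// Requires the pecl gearman extension and a gearmand server.

function process_job(GearmanJob $job)
{
    // ... do the actual work on $job->workload() ...
    return 'done';
}

$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('process_job', 'process_job');

$jobs_done = 0;

while ($worker->work()) {
    $jobs_done++;

    // Quit before PHP's leaks pile up; Supervisor restarts us immediately.
    if ($jobs_done >= 100 || memory_get_usage(true) > 64 * 1024 * 1024) {
        exit(0);
    }
}
```

Exiting with code 0 matters: Supervisor's restart behaviour can depend on the exit code, and a clean exit between jobs guarantees nothing dies mid-job.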
The engine is now more stable and working fine. So, if you have to manage long running workers with PHP and Gearman, do consider Supervisor to manage your workers. It shall pay off!
Here’s a quick How To on getting started with Supervisor:
Installation
Supervisor is a Python daemon and is available from PyPI, so you can "easy_install" it. First, we install the Python setuptools (to enable easy_install):
```shell
apt-get install python-setuptools
```
Now we install supervisor:
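With setuptools in place, this should do it:

```shell
easy_install supervisor
```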
Then I created "/etc/supervisord.conf" with the sample config by typing:
```shell
echo_supervisord_conf > /etc/supervisord.conf
```
Then I removed the unnecessary parts and kept what I need. Check out the Configuration Manual for better understanding and full coverage on the available options.
```ini
[unix_http_server]
file=/tmp/supervisor.sock    ; (the path to the socket file)

[supervisord]
logfile=/tmp/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB        ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10           ; (num of main logfile rotation backups;default 10)
loglevel=info                ; (log level;default info; others: debug,warn,trace)
pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false               ; (start in foreground if true;default false)
minfds=1024                  ; (min. avail startup file descriptors;default 1024)
minprocs=200                 ; (min. avail process descriptors;default 200)

; the below section must remain in the config file for RPC
; (supervisorctl/web interface) to work, additional interfaces may be
; added by defining them in separate rpcinterface: sections
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket

[program:worker]
command=/usr/bin/php worker.php
process_name=worker
numprocs=1
directory=/my/worker/directory
autostart=true
autorestart=true
user=root
stdout_logfile=/my/worker/directory/worker_stdout.log
stdout_logfile_maxbytes=1MB
stderr_logfile=/my/worker/directory/worker_stderr.log
stderr_logfile_maxbytes=1MB
```
After saving the configuration, run Supervisor in debug mode to check that the configuration is okay. Debug mode doesn't send supervisord to the background and prints errors to the console. Use it like this:
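The `-n` (nodaemon) flag keeps supervisord in the foreground, and `-c` points it at our config file:

```shell
supervisord -n -c /etc/supervisord.conf
```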
When debugging is done, simply run:
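```shell
supervisord -c /etc/supervisord.conf
```

You can then keep an eye on the workers with `supervisorctl status worker`, or restart them with `supervisorctl restart worker`.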
It'll run Supervisor, spawn the subprocesses, detach from the terminal and go into daemon mode.
What’s Next?
Well, this does my job perfectly. But if you want to do more tricks or advanced stuff, you should look at PHP's Process Control (PCNTL) extension to take advantage of the native Unix environment!
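For example, with PCNTL a worker can catch the SIGTERM that Supervisor sends on shutdown and finish its current job before exiting. A rough sketch, assuming the pcntl extension is enabled in your CLI PHP (the handler name and loop body are illustrative):

```php
<?php
// Graceful shutdown for a long-running worker using the PCNTL extension.

declare(ticks=1); // let PHP dispatch pending signals while the script runs

$GLOBALS['shutdown'] = false;

function handle_sigterm($signo)
{
    // Don't die mid-job; just raise a flag the main loop checks.
    $GLOBALS['shutdown'] = true;
}

pcntl_signal(SIGTERM, 'handle_sigterm');

while (!$GLOBALS['shutdown']) {
    // ... grab and process one job here ...
    sleep(1);
}

exit(0); // clean exit; Supervisor respawns the worker
```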
Happy supervising!