Categories
Python

Building A Proxy Scraper with 15 lines of Python

Yes, Python is great! It’s beautiful and so on. I have described the power of Python many times. For now, just the code 🙂 Here’s a proxy scraper I built a few moments ago. It scrapes the web page at proxy-hunter.blogspot.com and lists the available open proxies.

It uses the BeautifulSoup package for parsing HTML. On Ubuntu, install it with this command:

On other platforms, grab the package from its homepage. Google is there to find the URL for you 😉
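The scraper described above can be sketched roughly as follows. This is a minimal sketch, not the original 15-line script: it uses only the standard library and a regular expression instead of BeautifulSoup, it targets Python 3, and it assumes the proxies appear in the page as plain `ip:port` text. The names `extract_proxies` and `fetch_page` are made up for illustration.

```python
import re
from urllib.request import urlopen

# Assumption: proxies are listed as plain "ip:port" text somewhere in the HTML.
PROXY_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{2,5}\b")


def extract_proxies(html):
    """Return the unique ip:port strings found in html, in page order."""
    seen = set()
    proxies = []
    for match in PROXY_PATTERN.findall(html):
        if match not in seen:
            seen.add(match)
            proxies.append(match)
    return proxies


def fetch_page(url):
    """Download a page and decode it leniently as UTF-8."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (needs network access):
#   for proxy in extract_proxies(fetch_page("http://proxy-hunter.blogspot.com/")):
#       print(proxy)
```

The pattern only checks the shape of an address, so it may also pick up strings that look like `ip:port` but are not live proxies.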


Using Python on Hostmonster, Umbrahosting And Other General Shared Hosting

I have shared hosting accounts on HostMonster and UmbraHosting, where my PHP sites are running smoothly. I’m perfectly happy with both of them. Both providers also support Python, so being a Python enthusiast, I decided to check it out.

On both Hostmonster and Umbrahosting, you can execute Python scripts as CGI scripts. But you must take care of these points:

— They must be placed inside the cgi-bin directory.
— Permission should be 755. (777 triggered server error for me)
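A minimal sketch of a script that follows these two points is shown below. It assumes the file is saved as `hello.py` inside cgi-bin and made executable with permission 755; the helper name `build_response` is made up for illustration. The code avoids newer syntax so it should also run on the old interpreters these hosts provide.

```python
#!/usr/bin/env python
import sys


def build_response(body):
    """Prefix an HTML body with the header a CGI response must start with."""
    # A blank line separates the headers from the body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response("<h1>Hello from Python CGI</h1>\n"))
```

If the header line or the blank line after it is missing, the server typically answers with an error instead of the page.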

Neither host offers a recent version of Python. They don’t even have Python 2.5, only Python 2.4. 🙁 Still, I’m happy that I can run Python at all. The only thing I didn’t like was having to put my scripts in the cgi-bin directory, which meant I couldn’t have URLs like https://masnun.com/hello.py 🙁

Later this afternoon, an idea clicked in my mind: can’t I point a subdomain at the cgi-bin directory? The answer is: YES! Just use a .htaccess file to map certain URLs to your cgi-bin directory.
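A minimal sketch of such a mapping, assuming Apache with mod_rewrite enabled and a cgi-bin directory served at `/cgi-bin/`. These rules are an illustration of the idea and may need adjusting for a particular host:

```apache
# Assumption: mod_rewrite is available and overrides are allowed in .htaccess.
RewriteEngine On
# Skip requests that already point into cgi-bin, to avoid a rewrite loop.
RewriteCond %{REQUEST_URI} !^/cgi-bin/
# Internally serve /anything.py from /cgi-bin/anything.py
RewriteRule ^(.+\.py)$ /cgi-bin/$1 [L]
```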

Putting the .htaccess in your domain’s root directory will allow you to put a Python file inside your cgi-bin directory and visit it at http://example.com/example.py 😀


Hosting your Twitter Bots on Google App Engine

UPDATE: Download the improved version of the application. I made some code changes after this blog post was submitted to reddit. The source code is now more readable and the app is easily configurable.


I am amazed by the implementation of cron jobs on Google App Engine. I was hosting my Twitter bots on my paid shared hosting. My scripts were written in PHP and run via cron jobs set up from cPanel. The cPanel cron UI is quite familiar to those of us who use it often. But I can’t run a per-minute cron job on my host: there must be a minimum interval of 15 minutes between two executions of the same cron job. So I was amazed when I first discovered that Google App Engine lets me run cron every minute. Since usage is metered through different types of quotas, they hardly care how often I schedule my cron jobs. So, I decided to port my Twitter bots to GAE and Python.

The idea was pretty simple. I already had a Yahoo! Pipe set up which mashes up multiple RSS feeds into one. My app collects data from the pipe in JSON format, then checks the last tweeted entry and tweets the newer entries. A pretty straightforward algorithm. I registered a new application for my bots and started developing it on my local machine.
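The "check the last tweeted entry, tweet the newer ones" step could look roughly like this. It is a sketch under assumptions, not the code from the app: it assumes the feed JSON has an `items` list ordered newest first and that each item has `title` and `link` keys. The function name `new_entries` is made up for illustration.

```python
import json


def new_entries(feed_json, last_tweeted_link):
    """Return the entries newer than the last tweeted one, oldest first."""
    items = json.loads(feed_json)["items"]  # assumed to be newest first
    fresh = []
    for item in items:
        if item["link"] == last_tweeted_link:
            break  # everything from here on was tweeted in an earlier run
        fresh.append(item)
    fresh.reverse()  # tweet in chronological order
    return fresh
```

If the last tweeted link is not found in the feed (for example on the very first run), every entry is treated as new.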

I integrated the u.nu URL shortening service through their API quite easily. Though Python 2.6 has a built-in module for JSON handling, Python 2.5 doesn’t. So, I had to install simplejson for JSON decoding. Later, I moved on to finding a good Twitter client for my app. I first tried python-twitter, which I had used before without problems. But the application generated error messages when I integrated it into GAE. I was puzzled and tried to find out whether there was something wrong in my code. After about an hour, I concluded that my code was okay and the problem was in the client. A bit of googling revealed that “python-twitter” doesn’t work on GAE without hacks :'(. I had to find an alternative and I chose tweepy, which worked fine with the app.
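One common way to handle the JSON module difference is a fallback import. This is a sketch of that idiom rather than the code used in the app, and the sample entry is made up:

```python
try:
    import json  # part of the standard library from Python 2.6 onwards
except ImportError:
    import simplejson as json  # offers the same loads/dumps interface

# Hypothetical feed entry, just to show the decoding call.
entry = json.loads('{"title": "Hello", "link": "http://example.com/"}')
print(entry["title"])
```

The rest of the code can then call `json.loads` without caring which of the two modules was imported.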

Finally, after 3 hours, the app was ready and working! GAE doesn’t permit long execution times for a request, so I had to decrease the number of tweets per request. But I made up for that by setting a more frequent cron job :). I will try the new taskqueue API soon to see if it helps my app.
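Capping the work done per request can be as simple as slicing the pending list. This is a sketch of the idea only; the cap value and the name `split_batch` are hypothetical:

```python
MAX_TWEETS_PER_RUN = 3  # hypothetical cap; tune it to fit the request time limit


def split_batch(pending, limit=MAX_TWEETS_PER_RUN):
    """Split pending tweets into (send now, leave for the next cron run)."""
    return pending[:limit], pending[limit:]
```

With a cron job firing every minute, the leftover entries get picked up by the following runs.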

My Twitter bots have proved handy in the past, and it feels great that I can host them on a scalable and efficient platform like Google App Engine without paying a single dollar per month! 😀

Download Source : http://masnun.googlecode.com/files/Twitter_BOT_GAE.zip 🙂