Yes, Python is great! It's beautiful and so on. I have described the power of Python many times before, so for now, just the code.
Here’s a proxy scraper I built a few moments ago. It scrapes the web page at proxy-hunter.blogspot.com and lists the available open proxies.
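#!/usr/bin/env python
# Python 2 script: BeautifulSoup 3 and urllib.urlopen are Python 2 APIs.
import re
import urllib

from BeautifulSoup import BeautifulSoup as Soup

url = 'http://proxy-hunter.blogspot.com/2010/03/18-03-10-speed-l1-hunter-proxies-310.html'

# Download the page and parse it into a navigable tree.
document = urllib.urlopen(url)
tree = Soup(document.read())

# Match an "IP:port" entry at the start of a string, e.g. 123.45.67.89:8080.
# Dots are escaped so they match a literal '.', every octet may have 1-3 digits,
# and ports may have up to 5 digits.
regex = re.compile(r'^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{2,5})')

# Find text nodes that start with a proxy entry. When text= is given,
# BeautifulSoup 3 searches text nodes only, so the attrs filter has no effect.
proxylist = tree.findAll(attrs={"class": "Apple-style-span", "style": "color: black;"}, text=regex)

if not proxylist:
    print 'No proxies found on', url
else:
    # Take the first matching text block and print it one line at a time.
    for line in proxylist[0].split('\n'):
        print line.strip()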
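The regular expression does most of the work in this scraper. Below is a minimal sketch of just the matching step, assuming the proxy list has already been pulled out as plain text. The hypothetical `extract_proxies` helper scans a string for `IP:port` entries and also rejects octets above 255 and ports outside 1 to 65535, which a digit-count pattern alone lets through. It needs no network access or BeautifulSoup, runs under Python 2 or 3, and the sample input is made up for illustration.

```python
import re

# Hypothetical helper, not part of the scraper above: pull "IP:port" entries
# out of a block of text. Similar to the scraper's regex, but not anchored to
# the start of the string, with word boundaries so extra text on a line is fine.
PROXY_RE = re.compile(r'\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})\b')


def extract_proxies(text):
    """Return a list of 'a.b.c.d:port' strings found in text.

    Entries whose octets exceed 255 or whose port is outside 1-65535 are
    skipped, since the regex itself only checks digit counts.
    """
    proxies = []
    for match in PROXY_RE.finditer(text):
        octets = [int(g) for g in match.groups()[:4]]
        port = int(match.group(5))
        if all(o <= 255 for o in octets) and 1 <= port <= 65535:
            proxies.append(match.group(0))
    return proxies


if __name__ == '__main__':
    # Made-up sample input, for illustration only.
    sample = (
        "80.12.34.56:8080\n"
        "999.1.1.1:80\n"
        "some text 10.0.0.1:3128 more text\n"
        "192.168.1.1:99999\n"
    )
    for proxy in extract_proxies(sample):
        print(proxy)
```

Running it prints `80.12.34.56:8080` and `10.0.0.1:3128`; the second and fourth sample lines are dropped by the range checks.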
It uses the BeautifulSoup package for parsing HTML. On Ubuntu, install it with this command:
sudo apt-get install python-beautifulsoup
On other platforms, grab the package from its homepage. Google will find the URL for you.