Author: masnun

Using SVN with GoogleCode Project Hosting

Post author By masnun
Post date September 13, 2009

At Leevio, we are going to use SVN (Subversion) a lot. For demonstrating and playing with SVN, I chose Google Code Project Hosting since it’s free and feature-packed. I am a beginner with SVN, and I needed a sandbox SVN server to practise what I learn.

I already had http://masnun.googlecode.com ready for this. So I started trying out SVN on this project, relaxed in the knowledge that no matter what changes I made, no serious harm would be done.

SVN is pretty easy, in fact. I read the “SVN Book” and “The Visual Guide to SVN”, but neither helped me get a grip on SVN the way the shell command “svn help” did. It is self-explanatory and lists all the available commands.

To start, I had to create a local working copy of the project (a checkout of the trunk, not a repository of its own). I did the following:

svn checkout https://masnun.googlecode.com/svn/trunk/ masnun --username masnun

It created a directory named “masnun” (the working copy) inside my Linux home directory after I authenticated with my Google Code password.
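If you end up running these commands from scripts, the checkout can be wrapped in Python. This is a minimal sketch: svn_checkout() is a hypothetical helper, not part of any SVN library. It builds the same command line used above and only runs it through subprocess when dry_run is False, which needs the svn client installed. Google Code Project Hosting has since shut down, so treat the googlecode.com URL pattern as a placeholder.

```python
import subprocess


def svn_checkout(project, dest, username, dry_run=True):
    """Check out the trunk of a Google Code-style project (hypothetical helper).

    Uses the URL layout from this post: https://<project>.googlecode.com/svn/trunk/
    With dry_run=True the command is only returned, not executed.
    """
    url = f"https://{project}.googlecode.com/svn/trunk/"
    cmd = ["svn", "checkout", url, dest, "--username", username]
    if not dry_run:
        # svn prompts for the password itself; check=True raises on a non-zero exit code
        subprocess.run(cmd, check=True)
    return cmd


print(svn_checkout("masnun", "masnun", "masnun"))
```

Returning the argument list keeps the helper easy to inspect before it touches a real server.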

Now I was ready to add files. So I copied a file named “quotes” into the /home/masnun/masnun directory (the working copy) and typed in:

svn add quotes

This only schedules the file for addition in the working copy; nothing reaches the server yet. To sync the server, I used:

svn commit

This opened a text editor for the commit log message. After typing in a message, I pressed Ctrl+O to write it out and then Ctrl+X to quit the editor.

And it transmitted all the data to the server 🙂
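The log message doesn’t have to be typed in an editor: svn commit also accepts it inline with -m. Here is a small sketch of the add-then-commit step, assuming a hypothetical add_and_commit() helper that, like the checkout sketch, only returns the commands unless dry_run is False.

```python
import subprocess


def add_and_commit(path, message, dry_run=True):
    """Schedule `path` for addition, then commit it with an inline log message.

    Hypothetical helper. Passing -m to `svn commit` supplies the log message
    directly, so no editor opens. Commands run only when dry_run is False.
    """
    commands = [
        ["svn", "add", path],                    # schedules the file; nothing is sent yet
        ["svn", "commit", "-m", message, path],  # sends the change to the server
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return commands


for cmd in add_and_commit("quotes", "Add the quotes file"):
    print(" ".join(cmd))
```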

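Next, I made some changes to the file and ran the following command. Note that svn update only brings the working copy up to date with the latest revision on the server; it does not upload anything: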

svn update quotes

Local changes are only sent by committing, so I committed again and wrote another log message in the editor:

svn commit
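Since update and commit move data in opposite directions, a toy model may help. The Python sketch below is only an illustration under simplifying assumptions (a single file, no merging); the Repository and WorkingCopy classes are hypothetical and say nothing about how Subversion actually stores data.

```python
class Repository:
    """Toy server: the contents of one file at each revision (revision 0 is empty)."""

    def __init__(self):
        self.revisions = [""]

    @property
    def head(self):
        return len(self.revisions) - 1


class WorkingCopy:
    """Toy working copy of that single file."""

    def __init__(self, repo):
        self.repo = repo
        self.base = repo.head                  # revision this copy was last synced to
        self.text = repo.revisions[self.base]  # local contents, edited freely

    def update(self):
        """Server -> working copy: fetch the latest revision."""
        if self.text != self.repo.revisions[self.base]:
            # A real `svn update` would merge local edits; this toy model refuses instead.
            raise RuntimeError("local modifications present")
        self.base = self.repo.head
        self.text = self.repo.revisions[self.base]

    def commit(self):
        """Working copy -> server: store the local text as a new revision."""
        if self.base != self.repo.head:
            # With a single file, a newer head means this copy is out of date.
            raise RuntimeError("out of date: update first")
        self.repo.revisions.append(self.text)
        self.base = self.repo.head


repo = Repository()
mine, other = WorkingCopy(repo), WorkingCopy(repo)
mine.text = "Talk is cheap."
mine.commit()    # the repository now has revision 1
other.update()   # only now does the other copy see the change
print(repo.head, repr(other.text))
```

Editing a file changes nothing on the server by itself; only commit() adds a revision, and only update() brings other copies up to date.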

I liked the simplicity, and I regretted spending so much time reading theory about why SVN replaced CVS 🙁

Well, now I decided to delete the file:

svn del quotes
svn commit

That was it… Easy!
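Before that final commit, svn status would show what is still pending; a deleted file is marked D in the first column. Here is a hedged sketch of reading that output from Python. It assumes the Subversion 1.6+ layout (seven one-character status columns, a space, then the path). parse_svn_status() and the sample output below are made up for illustration, and only a few common codes are mapped.

```python
STATUS_CODES = {
    "A": "added",
    "D": "deleted",
    "M": "modified",
    "C": "conflicted",
    "?": "not under version control",
    "!": "missing",
}


def parse_svn_status(output):
    """Map each path in `svn status` output to a readable state (first column only).

    Assumes seven status columns plus a separating space, so the path starts at index 8.
    """
    result = {}
    for line in output.splitlines():
        if len(line) <= 8:
            continue  # skip blank or malformed lines
        code = line[0]
        result[line[8:]] = STATUS_CODES.get(code, f"other ({code!r})")
    return result


sample = "D       quotes\n?       notes.txt\n"
print(parse_svn_status(sample))
```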

Python

HTML and XML Parsing in Python using BeautifulSoup :)

Post author By masnun
Post date September 13, 2009

I have been looking for a good Python library for handling HTML and XML. I knew about BeautifulSoup but never paid it much attention. This time, though, while looking for a way to scrape web sites and harvest links with Python, I came across a nice tutorial that demonstrated the “BeautifulSoup” module. I was impressed and decided to try it myself. It’s just amazing! In fact, it’s a shame to have worked with Python without using this module. I later learned that BeautifulSoup is very well known in the Python world.

How to use it?

First, download it and extract the archive. Then, from inside the extracted directory, install the package with the following command:

sudo python setup.py install

Once it is installed, you can check that everything is okay by typing this into the interactive Python shell:

>>> from BeautifulSoup import BeautifulSoup

If you don’t get an error, the installation went fine and we can start using the module.

The BeautifulSoup module has two prominent classes: BeautifulSoup and BeautifulStoneSoup. We use the first one for HTML parsing and the second one for XML parsing.
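BeautifulStoneSoup ships with BeautifulSoup, not with Python. For comparison only, here is a minimal sketch of a similar XML lookup using Python 3’s standard xml.etree.ElementTree module. The <settings>/<setting name="..."> layout and the setting_values() helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def setting_values(xml_text):
    """Return {name: text} for every <setting name="..."> element (hypothetical schema)."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    return {el.get("name"): (el.text or "").strip() for el in root.iter("setting")}


doc = """<settings>
  <setting name="display_errors">Off</setting>
  <setting name="memory_limit">128M</setting>
</settings>"""
print(setting_values(doc))
```

One design difference: ElementTree requires well-formed XML and raises ParseError otherwise, while BeautifulSoup is known for tolerating messy markup.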

So, what are we waiting for? Let’s dive deeper…

I have a web server running, and the URL http://localhost/ serves the PHP info page (the output of PHP’s phpinfo() function). We will play with that page in this session.

Let’s fetch the HTML of that page first. We will use urllib.urlopen() to fetch the page and then pass the returned file-like object to the BeautifulSoup constructor to build a soup object.

>>> import urllib
>>> page = urllib.urlopen("http://localhost/")
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(page)

That’s it — we now have a soup object that we can use to browse the HTML document.
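The session above is Python 2 (urllib.urlopen and BeautifulSoup 3). In Python 3, urlopen lives in urllib.request and its response yields bytes. Here is a sketch of the fetch step under that assumption. fetch_html() is a hypothetical helper that decodes using the charset from the Content-Type header and falls back to UTF-8. The current package, beautifulsoup4, is imported with from bs4 import BeautifulSoup.

```python
from urllib.request import urlopen


def fetch_html(url, default_encoding="utf-8"):
    """Fetch `url` and return its body as text (hypothetical helper)."""
    with urlopen(url) as response:
        raw = response.read()
        # Use the charset announced in Content-Type, if any
        charset = response.headers.get_content_charset() or default_encoding
    return raw.decode(charset, errors="replace")


# A data: URL keeps the example offline; "http://localhost/" works the same way
print(fetch_html("data:text/html;charset=utf-8,<title>phpinfo()</title>"))
```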

There are a lot of features of BeautifulSoup. But I am going to demonstrate how to find a special tag and extract the data inside.

From looking at the HTML source of the page, I know that the <td> cells with class="e" contain data about PHP settings. To find all tags that have a certain attribute with a fixed value, we use the findAll() method like this:

>>> cells = soup.findAll('td', attrs={"class": "e"})
>>> len(cells)
382
>>>

That is, we pass the tag name and a dictionary of attribute names and values to the findAll() method and get back a list of the matching tags. Why a dictionary? findAll() also treats extra keyword arguments as attribute filters (for example, soup.findAll('td', align='left')), but class is a reserved word in Python, so we can’t write class="e". Instead, we put it in the dictionary passed as the attrs parameter.

We store the result as “cells” (rather than “list”, which would shadow the built-in) and use the built-in len() function to count its elements: 382 tags match our query.
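To make it concrete what findAll('td', attrs={"class": "e"}) does, here is a rough standard-library analogue built on Python 3’s html.parser.HTMLParser. It is a sketch, not BeautifulSoup: it returns text rather than tag objects, compares attribute values exactly, and assumes the matching tags are explicitly closed and not nested inside each other (true for the phpinfo() table cells). AttrTextCollector and find_all_text() are hypothetical names.

```python
from html.parser import HTMLParser


class AttrTextCollector(HTMLParser):
    """Collect the text inside every <tag> whose attributes match `wanted` exactly."""

    def __init__(self, tag, wanted):
        super().__init__()  # convert_charrefs=True: entities arrive already decoded
        self.tag = tag
        self.wanted = wanted
        self.results = []
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self._inside:
            found = dict(attrs)  # attrs is a list of (name, value) pairs
            if all(found.get(k) == v for k, v in self.wanted.items()):
                self._inside = True
                self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)  # includes text of nested tags such as <a>

    def handle_endtag(self, tag):
        if tag == self.tag and self._inside:
            self.results.append("".join(self._chunks).strip())
            self._inside = False


def find_all_text(html, tag, wanted):
    """Rough analogue of soup.findAll(tag, attrs=wanted), returning the texts."""
    parser = AttrTextCollector(tag, wanted)
    parser.feed(html)
    parser.close()
    return parser.results


page = ('<table><tr><td class="e">display_errors</td><td class="v">Off</td></tr>'
        '<tr><td class="e">memory_limit</td><td class="v">128M</td></tr></table>')
cells = find_all_text(page, "td", {"class": "e"})
print(len(cells), cells)
```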

You can also get the first occurrence of any tag by accessing it as an attribute of the soup object:

>>> soup.h2
<h2>PHP Core</h2>
>>>

>>> soup.title
<title>phpinfo()</title>
>>>

You can get the string inside a tag using the “string” attribute in this way:

>>> soup.title.string
u'phpinfo()'
>>>

Yes, BeautifulSoup converts strings into Unicode by default (hence the u prefix in u'phpinfo()'). You can override this behaviour, but I am not going to cover that.

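Remember that .string only returns something when a tag has exactly one child: either a string, or a single tag that itself has a .string. If a tag contains more than that (for example, text mixed with nested tags), its .string is None.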
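The .string rule can be written down as a tiny recursive function. This is a toy model of the documented behaviour, not BeautifulSoup’s code; the Tag dataclass and tag_string() are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Tag:
    name: str
    children: List[Union["Tag", str]] = field(default_factory=list)


def tag_string(tag: Tag) -> Optional[str]:
    """Mimic the .string rule: defined only when there is exactly one child."""
    if len(tag.children) != 1:
        return None              # zero or several children: ambiguous, so None
    only = tag.children[0]
    if isinstance(only, str):
        return only              # a single text child is the string
    return tag_string(only)      # a single child tag passes its .string up


print(tag_string(Tag("title", ["phpinfo()"])))
print(tag_string(Tag("p", ["Hello ", Tag("b", ["world"])])))
```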

For more details about this super cool module, please read its documentation. It’s very user-friendly, easy to understand and, of course, extremely informative.

I really love BeautifulSoup and Python ! 🙂

PHP

PHP Multi Threading :)

Though it isn’t available under Apache for processing web pages, we can do something close to multi-threading in PHP: running work in parallel by forking child processes 🙂

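The “pcntl” (process control) extension provides this in the PHP command-line interpreter. Strictly speaking, pcntl_fork() creates separate processes rather than threads, and the PHP manual warns against using these functions in a web server environment, so use them only from the command line. Here’s a code snippet (it also uses the “posix” extension for posix_getpid() and posix_getppid()):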

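<?php
// php pcntl_fork demonstration (CLI only; needs the pcntl and posix extensions)

$fork = pcntl_fork(); // fork a new process
echo "\n\n";
switch ($fork) {

case 0: // It's the child
    echo "I am a child process of " . posix_getppid() . " and my process ID is " . posix_getpid() . " \n\n";
    exit(0); // end the child here so it never runs parent-only code

case -1: // forking failed, only the original process exists
    echo "Something went wrong while \"fork()\"ing a new process :( \n\n";
    break;

default: // It's the parent; $fork holds the child's process ID
    pcntl_wait($status); // wait for the child; $status receives its raw status
    $exitCode = pcntl_wexitstatus($status); // decode the child's exit code
    echo "I am the parent process with ID: " . posix_getpid() . " and my child's ID is: $fork and its exit status is: $exitCode \n\n";
    break;
}
?>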

The above example first calls the pcntl_fork() function, which creates a new (child) process that is a copy of the current one: same code, same variables, and execution continues from the same point. From there, both processes run the rest of the script in the same way, so we have to tell the two processes apart and give them different tasks.

We tell them apart by the return value of pcntl_fork(). If the function succeeds in creating a new process, it returns once in each process, with a different value in each. If it fails, no child is created and it returns only once, in the original process. So on success we have two processes running at the same time, both executing the same PHP script but seeing different return values from pcntl_fork(). That is why the script checks the return value to find out which process is running it and act accordingly.
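The “copy of the current process” point is easy to see in Python, whose os.fork() wraps the same POSIX fork() call that pcntl_fork() uses. In this sketch (POSIX only; fork_copy_demo() is a hypothetical helper), the child changes a variable and sends the new value back through a pipe, while the parent’s copy stays untouched.

```python
import os
import sys


def fork_copy_demo():
    """Return (parent's value, child's value) after the child modifies its copy."""
    counter = [1]
    read_fd, write_fd = os.pipe()
    sys.stdout.flush()  # don't let the child inherit unflushed output
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy of `counter`
        try:
            os.close(read_fd)
            counter[0] += 100
            os.write(write_fd, str(counter[0]).encode())
        finally:
            os._exit(0)  # never fall through into the parent's code
    # Parent: close its write end so the read below sees EOF when the child exits
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_value = int(reader.read())
    os.waitpid(pid, 0)  # reap the child
    return counter[0], child_value


print(fork_copy_demo())
```

The parent still sees 1 while the child reports 101: after the fork, the two processes no longer share variables.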

The return value of the function could be of three types:
— A Process ID
— 0 (Zero)
— (-1) (Negative One)

If the process executing the script is the child, it gets the return value 0. The parent process gets the process ID of the child. (-1) means the forking failed, and in that case only the original process exists.

In the above example, we used the posix_getpid() and posix_getppid() functions to retrieve the process ID of the current process and of its parent. The parent also calls pcntl_wait() to wait for its child to finish.
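For comparison, here is the same parent/child demonstration as a Python sketch using os.fork(), os.getpid(), os.getppid() and os.waitpid(). One difference is that Python’s os.fork() raises OSError on failure instead of returning -1, so there is no -1 branch. fork_and_wait() is a hypothetical helper, and it works only on POSIX systems.

```python
import os
import sys


def fork_and_wait(child_exit_code=0):
    """Fork once; the child reports its IDs and exits, the parent waits for it.

    Returns (child_pid, child_exit_code) in the parent. POSIX only.
    """
    sys.stdout.flush()  # avoid the child re-printing buffered output
    pid = os.fork()     # raises OSError on failure instead of returning -1
    if pid == 0:
        # Child: os.fork() returned 0 here
        try:
            print(f"I am a child process of {os.getppid()} and my process ID is {os.getpid()}",
                  flush=True)
        finally:
            os._exit(child_exit_code)  # exit immediately, skipping the parent's cleanup code
    # Parent: pid is the child's process ID
    _, status = os.waitpid(pid, 0)  # like pcntl_waitpid(): block until this child ends
    print(f"I am the parent process with ID: {os.getpid()} and my child's ID is: {pid}")
    return pid, os.WEXITSTATUS(status)  # like pcntl_wexitstatus()


print(fork_and_wait(3))
```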

Forking is a bit tricky, and it took me an hour of total bewilderment to understand how it really works. To be honest, the PHP manual is not as helpful for this extension as it is for other PHP functionality.
