The PageRank Algorithm

I think one of the best recent examples of the importance of mathematics is the rise of the search engine Google. I remember the world of search engines before Google and it was dominated by names like AltaVista, Yahoo, WebCrawler, Excite, and the likes. The standard way these search engines ranked the order that pages would be listed on a search query was basically to count the number of times that query appeared on pages in their database. The pages with the most listings were considered the most important, the second most listings were second most important and so on and so forth.

This sounds like a feasible way of doing things but let me show you an example of how this can be tricked. Suppose I wrote my first web page and it looked like the following:

That’s a basic web page that may not garner much attention, and it wouldn’t rank highly in most search engines as no work appears more than once. Suppose that, this being a math web page, I wanted it to rank higher on the query “math”. Then I could just edit the source code of the page to be as follows:

This second page says not much more than the first, but the fact that the word math appears 9 additional times would increase the ranking of this page among math pages. This is a very simple example, but it shows how these search engine rankings did not have a useful metric for determining the important sites on the web.

Enter Google.

The way Google solved this problem of determining the importance of a web page is basically by counting the number of links into a web page – the theory being that the more important a web page is, the more people will be talking about it and thus linking to it. Also, the more important the people talking about (linking to) a web site, the more important that site is. This can be expressed mathematically by the following formula:

In the above formula, the variable d is called the damping factor, which helps to capture some of the random nature of the internet by saying that every site should have at least some minimal worth because of the idea that a random surfer could still get to these sites.

I have written a script to implement the algorithm here.

Other Blogs that have covered this topic
Blue Onion

One thought on “The PageRank Algorithm”

  1. Hello there, simply become aware of your weblog via Google, and located that it’s truly informative. I’m going to be careful for brussels. I will appreciate if you happen to continue this in future. A lot of other folks might be benefited from your writing. Cheers!

Leave a Reply

Your email address will not be published. Required fields are marked *