Clean Room Implementation of Google Page Rank Algorithm

By Angsuman Chakraborty, Gaea News Network
Thursday, August 17, 2006

Finally, a clean-room implementation of Google’s Page Rank Algorithm in Java, reverse-engineered from Google's numerous comments on Page Rank (or is it Pigeon Rank?).

import java.time.LocalDate;
import java.time.Month;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

private static final Random RANDOM = new Random();

public static int getPageRank(String url) {
    // start off with a random low PR (0 to 3)
    int pageRank = RANDOM.nextInt(4);

    if (isHostedOn("google.com", url)) {
        pageRank++;
    } else if (isHostedOn("microsoft.com", url)) {
        pageRank--;
    }
    // Support valid pages
    if (isValidPage(url)) {
        pageRank += 1;
    }

    Map<String, Integer> tagValue = new HashMap<>();
    tagValue.put("b", 1);
    tagValue.put("h2", 2);
    tagValue.put("h1", 3);
    tagValue.put("strong", -1); // W3C sux!
    pageRank = calculateTagsPR(tagValue, pageRank);

    // Sergey said good news sites have
    // lots of nested tables
    int tablesOnPage = getTagCount("table");
    if (tablesOnPage >= 50) {
        pageRank += 2;
    }

    if (pageRank >= 5) {
        pageRank = 4; // helps selling AdWords
    }

    if (linksFrom("mattcutts.com", url) >= 4) {
        // I link to "clean" sites only
        // -- Matt, Feb 2006
        pageRank += 2;
    }

    pageRank += countBacklinks(url) / 10000;

    // Backslashes must be escaped inside Java string literals
    List<String> blacklist1 = getList("c:\\chinese-government-censored.txt");
    List<String> blacklist2 = getList("c:\\larry-page-hatelist.txt");
    if (blacklist1.contains(url) || blacklist2.contains(url)) {
        pageRank = 0;
    }

    int d = dashesInUrl(url);
    pageRank = (d >= 3) ? pageRank - 1 : pageRank + 1;

    if (url.contains("how to build a bomb")) {
        // added on request. 2004-12-01.
        String recipient = "peter@homelandsecurity.gov";
        String subject = "You might wanna check this...";
        sendMailTo(recipient, subject, url);

        // page might still be relevant
        pageRank++;
    }

    Month month = LocalDate.now().getMonth();
    if (month == Month.JUNE || month == Month.OCTOBER) {
        // makes people talk about
        // PR updates, good publicity
        pageRank -= 1 + RANDOM.nextInt(3); // subtract 1 to 3
    }

    if (checkIdenticalPageAndLinkColor(url)) {
        // spammer!! Googleaxe it!!
        pageRank = 0;
    }

    // Strings are compared with equals(), not ==
    if (url.equals("https://www.nytimes.com")) {
        // just testing, pls remove tomorrow
        // -- Frank, June 2003
        pageRank = 10;
    }

    // Keep PR on the 0-10 toolbar scale
    pageRank = Math.max(0, Math.min(10, pageRank));

    return pageRank;
}

Adapted (ported to Java, with normalization and other changes) from an idea and original code by Jack Tang.

Discussion

chaitu
January 5, 2010: 10:25 pm

Can anybody tell me how to implement the page ranking algorithm?
chaitu shukla
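For anyone with the same question, the joke above has little to do with the real thing. The textbook PageRank recurrence is PR(p) = (1 - d)/N + d * sum over pages q linking to p of PR(q)/L(q). Here N is the number of pages, L(q) is the number of out-links on page q, and d is a damping factor, commonly 0.85. What follows is a minimal power-iteration sketch of that recurrence. It assumes the link graph fits in memory as adjacency lists. The class name SimplePageRank, the 4-page example graph, and the choice to spread rank from pages with no out-links evenly across all pages are illustrative assumptions. This is not Google's production method.

```java
import java.util.Arrays;

public class SimplePageRank {

    /**
     * Power iteration for PageRank.
     * links[i] lists the pages that page i links to.
     * d is the damping factor; iterations is a fixed iteration count.
     */
    public static double[] pageRank(int[][] links, double d, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - d) / n); // "random jump" share for every page
            double danglingMass = 0.0;

            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    // Assumption: a page with no out-links spreads its rank over all pages.
                    danglingMass += rank[i];
                } else {
                    // Each out-link passes an equal share of this page's rank.
                    double share = d * rank[i] / links[i].length;
                    for (int target : links[i]) {
                        next[target] += share;
                    }
                }
            }
            for (int i = 0; i < n; i++) {
                next[i] += d * danglingMass / n;
            }
            rank = next; // ranks still sum to 1 after each step
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical web: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 2
        int[][] links = { {1, 2}, {2}, {0}, {2} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

A production system would iterate until the change between rounds drops below a tolerance and would use sparse matrices. A fixed iteration count keeps the sketch short.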


LLama
March 19, 2009: 12:42 pm

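Any string comparison done in Java with == compares object references, not contents, so comparing a string built at runtime (such as url) against a String literal defined in your source code will usually be false. Use equals() instead.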
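The pitfall described in this comment is easy to check. The sketch below uses a hypothetical class named StringCompareDemo. The == operator is true only when both sides are the same object. Identical string literals are interned, so two of them do compare equal with ==. A URL assembled at runtime fails the == test even though its text matches, while equals() compares the characters themselves.

```java
public class StringCompareDemo {
    // A compile-time constant; Java interns string literals like this one.
    public static final String NYT = "https://www.nytimes.com";

    // Reference comparison: true only if url is the very same String object.
    public static boolean sameReference(String url) {
        return url == NYT;
    }

    // Content comparison: true whenever the characters match (null-safe).
    public static boolean sameContent(String url) {
        return NYT.equals(url);
    }

    public static void main(String[] args) {
        // Build an equal string at runtime, the way a crawled URL would arrive.
        String built = new StringBuilder("https://www.").append("nytimes.com").toString();

        System.out.println(sameReference("https://www.nytimes.com")); // true: same interned literal
        System.out.println(sameReference(built));                     // false: different object
        System.out.println(sameContent(built));                       // true: same characters
    }
}
```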
