Page Rank and Vote

Introduction

Under a standard Google algorithm, for a WWW consisting of n pages, they contribute to a hyperlink matrix:

, where aij equals if page i has a link to page j and 0 otherwise, and L(i) is the total number of outboard link from page i. Here, duplicated outboard link to the same pages only counts one.

Very often, there are some Web files like image and mp3 that do not have a link to any other pages. Such sitesare called dangling nodes, where aij = 0 for all j. i.e. the ith row is all zero entries. To fix this, we may replace the entire ith row by all ’s.

To facilitate our discussion, consider a WWW consists only 4 pages, page 1 to 4.

Here, 1 links to 2, 2 links to 3, 3 links to both 1 and 4, and page 4 is a dangling node. So we have:

and new matrix

Hence, no matter how a fixed matrix S is in general, each row must sums to 1, which makes it a row stochastic matrix.

However, to get closer to a realistic internet user, Google deploy the matrix

Where 1 is the column vector with all entries one, v is a row vector called personalization vector, and α is the damping factor taking value from 0 to 1, indicating the probability α that the random internet surfer follow the model matrix S and 1-α going to other pages by some means other than S.

Several experiments have revealed that the optimized parameters are α=0.85 and .

To compute the ranking, since G is row stochastic, we will get a stationary row vector , and it is called Pagerank vector.

In Google’s setting, aij’s are discrete numbers .

In the following discussion, we have a less restriction on aij’s. We allow , which means the pages can have a weighted voting to other pages.

Consider there is originally 2 pages only in the WWW, which gives a 2x2 matrix S2:

Now consider a new page added to this WWW.

Here since the 3rd page is newly added, it is believed that the original pages do not have outboard links to this newcomer. So ai3=0 for i=1, 2.

So G3 becomes:

, where

So we have the following 2 proposition:

Proposition 1:

and the equality holds iff z=1

Proof:

By solving the third row of , we have:

Hence π3increase with z and attend a maximum 1/3 when z=1

Proposition 2:

Assume b>a, then .

Proof:

Solving the system ,

We have:

From the first row, we have:

Substituting , we have:

Then, by , we have

The denominator of π1 and π2 are positive for α not equal to 1.

Consider the difference between the numerators of π1 and π2, i.e.

]

Now, for

Hence,

Hence, if

And to make the value of as large as possible, obviously, take y=0 is essential.

Then z=1-x and we let and differentiate f(x) w.r.t x,

Hence attend its maximum at x=1 and

So, f(1) decreases with α. Here we can see that, it α decrease, the random user’s behavior will depend less on the hyperlink matrix. Hence it make sense that the range of b-a can be larger remaining page 1 has a page rank higher that page 2.

Now, we extend our discussion to a more general case.

Consider there are originally n pages in WWW. Hence,

Where aij is the weighted link from page i to page j, satisfying that

Now we add one new page. So the new Hyperlink matrix is

Hence by

Solving the equation we have:

In particular,

Hence, increases with z and attend a maximum of when z=1.

Now, when we compare the two situations:

And consider the changes in π for j=1,2,…,n: