Founder & Administrator Group: Admin - Top Level
Joined: 29-August 02
Posts: 11,643
From: Bucks County, PA
|
Feb 14 2005, 12:28 PM |
|
|
So much has changed with Google since the first papers analyzing how it works and how PageRank is calculated came out. Several of those papers were pored over by SEOs and used to help them in their work.
One paper has been thrust back into discussion because the author has noticed changes. His comments sparked a response by Michael Martinez in: A rebuttal of Phil Craven's "Google Explained"
Michael Martinez writes:
QUOTE What people in the SEO community have now convinced themselves of is that a page "bleeds" PageRank, which is utter nonsense.
QUOTE Remember, Google wants to approximate user behavior. They don't just want to create an algorithm that is in conflict with reality.
QUOTE His analysis is flawed, as are all the others he refers to, and many more that I have read. The chief problem with all these analytical papers is that they assume or arbitrate a closed system to preserve a PR average of 1.0. Google isn't doing that. The Web is an open system, not a closed system. Hence, any closed-system model will diverge from Google's practical application.
There are other significant problems with these analyses, but I will emphasize one point that Google has made on more than one occasion (in their own words): the Toolbar PageRank has no direct correlation to the link popularity PageRank, and all these analyses continue to draw upon or try to resolve to that fallacious assumption. A site's overall importance is not gauged simply by how many pages link to it. That was a nonsense assumption that Page and Brin relied upon in their first model, but they got burned very badly by the link farmers. All subsequent determinations of "importance" have incorporated other factors.
It's a long post, but well worth reading. He covers a lot of ground, and he will likely take some heat for it from the authors of the papers he's referring to.
||
Star Member Group: Members
Joined: 24-February 05
Posts: 517
|
Feb 24 2005, 02:01 AM |
|
|
From the original Page/Brin paper, "Anatomy of a Search Engine":
QUOTE 2.1.1 Description of PageRank Calculation
Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page's importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:
We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.
Emphasis is mine. PageRank is a probability distribution. That means it can never average to 1 (except in a collection with only 1 document).
Let's use some round numbers to demonstrate what they are talking about. Google currently indexes over 8,000,000,000 pages, so we'll just work with 8,000,000,000. An unadjusted, evenly distributed PageRank for any randomly selected document would start out with 1 divided by 8,000,000,000. That is an EXTREMELY SMALL number. You then process the iterations, counting all the links, applying the standard damping factor of 0.85 (which leaves 1 - d = 0.15, and not allowing for any arbitrary determinations of importance), and you'll never come up with an average of 1.0.
Try it with 10 documents. They all start out with a PageRank of 0.1. Assume they all link to one other document. You're still not going to get an average of 1. It's mathematically impossible.
So, no, there is no disconnect in what I say there at all. I'm just going by what Messrs. Page and Brin have to say on the subject, and they are the only published authorities. I'll take their word for it.
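The 10-document example can be checked numerically. What follows is a minimal sketch, not anything from the paper or from Google: it assumes the 10 pages form a ring in which page i links only to page i+1 (one reading of "they all link to one other document"), and it runs two variants of the iteration. The first uses the formula exactly as printed in the quote, with (1-d) as the constant term; the second uses (1-d)/N as the constant term, which is an assumption on my part and is the form under which the values behave as a probability distribution. The names `iterate_pagerank`, `ring`, `as_printed` and `normalized` are illustrative only.

```python
def iterate_pagerank(links, constant, d=0.85, start=0.1, iterations=200):
    """Repeat PR(A) = constant + d * sum(PR(T)/C(T)) over all pages T linking to A.

    links maps each page to the list of pages it links to. Every page is
    assumed to have at least one out-link and to appear as a key in links.
    """
    pages = sorted(links)
    rank = {p: start for p in pages}
    for _ in range(iterations):
        # Every page begins the round with the constant term.
        new_rank = {p: constant for p in pages}
        # Each page passes d * PR / C to every page it links to.
        for page, outlinks in links.items():
            share = d * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank


N = 10
d = 0.85
# Assumed topology: a ring, page i links only to page (i + 1) mod N.
ring = {i: [(i + 1) % N] for i in range(N)}

# Variant 1: constant term (1 - d), as the formula is printed in the quote.
as_printed = iterate_pagerank(ring, constant=1 - d, d=d, start=1 / N)
# Variant 2 (assumption): constant term (1 - d) / N.
normalized = iterate_pagerank(ring, constant=(1 - d) / N, d=d, start=1 / N)

print("as printed: sum =", round(sum(as_printed.values()), 6))
print("normalized: sum =", round(sum(normalized.values()), 6))
```

On this ring the formula as printed settles at 1.0 per page, so the values sum to N, while the (1-d)/N variant holds at 0.1 per page and sums to 1. Which average comes out therefore depends on which constant term is used, and the quoted formula and the quoted "sum will be one" sentence do not describe the same normalization.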
Of course, they also say that the damping factor "can be set between 0 and 1. We usually set it to 0.85". That was then, of course, but "usually" implies "sometimes we set it to something else". Setting the damping factor to something OTHER than .85 can adjust a document's calculated PageRank. It still won't produce an average PageRank of 1.0 across the database.
Nowhere did I assert that the damping factor would not or need not be applied to all documents. All I have pointed out is that the damping factor is adjustable and that there is no basis for assuming that it has never been adjusted for anything. However, Page and Brin DID say:
QUOTE ...And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages....
So, in fact, the idea that a damping factor might NOT be applied to some pages comes straight from them, not from me. I call it a "terminating probability"; you say something about typing in another URL. The difference in language is trivial. We are saying the same thing.
A great deal of nonsense concerning PageRank has been passed around for years. It's probably never going to be correctly understood by anyone outside of Google's staff, because only they know how it is currently implemented. But I have seen enough analyses of PageRank which confuse the ToolBar PageRank (measured from 0 to 10) with the link popularity PageRank (which is always between 0 and 1, not inclusive of 1) to know that most of the self-styled experts who comment on it don't know what they are talking about. There is no average of 1 to preserve or converge toward. That is mathematically impossible.
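The variation quoted above, where the random jump goes only to a single page or a group of pages, can be sketched as follows. This is an illustration under stated assumptions, not a description of how Google implements anything: the four-page graph, the `teleport` weights and the function name `pagerank_with_teleport` are all made up for the example, and the constant term is scaled so that the values sum to 1.

```python
def pagerank_with_teleport(links, teleport, d=0.85, iterations=100):
    """Power iteration where the random jump lands only on weighted pages.

    links maps each page to the pages it links to (every page is assumed
    to have at least one out-link and to appear as a key). teleport maps
    pages to non-negative weights; pages missing from it get weight 0.
    """
    pages = sorted(links)
    total = float(sum(teleport.get(p, 0.0) for p in pages))
    jump = {p: teleport.get(p, 0.0) / total for p in pages}
    rank = dict(jump)  # start from the jump distribution
    for _ in range(iterations):
        # The (1 - d) share goes only to pages with a teleport weight.
        new_rank = {p: (1 - d) * jump[p] for p in pages}
        # The d share follows links, split evenly among out-links.
        for page, outlinks in links.items():
            share = d * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: a -> b -> c -> a, plus x -> a. Nothing links to x.
links = {"a": ["b"], "b": ["c"], "c": ["a"], "x": ["a"]}

# The random jump spread evenly over every page.
uniform = pagerank_with_teleport(links, {"a": 1, "b": 1, "c": 1, "x": 1})
# The random jump applied to the single page x only.
favored = pagerank_with_teleport(links, {"x": 1})

print("x with uniform jump:", round(uniform["x"], 4))
print("x with jump to x only:", round(favored["x"], 4))
```

Because nothing links to x, its value comes entirely from the random jump, so concentrating the jump on x raises its value without changing a single link. Both runs still sum to 1.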
||
Moderator Group: Moderators
Joined: 6-March 03
Posts: 7,962
From: Langley, British Columbia, Canada
|
Feb 24 2005, 05:40 PM |
|
|
I see Mike Grehan seems to be talking with others who say PageRank is no longer a part of the Google algorithm.
|
||
Star Member Group: 1000 Post Club
Joined: 15-August 04
Posts: 1,071
|
Feb 26 2005, 03:41 AM |
|
|
QUOTE(Michael) By definition, PageRank only applies to the links. I think what you're referring to as PageRank is their actual search results ranking (or ordering) algorithm.
No, I was talking about PageRank. Google's Technology Overview page mentions "... PageRank performs an objective measurement of the importance of web pages by solving an equation of more than 500 million variables and 2 billion terms."
What do they mean by '500 million variables' and '2 billion terms'? My first assumption when I read 2 billion terms was that it perhaps stood for the 2 billion pages in their index at that time... Any ideas on what these mean? (Oh, by the way, me not into maths ... :)
||