From the original Page/Brin paper, "The Anatomy of a Large-Scale Hypertextual Web Search Engine":
2.1.1 Description of PageRank Calculation
Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page's importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:
We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.
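The iteration they describe is easy to sketch. One caveat, and it matters for this whole discussion: the formula as printed uses a bare (1-d) term, but for the PageRanks to form a probability distribution summing to 1, as the quote states, the (1-d) term has to be divided by N, the number of pages. The sketch below uses that normalized form; it is a minimal illustration, not Google's implementation, and it assumes every page has at least one outlink.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterative PageRank sketch, normalized so ranks sum to 1.

    links: dict mapping each page to the list of pages it links to.
    Assumes every page has at least one outlink (no dangling nodes).
    """
    n = len(links)
    pr = {page: 1.0 / n for page in links}           # even initial distribution
    for _ in range(iterations):
        # Normalized damping term: (1-d)/N rather than the paper's bare (1-d),
        # which is what keeps the total mass at exactly 1 each iteration.
        new = {page: (1.0 - d) / n for page in links}
        for page, outlinks in links.items():
            share = d * pr[page] / len(outlinks)     # C(page) = len(outlinks)
            for target in outlinks:
                new[target] += share
        pr = new
    return pr
```

Run it on any small link graph and the ranks always total 1, with every individual rank strictly between 0 and 1.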
Emphasis is mine. PageRank is a probability distribution. If the PageRanks of N documents sum to 1, their average is 1/N, so the average can never be 1 (except in a collection with only 1 document).
Let's use some round numbers to demonstrate what they are talking about. Google currently indexes over 8,000,000,000 pages, so we'll just work with 8,000,000,000.
An unadjusted, evenly distributed PageRank for any randomly selected document would start out at 1 divided by 8,000,000,000. That is an EXTREMELY SMALL number. You then run the iterations, counting all the links and applying the standard damping factor of 0.85 (which contributes a (1-d) term of 0.15, without allowing for any arbitrary determinations of importance), and you'll never come up with an average of 1.0.
Try it with 10 documents. They all start out with a PageRank of 0.1. Assume each one links to one other document. You're still not going to get an average of 1. It's mathematically impossible.
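That 10-document experiment can be run directly. The sketch below (again using the normalized (1-d)/N term, the form under which the ranks actually sum to 1) puts ten documents in a ring, each linking to the next. The total mass stays at 1, so the average stays at 1/10 = 0.1, exactly as argued above.

```python
# Ten documents in a ring, each linking to exactly the next one.
d = 0.85
n = 10
pr = [1.0 / n] * n                       # each document starts at 0.1
for _ in range(100):
    new = [(1.0 - d) / n] * n            # normalized (1-d)/N damping term
    for i in range(n):
        new[(i + 1) % n] += d * pr[i]    # each page has exactly one outlink
    pr = new

average = sum(pr) / n                    # stays at 0.1, not 1
```

Each update gives every document (1-d)/10 + d*(0.1) = 0.015 + 0.085 = 0.1, so the distribution is already at its fixed point: the sum is 1 and the average is 0.1.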
So, no, there is no disconnect in what I say there at all. I'm just going by what Messrs. Page and Brin have to say on the subject, and they are the only published authorities. I'll take their word for it.
Of course, they also say that the damping factor "can be set between 0 and 1. We usually set d to 0.85." That was then, of course, but "usually" implies "sometimes we set it to something else." Setting the damping factor to something OTHER than 0.85 can adjust a document's calculated PageRank.
It still won't produce an average PageRank of 1.0 across the database.
Nowhere did I assert that the damping factor would not or need not be applied to all documents. All I have pointed out is that the damping factor is adjustable and that there is no basis for assuming that it has never been adjusted for anything.
However, Page and Brin DID say:
...And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages....
So, in fact, the idea that a damping factor might NOT be applied to some pages comes straight from them, not from me. I call it a "terminating probability"; you describe it as typing in another URL. The difference in language is trivial. We are saying the same thing.
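The variation they describe, adding the damping term only to a single page or a group of pages, can be sketched too. How Google splits that mass is not stated in the paper; dividing (1-d) evenly over the chosen group is my assumption for illustration, as is the normalized form that keeps the ranks summing to 1.

```python
def pagerank_with_jump_set(links, jump_set, d=0.85, iterations=100):
    """PageRank sketch where the (1-d) "random jump" mass goes only to
    the pages in jump_set, per the variation quoted from the paper.

    Splitting (1-d) evenly over jump_set is an illustrative assumption.
    Assumes every page has at least one outlink.
    """
    n = len(links)
    pr = {p: 1.0 / n for p in links}
    for _ in range(iterations):
        # Only the favored group receives the damping mass; everyone
        # else starts each round at zero and gets only link-passed rank.
        new = {p: (1.0 - d) / len(jump_set) if p in jump_set else 0.0
               for p in links}
        for page, outlinks in links.items():
            share = d * pr[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        pr = new
    return pr
```

On a simple ring A→B→C→D→A with the jump mass sent only to A, the ranks still total 1, but A ends up with the largest share, which is exactly the kind of per-page adjustment being discussed.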
A great deal of nonsense concerning PageRank has been passed around for years. It's probably never going to be correctly understood by anyone outside of Google's staff, because only they know how it is currently implemented.
But I have seen enough analyses of PageRank which confuse the Toolbar PageRank (the integer scale from 0 to 10 shown in the Google Toolbar) with the link-popularity PageRank (which is always between 0 and 1, not inclusive of 1) to know that most of the self-styled experts who comment on it don't know what they are talking about.
There is no average of 1 to preserve or converge toward. That is mathematically impossible.