
Cre8asiteforums Internet Marketing
and Conversion Web Design


Pdf Files And Pagerank

8 replies to this topic

#1 GeoffreyF67


    Mach 1 Member

  • Members
  • 331 posts

Posted 18 January 2008 - 01:41 PM

Here are a few things to ponder...if you happen to know the answers to any of them, feel free to answer ;)

First, we know that Google will index PDF documents:


So if it indexes those documents...does it:

1. Follow the links within that PDF?
2. Pass PageRank juice through that PDF?

Just some random things to think about :)


#2 Respree


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 5901 posts

Posted 18 January 2008 - 01:56 PM

I'm guessing here (like everyone else), but I would imagine that Google looks at web documents, text and links, without regard to what type of file it is. I find it hard to believe that, in their minds, a .pdf would be worth any more or any less than an .html file, in terms of a vote. Just my very unscientific and unsupported gut feeling.

Great question, though.

#3 bwelford


    Peacekeeper Administrator

  • Hall Of Fame
  • 9053 posts

Posted 18 January 2008 - 02:16 PM

My very unscientific and unsupported gut feeling is not as bold as that of Respree. However, I feel that the PDF file would give less linking value than an HTML file.

If you look at the source code for a PDF document, it is not at all like a text file. So Google will have to do extra work to grab the content. How much would they do? Only they know.

You might compare this with how well Google is handling links passed via JavaScript. JavaScript is much more readable. Of course there are many possible different ways links might be handled within any given script. So perhaps the PDF document is less of a challenge.

Until I hear to the contrary from some credible source, I will continue to assume that both JavaScript and PDF documents may or may not pass link juice and traffic. The conservative approach is to make sure that the same links are directly accessible via HTML to the Google spiders.

#4 GeoffreyF67


    Mach 1 Member

  • Members
  • 331 posts

Posted 18 January 2008 - 02:23 PM

Well, if you look at the link I provided above, there is also a View as HTML link on there for each of the PDF documents. This would lead me to believe that they're able to parse the PDF document pretty effectively. That doesn't mean, of course, that they're counting the links.

Sounds like a good test is needed. Anyone got a higher PR site that would link to a pdf document that links to a page that has no other links to it and see if that page gets indexed or not?
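
To make the stakes of that test concrete, here is a toy sketch of the textbook PageRank iteration on a three-node graph: a site links to a PDF, and the PDF links to an otherwise unlinked page. It is only an illustration under stated assumptions: the node names (`site`, `doc.pdf`, `orphan`), the damping factor of 0.85 and the handling of dangling nodes are all hypothetical choices, and nothing here claims to describe what Google actually does with links inside PDFs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank on a small link graph.

    links maps each node to the list of nodes it links to.
    Nodes with no out-links spread their rank evenly over all nodes.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Dangling node: distribute its rank evenly.
                for target in nodes:
                    new[target] += damping * rank[node] / n
        rank = new
    return rank

# Hypothetical case 1: the link inside the PDF is counted.
counted = pagerank({"site": ["doc.pdf"], "doc.pdf": ["orphan"]})

# Hypothetical case 2: the PDF is indexed but its link is ignored.
ignored = pagerank({"site": ["doc.pdf"], "doc.pdf": [], "orphan": []})

print("orphan rank if PDF links count:  ", round(counted["orphan"], 3))
print("orphan rank if PDF links ignored:", round(ignored["orphan"], 3))
```

In this toy model the orphan page ends up with a clearly higher score when the PDF's link is counted, which is the difference the proposed real-world test would be trying to detect (there, by whether the page gets indexed at all).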


#5 yannis


    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 18 January 2008 - 02:34 PM

On one of my sites I have over 2000 pages of PDF documents that got indexed. I can vouch that Google indexes these files very well, as I get some very long-tail search engine traffic from these documents. They are also indexing links from one PDF file to another. I can vouch for this because some of the above PDF files have a main file which leads to sub-files etc. My guess is that they are taken into the overall PageRank calculation.

One other thing that I have noticed is that these documents are not indexed very regularly - which is natural - in that G probably guesses that pdf documents don't get updated too often.

They also index Word files, i.e. rdf documents, code and many other things. Nothing escapes!


#6 bwelford


    Peacekeeper Administrator

  • Hall Of Fame
  • 9053 posts

Posted 18 January 2008 - 02:43 PM

Actually G-man, a few of those PDF files did not have an HTML equivalent. Does that indicate that their treatment of PDF documents is not 100% applied in all cases? It's always tough to prove that something always happens.

#7 GeoffreyF67


    Mach 1 Member

  • Members
  • 331 posts

Posted 18 January 2008 - 03:07 PM

Hmmm...fascinating. The reason that I'm asking this (in case you're curious) is that Scribd recently stopped giving links out. It used to be that a nice way to get your page to rank was to create a Scribd page. Now, since Scribd lets you upload documents AND you can have links within PDF files, it seems like you could still get the links that you wanted if you used this method.





  • Hall Of Fame
  • 6374 posts

Posted 18 January 2008 - 03:16 PM

I have a site that has lots of external links coming into it from a large number of document types on a large number of other domains.

I went into Google Webmaster Tools and downloaded my backlink files. In them I see that Google recognizes links to my site from various document types on many other domains.

I see lots of links from .pdf (Acrobat files), .xls (Excel files), .doc (Word documents), .ppt (PowerPoint presentations)

I publish files of all of these types on my own site. In each one I include a link back to my homepage. I believe that I get linkjuice from these when other domains link to them. Why not if google indexes the document and spiders the link to my site?
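
For anyone who wants to repeat that kind of check, a tally of backlinks by document type can be sketched roughly as below. This is a minimal sketch, assuming you already have the linking URLs as a plain list of strings; the sample URLs are made up, and the layout of the actual Webmaster Tools download is not assumed here.

```python
from collections import Counter
from urllib.parse import urlparse
import posixpath

def count_by_extension(urls):
    """Tally URLs by the file extension of their path (lowercased)."""
    counts = Counter()
    for url in urls:
        path = urlparse(url.strip()).path
        ext = posixpath.splitext(path)[1].lower()
        counts[ext or "(none)"] += 1
    return counts

# Hypothetical sample of linking URLs, for illustration only.
sample = [
    "http://example.com/reports/annual.pdf",
    "http://example.org/data/prices.xls",
    "http://example.net/notes.DOC",
    "http://example.com/slides/talk.ppt",
    "http://example.com/blog/post.html",
    "http://example.org/",
]

for ext, n in count_by_extension(sample).most_common():
    print(ext, n)
```

Query strings are ignored because only the URL path is inspected, so `report.pdf?page=2` still counts as `.pdf`.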

#9 iamlost


    The Wind Master

  • Site Administrators
  • 5474 posts

Posted 18 January 2008 - 04:00 PM

If it looks like a URL, whether plain text (e.g. "I really love cre8asiteforums.com") or a normal clickable link, Google will take note and attempt to follow it. That includes URLs within at least any of the file types Google admits to parsing, e.g. pdf.

I believe that a couple of years ago Matt Cutts said (sorry unable to find reference) that Google can/may treat plain text links as backlinks. If a URL, linked or not, is treated as a backlink there is no reason for it to be unable to pass PR - but that is just my opinion.

The usual location for certain filetypes, e.g. pdf, is some dead-end repository that receives limited backlinks (there are notable exceptions), with very diluted trickle-down PR and uncertain on-page factor oomph, and with few or no out-links (which does indicate a certain potential...).
Certainly testable.
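
As a rough illustration of what "looks like a URL" could mean for plain text, here is a small regex sketch. The pattern is my own simplification, not anything Google has published: it will miss some real URLs and will also flag things that are not URLs (a filename like `notes.txt`, for instance).

```python
import re

# Optional scheme, one or more dot-separated host labels,
# an alphabetic top-level part of 2+ letters, and an optional path.
URL_LIKE = re.compile(
    r"\b(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s<>\"']*)?",
    re.IGNORECASE,
)

def find_url_like(text):
    """Return URL-looking substrings, with trailing punctuation trimmed."""
    return [m.group(0).rstrip(".,;:!?)") for m in URL_LIKE.finditer(text)]

print(find_url_like("I really love cre8asiteforums.com, honestly."))
print(find_url_like("See http://example.com/docs/a.pdf."))
```

The same function could be run over text extracted from a PDF or Word file; how that text is extracted is a separate problem and is not covered here.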
