Cre8asiteforums Internet Marketing
and Conversion Web Design


Monetizing PDFs



#1 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 14 June 2007 - 10:42 PM

On one of my websites I have quite a few PDFs which are now attracting considerable traffic from Google. Unfortunately, they are attracting the traffic straight to the PDF files.

This makes it impossible to monetize them with AdSense (unless I change all of them to HTML!). Any suggestions on how to deal with these?

Yannis

#2 AbleReach

AbleReach

    Peacekeeper Administrator

  • Site Administrators
  • 6467 posts

Posted 14 June 2007 - 10:59 PM

Change them to html - as you suspected. :-)

And/or, charge for some as an ebook.

#3 tam

tam

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 2060 posts

Posted 14 June 2007 - 11:18 PM

You should be able to do a bit of wizardry to redirect them to an HTML page (with AdSense or pay-to-view). I know Google indexes a lot of PDF journal articles that redirect you to a pay page if you're not a search engine.

Tam

#4 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 14 June 2007 - 11:31 PM

Thanks Tam and Elizabeth

And/or, charge for some as an ebook.


I thought about giving only one or two chapters away free and selling the rest as e-books. The problem I see is that traffic would then go down, as the majority of the chapters would no longer be indexed. It is a good idea, though.

You should be able to do a bit of wizardry to redirect them to an HTML page (with AdSense or pay-to-view).


Any ideas on how to redirect the users but not the bots? Would this be cloaking?

Yannis

#5 Respree

Respree

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 5901 posts

Posted 14 June 2007 - 11:51 PM

I'm curious why you chose pdf over html. Was there a strategic reason?

#6 BillSlawski

BillSlawski

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15644 posts

Posted 15 June 2007 - 12:03 AM

Any ideas on how to redirect the users but not the bots? Would this be cloaking?


It would be making a decision to use a different format, but show the same content, without an intent to deceive.

If you wanted to change the format of these documents to HTML, you would change your own internal links to HTML versions of the pages, and redirect the external links to the HTML pages so that you don't lose the traffic that was going to the PDF pages. You would be best served by using permanent (301) redirects. Since your content wouldn't change, you wouldn't be deceiving anyone.
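For anyone wanting a concrete starting point, here is a minimal .htaccess sketch of that approach (assuming Apache; the paths and file names are made up for illustration):

```apache
# Permanently redirect a single old PDF URL to its new HTML version
# (needs mod_alias; the path is hypothetical).
Redirect 301 /guides/bridge-design.pdf /guides/bridge-design.html

# Or, if every PDF has an HTML twin with the same base name, one
# pattern rule covers them all (needs mod_rewrite).
RewriteEngine On
RewriteRule ^guides/(.+)\.pdf$ /guides/$1.html [R=301,L]
```

Because the 301 is served to every client alike, the engines will simply swap the PDF URLs for the HTML ones over time, and existing external links keep working.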

Another issue with the use of PDF files is that, by their nature, they are considered "dangling nodes": they don't usually contain links that search engines can follow, so the benefit of PageRank or link popularity from links pointing to them may not be calculated the same way it is for an HTML page (it is hard to tell exactly how the commercial search engines address that issue), and you don't get the benefit of links from them to other pages on your site.

So, in addition to placing ads on those pages, you also get the benefit of being able to place links on those pages to other pages of your site. The other good thing about that is that people visiting one of those pages may also be more likely to visit more pages on your site.

#7 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 15 June 2007 - 12:17 AM

I'm curious why you chose pdf over html. Was there a strategic reason?


I had hundreds of pages originally on our intranet with highly original and unique content. It was a great way to add content quickly.

I also noticed that people who look for technical information will often search with filetype:pdf, as it is likely to return real information rather than snippets from someone's blog.

It was also a great way to find out how Google deals with such pages. My impression is that they got indexed heavily, with some very strange and weird long-tail phrases showing up! My conclusion is that Google LOVES PDFs!

Thanks, Bill, for your reply. Your suggestions are good. Do you think it would be duplicate content if I include a "download this as PDF" link on the HTML page with a nofollow? (I guess I will need two sets of the PDF file: one left as-is to enable the redirection and keep it in the index, and a second one for the download.)

Thanks

Yannis

#8 BillSlawski

BillSlawski

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15644 posts

Posted 15 June 2007 - 12:41 AM

You're welcome, Yannis

Do you think it would be duplicate content if I include a "download this as PDF" link on the HTML page with a nofollow? (I guess I will need two sets of the PDF file: one left as-is to enable the redirection and keep it in the index, and a second one for the download.)


One of the best reasons to use the PDF format is to provide the document in a form that doesn't change, so that when people print out the document, it looks the same regardless of what operating system they are using, or what printer.

If you wanted to offer the PDFs and avoid any possible duplicate content issues, you could put them in a directory that is disallowed to robots in your robots.txt file. Make sure that you create the disallow statement first, and then wait at least a couple of days before creating the directory, placing the files within it, and linking to them - the major search engines cache a copy of the robots.txt file rather than checking it before crawling each link (bandwidth issues make this the preferred approach for them).
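As a sketch of that setup, assuming the downloadable copies are moved into a hypothetical /pdf-downloads/ directory:

```
# robots.txt - keep crawlers out of the duplicate PDF copies.
# The directory name is illustrative; use whatever you create.
User-agent: *
Disallow: /pdf-downloads/
```

The caveat above applies: publish this robots.txt a couple of days before the directory goes live, since crawlers work from a cached copy of the file.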

This is much better than using a "nofollow" value in a rel attribute. That value has a different connotation and meaning than a nofollow in a robots meta tag. The meta tag nofollow tells a search engine not to follow the links on a page; the rel attribute's nofollow value tells it not to pass PageRank through the link because you don't trust the source. I wish Google had come up with a different name. I would never use nofollow on an internal link on my own site.

I'm not sure that Google loves PDFs. I think that, instead, people love PDFs that provide great information, and will readily link to them if they find them valuable. Another limitation of PDFs is that you can't use some of the indications of emphasis, such as headings and other formatting, that might tell a search engine that some words in the document are more important than others - and that it might use to determine which words are more relevant than others within those documents.

#9 tam

tam

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 2060 posts

Posted 15 June 2007 - 01:14 AM

Any ideas on how to redirect the users but not the bots? Would this be cloaking?


I presume this must be done through an htaccess file?

A lot of journal sites work this way - e.g., if you search for "Spatial patterns in European rabbit", the first link is a PDF, but click on it and you'll be redirected to an HTML page with the abstract etc. and a purchase option.

Another option would be, instead of selling individual PDFs, to charge a membership fee for access to all the documents.

Could you mix n match? Provide a html page with enough of the content - a write up or first chapter - to attract the visitors and then charge for the complete PDF.

Maybe the best approach is to think about what you want to provide for free and what you want to charge for and then work out formats :)

#10 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 15 June 2007 - 01:33 AM

Don't do redirects for the users and not the bots. Yes, it is seen as cloaking, and it can (and often will) result in a penalty (or more - at least from Google). Remember their Google Toolbar: it shows Google exactly which URLs are being accessed. It's no big thing for them to compare the links handed out in the search results to the URLs where the user lands after, say, 1-2 seconds. I wouldn't take that risk.

I think your PDFs are a great way to attract targeted traffic. If I am looking for research or technical material, I will often trust a PDF in the search results more than a web page. Like you mentioned, a web page is often "just" some blog discussing something, with no real content that is worthwhile. A PDF, however, is something that is made to be printed out, a reference that will hold up for a long time. Usually you can be fairly certain that when a PDF is put online, it has already gone through an extensive editorial process. A blog posting is put up 10 minutes before going to sleep and perhaps corrected a month later, in some other posting.

One thing you might do is shorten your PDF files (assuming they're large). A shorter file will be targeted better in the index, and you can include a link to the "next part" (which would be a link to a normal web page). That way interested people can find your files and use them, but they will also always have a reason to visit your web site for the rest of the information.

Another way might be to tie in additional resources from your website. E.g., if your PDF is about designing bridges, you could add references to your site where you have more links, perhaps some online calculation tools, other interesting files like spreadsheets, a forum, etc. Anyone seriously interested in the contents of your PDF file would then automatically be pulled in to your website as well.

I had a similar situation. Our company website was terrible with regards to SEO (still is a bit :P). All of our PDF files (mostly price lists or order forms) would rank above the normal content. Google had less trouble extracting context from the PDF files than from our obfuscated (code-wise) website - :).

John

#11 BillSlawski

BillSlawski

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15644 posts

Posted 15 June 2007 - 02:52 AM

Don't do redirects for the users and not the bots. Yes, it is seen as cloaking, and it can (and often will) result in a penalty (or more - at least from Google).


If the content that was in a PDF is moved to an HTML page, and a 301 redirect is used to point to the new address, that is not cloaking. There's no intent to deceive, to mislead, or to manipulate the search results. It's the same content at a different address.

If a 302 is used, and the original PDF remains in place, with an HTML abstract and a fee for accessing the full paper or article, then yes that could be construed as cloaking and could result in a penalty.

John,

I understand your preference for a PDF file from an academic stance, since so many academic papers are published as PDF files. I wish that they weren't. I don't see any benefit, and I don't believe that there is any indexing benefit whatsoever from publishing a document as a PDF.

It's hard to tell how the commercial search engines handle PDF files, but because most don't contain links that a search engine can follow, there is a potential harm to the indexability of a page because of its nature as a dangling node. There's some discussion of that in Deeper Inside PageRank:

We return to the issue of dangling nodes now, this time discussing their philosophical complications. In one of their early papers [25], Brin and Page report that they "often remove dangling nodes during the computation of PageRank, then add them back in after the PageRanks have converged." From this vague statement it is hard to say exactly how Brin and Page were computing PageRank. But, we are certain that the removal of dangling nodes is not a fair procedure. Some dangling nodes should receive high PageRank. For example, a very authoritative pdf file could have many inlinks from respected sources, and thus, should receive a high PageRank. Simply removing the dangling nodes biases the PageRank vector unjustly. In fact, doing the opposite and incorporating dangling nodes adds little computational effort (see equation (1)), and further, can have a beneficial effect as it can lead to more efficient and accurate computation of PageRank. (See [79] and the next section.)



#12 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 15 June 2007 - 03:18 AM

Just to be clear - the kind of redirect I would be worried about is a conditional redirect:
- search engine crawlers can access the PDF directly
- users coming in from the search engines are redirected (301 or 302, it doesn't really matter) to an HTML page

A clean 301 redirect for all clients (search engine crawlers and users) to an HTML page would be no problem at all. It would just result in the PDF being removed from the index and replaced with the URL of the HTML file.
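To make the distinction concrete, here is a hedged .htaccess sketch (Apache mod_rewrite assumed; the paths are illustrative). The commented-out rule is the risky user-agent-conditional redirect; the active rule is the clean variant that treats every client the same:

```apache
RewriteEngine On

# RISKY (cloaking): redirect everyone EXCEPT Googlebot, so crawlers
# see the PDF while humans land on a different page. Shown commented
# out purely to illustrate the pattern to avoid.
# RewriteCond %{HTTP_USER_AGENT} !Googlebot [NC]
# RewriteRule ^papers/(.+)\.pdf$ /abstracts/$1.html [R=302,L]

# CLEAN: the same permanent redirect for every client, crawler or human.
RewriteRule ^papers/(.+)\.pdf$ /papers/$1.html [R=301,L]
```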

I understand your worries about the "dangling node" PageRank problem, but from what I have seen, they (Google, mostly) handle these kinds of files fairly well. I am not sure how they actually index the content - perhaps they take the HTML version (which they generate) and use that as the basis for the crawl. You could probably test that, and even check to see how headers and links are treated. My guess is that their PDF conversion is fairly advanced, hence the high rankings of some PDF files. It would be interesting to test some of that :).

Are there similar file formats that are indexed? Text files, perhaps? I doubt many people would explicitly search for textual content within Flash files.

John

#13 BillSlawski

BillSlawski

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15644 posts

Posted 15 June 2007 - 03:34 AM

Just to be clear - the kind of redirect I would be worried about would be a conditional redirect:


Right. The way that you had written your response may not have been clear, which is why I followed up.

I am not sure how they actually index the content - perhaps they take the HTML version (which they generate) and use that as a basis for the crawl.



It's the uncertainty that bugs me the most. I have somewhat of an idea of how they might be indexing an HTML file (only the folks at Google know for certain, and then again no one person may have the whole picture). But, with a PDF, we just really don't know.

And we don't usually have links from the PDF to other pages on the site. My biggest gripe with that isn't necessarily even related to search engines, but rather that someone has to try to edit the URL in the address bar if they are interested in seeing what else might be on the web site - something that a lot of people won't do.

#14 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 15 June 2007 - 03:55 AM

It should be pretty easy and quick to test whether or not links are recognized in PDF files. I did notice that the HTML versions of the PDFs (as generated by Google) do contain HTML links if the PDF contained links. This at least shows us that they can recognize PDF links - perhaps they're even used for discovering new URLs (easy to test).

Examples:
With links
Without links

Headers appear to be a bit different: the HTML version only has the font size (and does not use contextual markup, which would be complicated to determine).

How does Google handle links within text files?

John

#15 AbleReach

AbleReach

    Peacekeeper Administrator

  • Site Administrators
  • 6467 posts

Posted 15 June 2007 - 05:58 PM

I've seen Google pick up bold or heading-like text in pdfs, especially if it's in the first couple hundred words, a heading further down, or the link text of a referring page, but not necessarily. A few times I've searched for information about a family member and come up with a pdf result that had no link text referring to them (as far as I could find) and only mentioned them in a list waaaay down in the body of the pdf.

In my experience pdfs seem to drop out of the index more quickly as they age, and as the pages referring to them age, unless the referring page is an authority such as a newspaper. Methinks this would be a logical thing for a SE to aim for on purpose. If a pdf is meant to be a cohesive reference on something, even on what happened at last month's "friends of the little local library" meeting, would authoritative inlinks differentiate pdfs in a different way than they would other sorts of content? And, if you're developing yourself as a resource, would a linkable resource-like pdf that gets those links be a sign to the algos that your site is becoming successful as a recognized-by-authority resource-type site?

#16 phaithful

phaithful

    Light Speed Member

  • Members
  • 800 posts

Posted 15 June 2007 - 06:12 PM

Just a quick thought:

I've often read portions of reports from Jakob Nielsen's UseIt.com and found the content very interesting. Then I want to know more... so I proceed to purchase the reports from their commerce site: http://www.nngroup.c...ports/intranet/

If the PDFs are similar in topic or nature, why not remove a small subset, and then package all your PDFs into one bundle, downloadable for a one-time fee?

You could also modify the bottoms of your existing PDFs or reformat them to promote the bundle.

Edited by phaithful, 15 June 2007 - 06:13 PM.


#17 Jozian

Jozian

    Light Speed Member

  • Members
  • 583 posts

Posted 16 June 2007 - 10:59 PM

Some really nice ideas here. Thanks for the interesting discussion, Cre8 members.

I would probably do the redirect to HTML as discussed, if I were in your shoes, Yannis.

But if you haven't already, I would add some branding, contact info, and even cross-sell info to each PDF.

I would also make sure you lock the PDF files to discourage people from stealing and reselling your work. But you likely already know that if you are creating PDFs. Are you using the full Adobe product or a third-party tool?

Also, I doubt you would be interested in this type of solution, but PDFs can be generated server-side on the fly if you want to keep anyone from accessing the files directly. I've seen a couple of solutions that do this.


bragadocchio wrote: ...has to try to edit the URL in the address bar if they are interested in seeing what else might be on the web site...

LOL. I can definitely see that happening.
-Jeff

Edited by Jozian, 16 June 2007 - 11:01 PM.


#18 EGOL

EGOL

    Professor

  • Hall Of Fame
  • 5483 posts

Posted 16 June 2007 - 11:15 PM

I don't know how much these would sell for, or how that compares to the AdSense yield of these books. However, I would be tempted to put them online in HTML pages for free viewing and hope to make AdSense or other revenue from them. Free content like this is highly linkable, will rank for lots of keywords, and the links will drive you to higher rankings and higher traffic. If the content is removed, you might not rank as well.

I don't try to sell content; I try to rank high with it and earn LOTS of traffic.

#19 AbleReach

AbleReach

    Peacekeeper Administrator

  • Site Administrators
  • 6467 posts

Posted 17 June 2007 - 12:47 AM

Depending on what you have, another alternative would be to make all pdfs into inline html AND offer a downloadable, ad-free, e-book style collection, possibly available for a nominal fee.

Then, you'd be looking at courting three kinds of inlinks - for the "home" of the resource, for individual pages and for the page where someone would download the collection.

Edited by AbleReach, 17 June 2007 - 12:50 AM.


#20 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 17 June 2007 - 09:06 AM

Thanks everyone for the discussion.

It should be pretty easy and quick to test whether or not links are recognized in PDF files. I did notice that the HTML versions of the PDFs (as generated by Google) do contain HTML links if the PDF contained links. This at least shows us that they can recognize PDF links - perhaps they're even used for discovering new URLs (easy to test).


Google definitely picks up links within PDF files. A number of the PDFs I described have one master index page and link to individual files, and Google went about four levels deep without any hesitation. I haven't tried it with external links, though (only links between PDFs). My guess, though, is that it should have no trouble with those either.


I don't know how much these would sell for, or how that compares to the AdSense yield of these books. However, I would be tempted to put them online in HTML pages for free viewing and hope to make AdSense or other revenue from them. Free content like this is highly linkable, will rank for lots of keywords, and the links will drive you to higher rankings and higher traffic. If the content is removed, you might not rank as well.


EGOL, thanks for the reply - this is actually what I was afraid of if I remove the PDF content. The beauty of these files is the traffic they attract, and I hear the consensus here: rather translate them into HTML pages, optimize them further, etc.

Jeff, thanks

I would also make sure you lock the PDF files to discourage people from stealing and reselling your work. But you likely already know that if you are creating PDF's. Are you using the full Adobe product or a 3rd party tool?


Thanks for the above - the files are actually locked. You can still pick up locked files, but that is another story! We are using the full professional version.

Phaithful, thanks - your idea of bundling them all into one or two e-books is a good one (it will probably come to about six). What I am leaning toward, though, is going with the AdSense revenue model and having the bundle as a sideline.

Elizabeth, thanks for your thoughts. I need to meditate on what you wrote, which I quote below:

And, if you're developing yourself as a resource, would a linkable resource-like pdf that gets those links be a sign to the algos that your site is becoming successful as a recognized-by-authority resource-type site?


Finally, Bill and John, thanks for the cloaking insights and the duplicate content pointers. I agree that one needs to be extra careful here.

Like anything in SEO, I will tread slowly and steadily, beginning with PDF-to-HTML translations, section by section, and watch the stats carefully.

Yannis

#21 ambassador

ambassador

    Gravity Master Member

  • Members
  • 127 posts

Posted 07 July 2007 - 08:39 PM

Yes Yannis, PDFs can indeed contain SE-recognized links to external pages of most any type (e.g., HTML, XHTML, PHP, ASP, CFM, other PDFs, etc.). Further, the number of online PDFs containing such links appears to be increasing.

Ambassador

#22 half21back

half21back

    Unlurked Energy

  • Members
  • 7 posts

Posted 10 July 2007 - 11:51 AM

I wasn't able to read through all the posts, but my idea is to put links in these PDF e-books in key places and re-upload them to your server. These links, in turn, should carry traffic back to your sites, where you can monetize with AdSense or whatever program you choose.

#23 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 02 August 2007 - 02:42 AM

I'm not sure how you could leverage this, but you might want to take a look at the new X-Robots-Tag HTTP headers that Google supports.

Sebastian-X has a short post on how to easily use them through your .htaccess file. Perhaps setting the PDFs to "noarchive, nosnippet" could be part of a strategy.
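For reference, a small .htaccess sketch of that approach (assuming Apache with mod_headers enabled; adjust the file pattern to your own setup):

```apache
# Send the X-Robots-Tag header with every PDF response so the
# engines keep the files out of the cache and show no snippet.
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noarchive, nosnippet"
</FilesMatch>
```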

I wonder how "noarchive" would work on images? :D

John

#24 RonD

RonD

    Unlurked Energy

  • Members
  • 9 posts

Posted 02 August 2007 - 10:06 PM

One little additional factoid about PDFs. I'm not sure how representative our data is, but in our online tech support service we get just a ton of questions on Adobe - people can't download the latest version, or wonder if it is compatible... I've wondered if there are issues on the user side of using a PDF as well - how prevalent Adobe issues are in the reader community.

On the other hand, a buddy of mine is at a company (Scene7) that Adobe just bought to enhance their offerings to ecommerce and other vendors that want to display their goods nicely on websites... so Adobe must have some cash and be doing OK!!

Ron

#25 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 03 August 2007 - 02:05 AM

Just wanted to mention a way NOT to monetize PDFs :)

http://blogsci.com/r...ers-as-spammers - they are showing the PDFs to the search engines and confronting visitors with a "buy me" page (for prices around $50). That kind of action goes against the Google Webmaster Guidelines (and probably those of the other engines as well) and could result in their whole sites getting banned.

From the looks of it, this cloaking / "conditional auto-redirecting" :D is being done by the creator of the CMS software that these publishers use. For all we know, the publishers might not even know that it's being done, or what the consequences could be if it is discovered by the search engines.

John

#26 EGOL

EGOL

    Professor

  • Hall Of Fame
  • 5483 posts

Posted 20 May 2011 - 07:54 PM

old, old thread... but...

I just discovered that if you have a shopping cart that provides links that place items into the cart, you can embed buy buttons right in the .pdf document.


