
Cre8asiteforums Internet Marketing
and Conversion Web Design



How To Figure Out The Number Of Pages On A Website?


5 replies to this topic

#1 DavidMTL

DavidMTL

    Ready To Fly Member

  • Members
  • 19 posts

Posted 30 July 2007 - 11:37 AM

Good Monday morning to all,

I presume this is an easy question to answer, but does anybody know of a tool that can count the number of pages on a website?

I am doing a competitive report for a friend of mine, and would like to get a clear report on the number of pages on a competitor's website. I know you can use the Yahoo! Site Explorer tool, but what if some pages are not indexed?

Thank you for reading my question :D

David

#2 joedolson

joedolson

    Eyes Like Hawk Moderator

  • Technical Administrators
  • 2902 posts

Posted 30 July 2007 - 11:48 AM

No, it's really not an easy question to answer.

In addition to the possibility of pages which are not being counted, you need to take into consideration the likelihood of pages which are counted more than once - this is one of the areas where duplicate content is most difficult to identify.

With small sites, you can pretty quickly identify the numbers of pages - but for large sites, the scope of problems can become quite unmanageable.

Consider this:

http://sample.com
http://sample.com/
http://sample.com/index.html
http://www.sample.com
http://www.sample.com/
http://www.sample.com/index.html
https://sample.com
https://sample.com/
https://sample.com/index.html
https://www.sample.com
https://www.sample.com/
https://www.sample.com/index.html

All of these may point to the same web page, and it's possible (although not very likely) that a search engine has indexed every one of them separately and incremented the page count for each.
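To make the duplicate problem concrete, here is a minimal Python sketch that collapses the twelve variants above into one key before counting. The function name `canonical_key` and the list of index file names are assumptions made for illustration, and treating http/https and www/non-www as the same page is itself an assumption that will not hold for every site.

```python
from urllib.parse import urlsplit

# Assumption: these file names are served as the directory index.
INDEX_NAMES = {"index.html", "index.htm", "index.php"}

def canonical_key(url):
    """Collapse scheme, leading "www." and index-file variants into one key."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"  # "http://sample.com" has an empty path
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_NAMES:
        path = head + "/"  # "/index.html" becomes "/"
    return host + path

variants = [
    scheme + "://" + host + suffix
    for scheme in ("http", "https")
    for host in ("sample.com", "www.sample.com")
    for suffix in ("", "/", "/index.html")
]

unique_pages = {canonical_key(u) for u in variants}
print(len(variants), "URLs ->", len(unique_pages), "page(s)")
```

Query strings are ignored here, so a site that serves different pages from the same path with different parameters would need a stricter key.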

Ultimately, it's very difficult to get a precise number; instead, you have to depend on approximate numbers. I'd suggest using a crawling tool like Xenu, although you'll probably want to slow the crawler down substantially to keep the target site from blocking you.
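For readers who want to see what a slowed-down crawl looks like, here is a rough Python sketch. It is not how Xenu works internally. The function `count_pages`, its `delay_seconds` and `max_pages` parameters, and the in-memory `site` dictionary are all made up for this example. The `fetch` argument is assumed to return a page's HTML, or `None` when the page cannot be retrieved; in a real run it would wrap an HTTP request.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def count_pages(start_url, fetch, delay_seconds=2.0, max_pages=500):
    """Breadth-first crawl of one host, pausing between requests."""
    host = urlsplit(start_url).hostname
    seen = {start_url}
    queue = deque([start_url])
    pages = 0
    requests_made = 0
    while queue and requests_made < max_pages:
        url = queue.popleft()
        if requests_made:
            time.sleep(delay_seconds)  # throttle so the site is not hammered
        html = fetch(url)
        requests_made += 1
        if html is None:  # unreachable page: requested but not counted
            continue
        pages += 1
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            link = urldefrag(urljoin(url, href)).url  # resolve, drop #fragment
            if urlsplit(link).hostname == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

# Hypothetical three-page site held in memory, so the sketch runs offline.
site = {
    "http://sample.com/": '<a href="/about">About</a> <a href="/products">Products</a>',
    "http://sample.com/about": '<a href="/">Home</a> <a href="http://other.example/">Elsewhere</a>',
    "http://sample.com/products": '<a href="/about#team">Team</a> <a href="/missing">Gone</a>',
}
page_total = count_pages("http://sample.com/", site.get, delay_seconds=0)
print(page_total)
```

The delay is set to 0 only because the demo reads from a dictionary; against a live site a pause of a second or more per request is the kind of slowdown meant above.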

#3 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 30 July 2007 - 01:03 PM

If a robot cannot reach a page, it's quite likely that access is restricted via passwords etc., and hence you will not be able to count the number of pages on a website accurately. Spider and download tools can be used to count and filter links to get a fairly accurate estimate. Even better, you can use a tool such as HTTrack to download the whole website! (Use with care!) This will not only give you all the statistics but also provide competitive information.

Yannis

#4 Ruud

Ruud

    Hall of Fame

  • Hall Of Fame
  • 4887 posts

Posted 30 July 2007 - 01:06 PM

I am doing a competitive report for a friend of mine, and would like to get a clear report on the number of pages of a competitor's website.


What Joe said :D

To add to that, I'm not sure what kind of competitive insight that would give you.

First, the number of indexed pages for a site is more interesting to me than the number of pages on a site. I'm not competing against pages which aren't indexed... For most sites the rough estimate you can arrive at by looking at the number of pages indexed is "good enough".

Second, unless you're comparing product sites and want to be able to say "you sell 10 products and he probably sells around 100 products on his site" -- what good does the information do you?

Which leads me to my third point; what are you comparing? By comparing straight numbers you're more than likely comparing apples with genetically manipulated turkeys. Because the fact is that your friend's hugely popular article "On The Art Of Growing Grass In A Coffee Cup" might bring in more traffic, more links, more conversions than 100 crappy or less popular articles on the competitor's site.

This is true even when you do compare 1-on-1. The fact that 2 sites each list the same 100 items means nothing. It is a quantitative measure. What you are interested in is the quality of each of those pages. How solid is the provided information? How juicy are the descriptions? How effectively are upsell and cross-sell handled? How are non-buyers led to other items they might buy? How does the pure ecstatic almost stupefying joy of using your site stack up to using theirs?

Competition is in quality and service more than in absolute numbers.

#5 sebastienbillard

sebastienbillard

    Whirl Wind Member

  • Members
  • 95 posts

Posted 31 July 2007 - 04:08 AM

This tool is IMHO more useful than Xenu, as it lets you exclude images from the crawl: http://www.auditmypc...xml-sitemap.asp It can also respect (or ignore) the robots.txt protocol.
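As an illustration of those two options (this is not the linked tool's code), the Python sketch below skips image URLs and optionally honours robots.txt using the standard library's `urllib.robotparser`. The robots.txt rules, the `page-counter` user agent, the suffix list and the `should_count` helper are all invented for the example.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawl would download the site's own.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

USER_AGENT = "page-counter"  # made-up crawler name
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

def should_count(url, respect_robots=True):
    """Return True if the URL should be crawled and counted as a page."""
    path = urlsplit(url).path.lower()
    if path.endswith(IMAGE_SUFFIXES):
        return False  # images are not pages
    if respect_robots and not robots.can_fetch(USER_AGENT, url):
        return False  # excluded by robots.txt
    return True

for url in (
    "http://sample.com/products.html",
    "http://sample.com/logo.PNG",
    "http://sample.com/private/report.html",
):
    print(url, should_count(url))
```

Turning `respect_robots` off gives a count closer to what actually exists on the site, while leaving it on gives a count closer to what a well-behaved search engine could index.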

#6 DavidMTL

DavidMTL

    Ready To Fly Member

  • Members
  • 19 posts

Posted 01 August 2007 - 08:18 PM

Thanks guys, those are quality answers. I was looking for the percentage of content that is indexed, for comparison purposes, and I knew it would be difficult to obtain an accurate number. At least I can use these tools for an estimate.

Thank you Ruud, very nice answer.
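The percentage David mentions can be estimated by dividing a search engine's indexed-page count by a crawler's page count. A small Python sketch follows; the function name `indexed_share` and the numbers are hypothetical, and both inputs are approximations for the reasons given earlier in the thread.

```python
def indexed_share(indexed_pages, crawled_pages):
    """Percentage of crawled pages that a search engine reports as indexed."""
    if crawled_pages <= 0:
        raise ValueError("crawled_pages must be positive")
    return 100.0 * indexed_pages / crawled_pages

# Hypothetical figures: 340 pages reported by a search engine, 425 found by a crawl.
share = indexed_share(340, 425)
print(f"{share:.1f}% indexed")
```

A result above 100% is possible and usually means the search engine has counted duplicate URLs for the same page.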


