Yahoo! shows 2 results pages when doing the query:
site:www.dealsonhotels.com
One of the results is a directory listing. This is a large site that has been live for well over a year. The same command with Google gives over 4500 results. I'm wondering if anyone knows why Yahoo doesn't seem to be indexing the other pages as well.
There's been a major update in Yahoo! over the last few days. Have you been getting these types of results on Yahoo! for the site longer than that?
Yahoo! and Google differ both in how they index sites and in how they list pages.
I've heard that Yahoo! starts having problems indexing pages that appear to be too many directory levels deep (even if the site is using URL rewriting to emulate a directory structure). That statement was made by Tim Mayer of Yahoo! at last month's NY Search Engine Strategies sessions.
Another potential problem may be a lack of unique page titles on these pages. Even Google only shows a couple of pages of results on that query, since they can't differentiate between the pages on the basis of things like unique titles.
Are there possibly other factors on the pages that may influence how spiderable they are? Maybe.
Session IDs can also cause potential problems with indexing.
You're not the only one; there are now many sites that only have two pages indexed. I don't know more at the moment, but I feel it has to do with a duplicate content issue. But the sites are clearly not dups, so it's on the Yahoo! side, I believe.
I'm getting the exact same results as before the Yahoo! update. I can also rule out the rewritten/deep directory structure and lack of unique page titles. So, I'm still scratching my head and hoping that the problem is on the Yahoo! side.
BTW, bragadocchio, when I do that exact query on Google (site:www.dealsonhotels.com), I'm getting 4,460. Are you inputting it in Google the same way?
Welcome also to the enigmas of search engines. The usual instruction to us men is "read the instructions" when we're having problems with some gadget or device. [I believe this is a sex-linked phenomenon, since it's never suggested that women don't read instructions or refuse to ask for directions.]
The reason why I raise this is that unfortunately search engines don't come with books of instructions, although they clearly should. We should not need to guess in a thread such as this.
The reason for this minor rant is that Yahoo! seems to have changed the rules without telling anyone. Here are the results for a small experiment I have just done. You will find that a Google search for site:www.dealsonhotels.com and for site:http://www.dealsonhotels.com give you exactly the same results. (Via Google.ca which Google insists on serving me, that's 4,140 web pages.)
A few months ago, the two searches would give you very different results in Yahoo!. The search without the http:// would always find 2 web pages. The search with the http:// would always find a much larger number of web pages (even larger than Google). Today the two still give you very different results, but the latter now gives you a count of zero. I thought Yahoo! was doing better these days, but there you go, my faith is shattered.
Microsoft largely follows the Google way of doing things. The search for site:http://www.dealsonhotels.com gives you 2,104 web pages and the search for site:www.dealsonhotels.com gives you 2,098 web pages. Why the difference? Who knows. :?
Perhaps all this is a fruitful field of research for some budding Ph.D. student?
QUOTE
BTW, bragadocchio, when I do that exact query on Google (site:www.dealsonhotels.com), I'm getting 4,460. Are you inputting it in Google the same way?
I am getting the same large number of pages that you are (actually, I'm getting 4,490), but I'll explain what I'm seeing.
I have my preferences set in Google to show 100 results on the page. When I scroll down to the bottom of the page in Google, and click on the number "2" to bring me the second hundred, it does. I scroll down again, and click on the number "3," and it presents some of the third hundred, but not all of them. Instead, it tells me:
QUOTE
In order to show you the most relevant results, we have omitted some entries very similar to the 251 already displayed.
If you like, you can repeat the search with the omitted results included.
That tells me that Google thinks enough of what is included on those pages is similar that it is willing to filter them out.
I'm also not seeing unique page titles for those pages when I look at them. For many of the event pages, I do see a unique heading at the top of the page, but the page title, what appears in the title bar of your browser, is the same from page to page. If those were each different and unique, Google might treat those pages as less similar than it does now.
I don't know if Yahoo! is carrying that type of filtering to an extreme or not, by only showing a couple of pages. I suspect that it might be a problem on the Yahoo! side, but if you can programmatically make a change to the information between the <title> tags on the different events' pages without too much time or effort or money, it might be worth trying.
Google tends to like unique titles, and giving your pages unique titles might improve the relevancy of your many events pages for queries in Google, even if it doesn't assist you in Yahoo!.
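To make the title suggestion concrete, here is a minimal sketch of building a page-specific title from per-page data, plus a check for pages that still share a title. The field names (event name, city) and the site name string are assumptions for illustration; I don't know how the site actually stores its event data.

```python
from collections import defaultdict
from html import escape
from typing import Dict, List


def build_title(event_name: str, city: str, site_name: str = "Deals On Hotels") -> str:
    """Compose a <title> element from per-page data (field names are hypothetical)."""
    # Skip empty fields so a missing city does not leave a dangling separator.
    parts = [p.strip() for p in (event_name, city, site_name) if p and p.strip()]
    return "<title>" + escape(" | ".join(parts)) + "</title>"


def find_duplicate_titles(titles_by_url: Dict[str, str]) -> Dict[str, List[str]]:
    """Group URLs by normalized title and return only titles used more than once."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, title in titles_by_url.items():
        # Normalize case and whitespace so trivial differences do not hide duplicates.
        groups[" ".join(title.lower().split())].append(url)
    return {title: urls for title, urls in groups.items() if len(urls) > 1}
```

Running find_duplicate_titles over a crawl of your own pages would show quickly how many event pages still share one title.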
Interesting comments! It seems like Yahoo! would publish somewhere what the parameters actually do, but as you say, Barry, no instruction manual.
bragadocchio, I was assuming that those event pages weren't indexed anyway because of the session IDs. The links in the left column should be friendlier and have page titles.
I also noticed that if you put a space after "site:", the results are different. It seems to include external links.
That's a confusing aspect to the restricted searches that Yahoo offers. A "site" restricted search needs to be done without a space.
When you include a space between "site:" and the domain name, you aren't using a site restricted search anymore.
Instead, your search becomes a search for "site" and the domain name, and will include pages that include both the domain name and the word "site."
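A toy model of the difference, just to illustrate the point. This is not Yahoo!'s actual parser (which isn't documented, as noted above); it only assumes that the query is split on whitespace and that "site:" counts as an operator only when the domain is attached to it.

```python
from typing import List, Optional, Tuple


def parse_query(query: str) -> Tuple[Optional[str], List[str]]:
    """Split a query into an optional site restriction and plain search terms."""
    restrict = None
    terms = []
    for token in query.split():
        # "site:example.com" is an operator; a bare "site:" is just a word.
        if token.lower().startswith("site:") and len(token) > len("site:"):
            restrict = token[len("site:"):]
        else:
            terms.append(token.rstrip(":"))
    return restrict, terms
```

In this model, "site:www.dealsonhotels.com" yields a restriction and no terms, while "site: www.dealsonhotels.com" yields no restriction and two ordinary terms, "site" and the domain name.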
A page with a session ID can end up in a search engine index. But it might be indexed multiple times, without the search engine knowing that those pages with different URLs are the same page. Search engines usually do know that those pages are similar enough to display only one of them in search results in response to a query. But with session IDs, it's possible that a search engine might give up rather than index what seems to be an infinite number of pages.
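One way to see the problem is to strip the session parameter and watch many URLs collapse into one. Here is a rough sketch; the parameter names in SESSION_PARAMS are assumptions (common defaults), not the names this particular site uses, and a session ID embedded in the path rather than the query string would need different handling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; adjust to whatever the site really emits.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_id(url: str) -> str:
    """Return the URL with session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

Two URLs that differ only in their session ID come out identical, which is the view of the page you would want a spider to have.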
The pages in the left column follow an interesting naming convention (www.topicname.example.com), and it may be that the search engines don't interpret those pages as being part of the same site because of it.
The site uses a lot of subdomains (example.dealsonhotels.com). You will need to check your domain without the www. to find these listings. The problem is that results from domains like www.example-dealsonhotels.com might also show up in the count.
Sites named www.city.dealsonhotels.com have nearly the same content as www.dealsoncityhotels.com.
Compare these two:
www.cancun.dealsonhotels.com
www.dealsoncancunhotels.com
The same is true for the other subdomains (of which there are MANY) that show up with a search for site:dealsonhotels.com.
I haven't checked to see if they are all within similar octets.
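If you want to put a number on "near duplicate", one common approach is to compare overlapping word sequences (shingles) with Jaccard similarity. This is only a sketch of that general technique. I'm not claiming it is what Google or Yahoo! actually use, and the shingle size of 3 is an arbitrary choice.

```python
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles in the text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(text_a: str, text_b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(text_a, k), shingles(text_b, k)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)
```

Feeding in the visible text of a subdomain page and its standalone-domain counterpart would give a score close to 1.0 if only the city name changes between them.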
Try site:cancun.dealsonhotels.com/
Google has seen them but chosen not to cache most. The caches are from multiple dates from late April through May 14th, so it's not like Google is not looking.
Most (at least) of the descriptive text is identical:
Find Hotel deals and discounts for your hotel reservations around the world.
Featuring Cancun destinations, airport hotels, and the lowest internet rates,