> Yahoo Indexing only 2 Pages of Large Site?

Untested

Group: Members
Joined: 3-April 05
Posts: 8
post Apr 3 2005, 12:35 AM
Yahoo! shows only 2 results for the query:

site:www.dealsonhotels.com

One of the results is a directory listing. This is a large site that has been live for well over a year. The same command with Google gives over 4500 results. I'm wondering if anyone knows why Yahoo doesn't seem to be indexing the other pages as well.

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Apr 3 2005, 09:36 PM
Hi Emory,

There's been a major update in Yahoo! over the last few days. Have you been getting these types of results on Yahoo! for the site longer than that?

Yahoo! and Google differ in how they index sites and in how they list pages.

I've heard that Yahoo! starts having problems indexing pages that appear to be too many directory levels deep (even if the site is using URL rewriting to emulate a directory structure). That statement was made by Tim Mayer of Yahoo! at last month's New York Search Engine Strategies sessions.

Another potential problem may be a lack of unique page titles on these pages. Even Google only shows a couple of pages of results on that query, since it can't differentiate between the pages on the basis of things like unique titles.

Are there possibly other factors on the pages that may influence how spiderable they are? Maybe.

Session IDs can also cause potential problems with indexing.

Moderator

Group: Moderators
Joined: 20-August 03
Posts: 1,248
From: New York
post Apr 4 2005, 07:38 AM
You're not the only one; there are now many sites that have only two pages indexed. I don't know more at the moment, but I feel it has to do with a duplicate content issue. The sites are clearly not dups, though, so I believe it's on the Yahoo! side.

Untested

Group: Members
Joined: 3-April 05
Posts: 8
post Apr 4 2005, 05:00 PM
Hi Guys,

I'm getting the exact same results as before the Yahoo! update. I can also rule out the rewritten/deep directory structure and lack of unique page titles. So, I'm still scratching my head and hoping that the problem is on the Yahoo! side.

BTW, bragadocchio, when I do that exact query on Google (site:www.dealsonhotels.com), I'm getting 4,460. Are you inputting it in Google the same way?

Thanks.

Moderator

Group: Moderators
Joined: 6-March 03
Posts: 7,962
From: Langley, British Columbia, Canada
post Apr 4 2005, 05:57 PM
Welcome to the Forums, Emory.

Welcome also to the enigmas of search engines. The usual instruction to us men is "read the instructions" when we're having problems with some gadget or device. [I believe this is a sex-linked phenomenon, since it's never suggested that women don't read instructions or refuse to ask for directions.]

The reason why I raise this is that unfortunately search engines don't come with books of instructions, although they clearly should. We should not need to guess in a thread such as this.

The reason for this minor rant is that Yahoo! seems to have changed the rules without telling anyone. Here are the results for a small experiment I have just done. You will find that a Google search for site:www.dealsonhotels.com and for site:http://www.dealsonhotels.com give you exactly the same results. (Via Google.ca which Google insists on serving me, that's 4,140 web pages.)

A few months ago, the two searches gave very different results on Yahoo!. The query without the http:// would always find 2 web pages, while the query with the http:// would always find a much larger number (even larger than Google's). Today the two still give very different results, but the latter now returns a count of zero. I thought Yahoo! was doing better these days, but there you go, my faith is shattered.

Microsoft largely follows the Google way of doing things. The search for site:http://www.dealsonhotels.com gives you 2,104 web pages and the search for site:www.dealsonhotels.com gives you 2,098 web pages. Why the difference? Who knows. :?

Perhaps all this is a fruitful field of research for some budding Ph.D. student?

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Apr 4 2005, 07:12 PM
QUOTE
BTW, bragadocchio, when I do that exact query on Google (site:www.dealsonhotels.com), I'm getting 4,460. Are you inputting it in Google the same way?


I am getting the same large number of pages that you are (actually, I'm getting 4,490), but I'll explain what I'm seeing.

I have my preferences set in Google to show 100 results per page. When I scroll down to the bottom of the page in Google and click on the number "2" to bring up the second hundred, it does. I scroll down again and click on the number "3," and it presents some of the third hundred, but not all of them. Instead, it tells me:

QUOTE
In order to show you the most relevant results, we have omitted some entries very similar to the 251 already displayed.
If you like, you can repeat the search with the omitted results included.


That tells me Google thinks enough of the content on those pages is similar that it is willing to filter them out.

I'm also not seeing unique page titles for those pages when I look at them. For many of the event pages, I do see a unique heading at the top of the page, but the page title, which appears in the title bar of your browser, is the same from page to page. If each title were different and unique, Google might judge those pages to be less similar than it currently does.

I don't know if Yahoo! is carrying that type of filtering to an extreme or not, by only showing a couple of pages. I suspect that it might be a problem on the Yahoo! side, but if you can programmatically make a change to the information between the <title> tags on the different events' pages without too much time or effort or money, it might be worth trying.

Google tends to like unique titles, and giving your pages unique titles might improve the relevancy of your many events pages for queries in Google, even if it doesn't assist you in Yahoo!.
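For anyone wanting to try the title change programmatically, here is a minimal sketch of the idea, assuming each event page already has a distinct event name and city available when it is rendered. The `SITE_NAME` label, the `build_title` helper, and the sample events are all hypothetical; nothing here is taken from the actual site's code.

```python
from html import escape

# Assumed site label for illustration; not taken from the real pages.
SITE_NAME = "Deals On Hotels"


def build_title(event_name: str, city: str) -> str:
    """Build a <title> element from fields that differ between event pages."""
    text = f"{event_name} in {city} | {SITE_NAME}"
    # Escape characters such as & so the title stays valid HTML.
    return f"<title>{escape(text)}</title>"


# Hypothetical event records; a real site would pull these from its database.
events = [
    ("Jazz Festival", "Cancun"),
    ("Film Festival", "Cancun"),
    ("Jazz Festival", "New York"),
]
titles = [build_title(name, city) for name, city in events]

for title in titles:
    print(title)
```

Putting the page-specific part first and the shared site label last keeps the distinguishing words at the front of the title, where they are most visible in a results list.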

Untested

Group: Members
Joined: 3-April 05
Posts: 8
post Apr 4 2005, 10:53 PM
Interesting comments! It seems like Yahoo! would publish somewhere what the parameters actually do, but as you say, Barry, no instruction manual.

bragadocchio, I was assuming that those event pages weren't indexed anyway because of the session IDs. The links in the left column should be friendlier and have page titles.

I also noticed that if you put a space after "site: " that the results are different. Seems to include external links.

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Apr 4 2005, 11:09 PM
That's a confusing aspect to the restricted searches that Yahoo offers. A "site" restricted search needs to be done without a space.

When you include a space between "site:" and the domain name, you aren't using a site restricted search anymore.

Instead, your search becomes a search for "site" and the domain name, and will include pages that include both the domain name and the word "site."

A page with a session ID can end up in a search engine index. But it might be indexed multiple times, without the search engine knowing that those pages with different URLs are the same page. Search engines do usually know that those pages are similar enough to display only one of them in response to a query. But, with session IDs, it's possible that a search engine might give up rather than index what seems to be an infinite number of pages.
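To illustrate why session IDs multiply URLs, here is a rough sketch that strips session-style query parameters so that different visits map to the same address. The parameter names in `SESSION_PARAMS` and the example.com URLs are assumptions for illustration; the thread does not say which parameter the site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; the real site's session parameter is not given.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_id(url: str) -> str:
    """Return the URL with any session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Two visits to the same page, each with its own session ID (hypothetical URLs).
first = strip_session_id("http://www.example.com/event.php?id=42&sid=abc123")
second = strip_session_id("http://www.example.com/event.php?id=42&sid=zzz999")
print(first)
print(first == second)
```

A spider that sees the raw URLs has no such rule to apply, so every new session ID looks like a new page.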

The pages in the left column follow an interesting naming convention (www.topicname.example.com), and it may be that the search engines don't interpret those pages as being part of the same site because of it.

Untested

Group: Members
Joined: 3-April 05
Posts: 8
post Apr 19 2005, 03:06 PM
Appreciate your remarks. Any thoughts on doing the Yahoo! query with the "www":

CODE
site:www.dealsonhotels.com


versus without:

CODE
site:dealsonhotels.com


Without the "www" the results are much more plentiful. I wonder what the difference is.

Moderator/Blog Editor

Group: Site Admin
Joined: 18-January 05
Posts: 5,375
From: Olympia WA, USA
post Apr 20 2005, 02:08 AM
Depending on how a server is set up, yoursite.com may be used instead of www.yoursite.com.

www.yoursite.com is the standard.

A search for site:www.yoursite.com should find where the site is referred to as www.yoursite.com.

site:yoursite.com should find both yoursite.com and www.yoursite.com.

Think of the text yoursite.com as a substring of www.yoursite.com.
Both www.yoursite.com and yoursite.com contain yoursite.com.
Only www.yoursite.com has the www.

That said, nothing is exact, :-) even among search engines.
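As a rough sketch of the matching described above, assuming a site: filter compares hostnames on dot boundaries: the `host_in_site` helper is hypothetical and is not any search engine's documented behaviour.

```python
def host_in_site(host: str, site: str) -> bool:
    """True if host equals site or is a subdomain of it (dot-boundary match)."""
    host = host.lower().rstrip(".")
    site = site.lower().rstrip(".")
    return host == site or host.endswith("." + site)


# Hypothetical hostnames to compare against the two forms of the query.
hosts = ["yoursite.com", "www.yoursite.com", "cancun.yoursite.com"]

for host in hosts:
    print(
        host,
        "| site:yoursite.com ->", host_in_site(host, "yoursite.com"),
        "| site:www.yoursite.com ->", host_in_site(host, "www.yoursite.com"),
    )
```

Under this assumption the query without the www covers every hostname in the list, while the query with the www covers only www.yoursite.com, which would fit the larger count seen without the www.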


Elizabeth

Untested

Group: Members
Joined: 17-May 05
Posts: 1
post May 17 2005, 01:43 PM
The site uses a lot of subdomains (example.dealsonhotels.com). You will need to check your domain without the www. to find these listings. The problem is that results from domains like www.example-dealsonhotels.com might also show up in the count.

--The Griz

Moderator

Group: Moderators
Joined: 20-August 03
Posts: 1,248
From: New York
post May 17 2005, 04:52 PM
Any update on this? I have heard of many cases of dynamic sites with just two indexed pages in Yahoo!.

Moderator/Blog Editor

Group: Site Admin
Joined: 18-January 05
Posts: 5,375
From: Olympia WA, USA
post May 17 2005, 05:56 PM
This may be a special case.

Sites named www.city.dealsonhotels.com have content nearly identical to that of www.dealsoncityhotels.com.

Compare these two:
www.cancun.dealsonhotels.com
www.dealsoncancunhotels.com

The same is true for the other subdomains (of which there are MANY) that show up with a search for site:dealsonhotels.com.

I haven't checked to see if they are all within similar octets.

Try site:cancun.dealsonhotels.com/
Google has seen them but chosen not to cache most. The caches are from multiple dates from late April through May 14th, so it's not like Google is not looking.

Most (at least) of the descriptive text is identical:
Find Hotel deals and discounts for your hotel reservations around the world.
Featuring Cancun destinations, airport hotels, and the lowest internet rates,
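To put a rough number on how alike two such descriptions are, here is a minimal sketch using word shingles and Jaccard similarity. This is one common way to estimate near-duplication, not a claim about how Yahoo! or Google actually detects it. The template text is the description quoted above; the second city ("Paris") and the helper names are made up for the comparison.

```python
import re

# Description quoted above, with the city name turned into a placeholder.
TEMPLATE = (
    "Find Hotel deals and discounts for your hotel reservations around the world. "
    "Featuring {city} destinations, airport hotels, and the lowest internet rates"
)


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


cancun = TEMPLATE.format(city="Cancun")
paris = TEMPLATE.format(city="Paris")  # hypothetical second subdomain
score = jaccard(cancun, paris)
print(round(score, 2))
```

Swapping a single city name leaves most of the shingles shared, so pages built from one template score high against each other even though none is an exact copy.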


Elizabeth