
Cre8asiteforums Internet Marketing
and Conversion Web Design



Do Search Engines like my-widget or my_widget.html best?


20 replies to this topic

#1 fast-pack.com

fast-pack.com

    Whirl Wind Member

  • Members
  • 67 posts

Posted 16 July 2005 - 04:28 PM

Hello All!

For years I have been using my_widget.html thinking it was best for the search engines. But seems like a while back I read that they prefer a - instead of a _ ? Anybody know? ....... and WHY?

Thanks!!

#2 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 16 July 2005 - 10:04 PM

I've pretty much stayed away from using multiple words in file names in an attempt to benefit from their use in indexing.

Do they have value as far as search engines are concerned? I'm not sure. I can't say that I've ever seen a white paper, a patent, or a statement from the search engines or anyone working for the search engines that has suggested that file names play a part in indexing.

When creating file names for pages, I've tried to focus on making it easy for the person who administers a site to tell what the page is, when coming up with a name for a page.

I've seen some suggestions from people that an underscore is captured by search engines, unlike most other punctuation marks, and causes the words it separates to be melded together into one extra large term that doesn't rank well for the individual phrases. Supposedly, a hyphen doesn't have that effect. I've never attempted to test this, or to try to independently verify it.

Google does have an operator that will return results showing keywords in the URL for pages.

When I do a search for: inurl:search_engine, I do get results that only have underscores between the two words.

When I do a search for inurl:search engine (with a space between the words), I get results which show either of the words, or both of the words separated by hyphens, or underscores, or periods.

When I do a search for inurl:search-engine, I see the words in URLs which either show the words separated by hyphens, plus signs, or periods.

Since the second result returned pages with URLs that had both hyphens and underscores, I'm not sure if there is much of a difference.

Does the existence of an inurl search tell us that Google uses that to determine relevancy for a page? I'm not really sure.

#3 iloveseo

iloveseo

    Ready To Fly Member

  • Members
  • 10 posts

Posted 18 July 2005 - 02:38 AM

I recommend my-widget.html . For Yahoo "_" is not a keyword delimiter. "-" is

#4 Michael_Martinez

Michael_Martinez

    Time Traveler Member

  • 1000 Post Club
  • 1354 posts

Posted 18 July 2005 - 09:37 AM

Google employees (I forget who) have been quoted as saying that underscores (_) are ignored by their indexing algorithm. Hyphens (-) are acceptable.

#5 Ron Carnell

Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 18 July 2005 - 02:02 PM

If so, Michael, they were either misquoted or clearly wrong. Read Bill's post, do the searches, and it's obvious the underscores are not being ignored.

#6 Michael_Martinez

Michael_Martinez

    Time Traveler Member

  • 1000 Post Club
  • 1354 posts

Posted 18 July 2005 - 04:31 PM

I'll go with GoogleGuy on this one. I didn't have time earlier to do the search, but I've done it now.

Relevant Link

Posted: April 22, 2004 09:54 PM   
--------------------------------------------------------------------------------
Importance: High

Many people believe that you can use an underscore to separate_words_in_an_url, and that Google will recognize the words and count them for ranking purposes. I have read many debates about this: some people believe that Google indeed counts underscores as word separators, while other webmasters claim that you must use a hyphen or other character for Google to recognize the words as distinct. GoogleGuy has hinted in the past that hyphens are preferred. Recently, this debate reheated when Google made a change to its "search word highlighting" feature in the search results: suddenly, keywords separated by underscores were being displayed in bold. The question then became whether the change was part of the ranking algorithm, or just the display mechanism.

Today, GoogleGuy finally confirmed this without any shadow of a doubt: "If you use an underscore '_' character, then Google will combine the two words on either side into one word. So bla.com/kw1_kw2.html wouldn't show up by itself for kw1 or kw2. You'd have to search for kw1_kw2 as a query term to bring up that page." [Link to Quote] I think we can put that debate to bed finally, at least for now.
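
To make the quoted behaviour concrete, here is a minimal Python sketch. It is only an illustration of the rule as GoogleGuy states it, not Google's actual tokenizer; the function name `url_terms` is made up for this example, and the assumption is simply that an underscore counts as a word character while a hyphen, slash, or period acts as a separator.

```python
import re

def url_terms(path):
    """Split a URL path into lowercase terms (hypothetical helper).

    Assumption: '_' is treated as part of a word, as in the GoogleGuy quote.
    The regex class \\w matches letters, digits and the underscore, so any
    other character ('-', '/', '.') ends the current term.
    """
    return [term.lower() for term in re.findall(r"\w+", path)]

print(url_terms("/kw1_kw2.html"))  # ['kw1_kw2', 'html']
print(url_terms("/kw1-kw2.html"))  # ['kw1', 'kw2', 'html']
```

Under that assumption, the underscore version only yields the combined term kw1_kw2, while the hyphen version yields kw1 and kw2 separately.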


I do not know if there has been a change in Google since April 2004.

So, I recommend the use of hyphens (versus underscores) for Google if people want words to be separated.


NOTE ON EDIT: On a few occasions, I think GoogleGuy has been shown to be out of step with what Google is actually doing. There is no indication in my research that his position on this matter has changed, but it's difficult to be thorough with GoogleGuy's pronouncements.

[Edit Just fixed the long link to eliminate a horizontal scroll - Ron]

#7 sanity

sanity

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 6889 posts

Posted 18 July 2005 - 04:57 PM

When creating file names for pages, I've tried to focus on making it easy for the person who administers a site to tell what the page is, when coming up with a name for a page.

Excellent advice Bill.

I don't mind having a few words in a filename if it makes logical sense. If it also happens to include some keyphrases, well hey, I'm happy. As with others here I always use a hyphen.

#8 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 19 July 2005 - 12:44 AM

Thanks, Sophie.

I've been thinking some more about this. I'm probably not going to change the way I name files, and start using hyphens or underscores. But, I could see how hyphens might be treated differently than underscores by Google.


Michael,

Good point about the Googleguy quote.

If the file name does make a difference, it might be worth using. I've pretty much kept them as short as I could, and relevant to the content of the page so that I could find the page when using ftp.

In the grand scheme of things that a search engine might pay attention to when determining what a page is about, might file names be helpful? Might they be something that the search engines use? Maybe.

Why would there be a difference between the way that they treat hyphens and the way that they treat underscores?

The existence of the allinurl operator tells us that the file name is something that Google keeps in its index. But that doesn't tell us, on its own, whether it plays a part in the relevancy ranking of a page.

We know that Google pays attention to file extension, at least when it comes to selecting files for its image search. That doesn't tell us anything about relevancy rankings either.

One of the recent Google patent applications which I posted about here describes how Google may treat hyphenated words, and it seems that they may sometimes treat hyphenated words as if they belonged to the same query as pairs of words without hyphens, or the same pair joined together, for example:

ice cream
ice-cream
icecream

The patent application was published at the end of June, but it was filed with the patent office on December 30, 2003. So, there is a possibility that it is in use.

It does show that Google may treat words separated by hyphens as if they were comparable to unhyphenated versions of the same words, at least if there is a large enough sample of the alternative version.

So, if Google does consider file names in its relevancy rankings, there might be an argument made (based possibly in part on that patent application) that hyphenated versions of words, in some instances, may be treated like unhyphenated versions of the same word pairs.
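
As a rough illustration of the ice cream / ice-cream / icecream grouping, here is a small Python sketch. It is not the method from the patent application; it just assumes, for the sake of the example, that spaces and hyphens are dropped before comparing, and it ignores the "large enough sample" condition entirely. The name `normalize_compound` is hypothetical.

```python
import re

def normalize_compound(query):
    """Collapse whitespace and hyphens so variant spellings compare equal.

    Assumption for illustration only: every space or hyphen is removed.
    A real system would presumably check how common each variant is first.
    """
    return re.sub(r"[\s\-]+", "", query.lower())

variants = ["ice cream", "ice-cream", "icecream"]
print({normalize_compound(v) for v in variants})  # {'icecream'}
```

All three spellings land on the same key, which is the sense in which they could be treated as "the same query".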

#9 Ron Carnell

Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 19 July 2005 - 06:35 AM

LOL. I'd like to throw in a few observations based, not on hypothetical theory or hearsay, but on empirical evidence and experience. People can then, perhaps, arrive at their own conclusions. :)

* The software that bolds keywords in the SERPs is not the same software that determines the SERPs. It's a decent reflection of the ranking algorithm, I think, but only a reflection. It definitely is not "proof" of anything.

* Google, and in my opinion all of the search engines, do use the URL, including the file name, to help them determine ranking. I believe there are two sliding scales at work, however, based first on position and second on exactness. Position: a keyword in a domain counts more than a keyword in a folder, which in turn counts more than a keyword in a file name. Exactness: if the keyword phrase is precisely the same as a domain, the ranking boost can be fairly significant. This is more true on MSN and Yahoo, but true across the board. However, a partial match between keyword phrase and domain creates much, much less ranking effect, with the loss almost appearing to be logarithmic. Put these two sliding scales together, I believe, and you'll usually find that a file name has minor impact when it exactly matches a keyword phrase and almost no impact when it merely contains the keyword phrase. In my opinion, it's worth considering the impact, no matter how small, but it certainly isn't anything over which to obsess.

* For many, many phrases a word separator is not necessary. Google, at least, will find a surprisingly large number of words embedded in non-separated multiple words.

* The only non-alphanumerical character allowed in a domain name is the hyphen, and I think it's clear over the years that all of the major search engines have carried that significance over into the rest of the URI, as well. An underscore is not ignored, as Bill's example SERPs demonstrate, but it will only occasionally act as a word separator. A hyphen, I believe, does act as a word separator in every case.

Tangent: There's good reason, I think, to believe that Google treats most punctuation in the same way it has traditionally treated stop words. A stop word, of course, is one of those small, very common words that Google occasionally tells you it has ignored in your query. Funny thing is that it hasn't, in the past, really ignored it, but has rather tokenized it. In short, the stop words lose their differentiation, so that "a keyword," "an keyword," and "the keyword" become exactly the same query. That's the history; however, stop words have been evolving over just the past few months, and you might have noticed that Google is much less apt to tell you it has ignored a word. Some of the tokenization is still taking place, but I'm no longer sure exactly how. At any rate, I think perhaps Google handles punctuation in much the same way, so that a comma, colon or underscore are essentially equivalent.

* Google's allinurl operator, and indeed all of its special operators, probably pulls from a different database than does a normal search. Create a page with eleven or more fantasy words in the <Title> tag, wait for it to be indexed, and then run an allintitle search for fantasy word number eleven. It won't be found. Run a regular search for that same fantasy word, however, and it will be found. The databases supported by the special operators are, I think, smaller subsets of the full database. As such, it's very difficult to use them to "prove" much of anything. Like the software that bolds keywords in the SERPs, we're just looking at a reflection.

For, uh, what it's worth ... :)
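
For anyone who wants the stop-word tangent spelled out, here is a minimal Python sketch of what "tokenized rather than ignored" could mean. It is purely a guess at the idea, not Google's behaviour: the stop-word list, the `<stop>` placeholder, and the function name `tokenize_query` are all assumptions made up for this example.

```python
# Tiny illustrative stop-word list (assumption, not any engine's real list).
STOP_WORDS = {"a", "an", "the"}

def tokenize_query(query):
    """Replace each stop word with one shared placeholder token.

    The word keeps its position in the query but loses its identity,
    which is the "loses differentiation" idea from the tangent above.
    """
    return ["<stop>" if word in STOP_WORDS else word
            for word in query.lower().split()]

queries = ["a keyword", "an keyword", "the keyword"]
print({tuple(tokenize_query(q)) for q in queries})  # {('<stop>', 'keyword')}
```

All three queries reduce to the same token sequence, so they would behave as one query, while a query with no stop word at all would still differ.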

#10 Michael_Martinez

Michael_Martinez

    Time Traveler Member

  • 1000 Post Club
  • 1354 posts

Posted 19 July 2005 - 09:01 AM

I think I can agree with everything you said, Ron. I've noticed what seems like inconsistent results in Google with respect to similarly spelled domain names (I have run into a few distinct my-domain versus mydomain sites operated by different companies recently).

For peace of mind, I think people need to set a few rules for themselves, abide by those rules, and make necessary adjustments when their experience doesn't validate the rules any longer.

#11 Keywords

Keywords

    Ready To Fly Member

  • Members
  • 34 posts

Posted 19 July 2005 - 05:47 PM

We've tested dashes and underscores continuously for some time, the results are still the same. Underscores are part of the word, not a separator. Underscores are also very likely to be seen as spaces (by humans) when the URL is used in a hyperlink.

As a usability issue, that's on a par with putting keywords before the brand name in a page title. Jakob Nielsen says this makes it harder for users to find the page in their bookmark list, but I say that nobody uses bookmarks and putting keywords first in the title increases click-through from SERPs.

While I might thumb my nose at perfect usability when users prefer it (keywords in page titles), I will bow to usability when it comes to file names, and continue to use dashes.

#12 Big Bill

Big Bill

    Ready To Fly Member

  • Members
  • 30 posts

Posted 19 July 2005 - 06:37 PM

Only two years after the post and here I am - hyphens have it!

BB

#13 Ron Carnell

Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 19 July 2005 - 07:13 PM

Underscores are also very likely to be seen as spaces (by humans) when the URL is used in a hyperlink.

... but I say that nobody uses bookmarks and putting keywords first in the title increases click-through from SERPs.

Good to see ya, Dan! ;)

Both excellent usability points. However, personally, I don't think you're thumbing your nose at usability when you put keywords at the beginning of the Title (though you might still be horribly guilty of thumbing your nose at Jakob Nielsen). I use bookmarks a lot, and there's nothing more irritating than bookmarking four or five pages from the same site and have them ALL show up as "ABC Widget Company ..." in my browser's sidebar. If the beginning of the Title is important (and we all know it is), then clearly each should be differentiated from every other Title on the site.

#14 Keywords

Keywords

    Ready To Fly Member

  • Members
  • 34 posts

Posted 19 July 2005 - 07:46 PM

Nielsen didn't write the Bible, just a couple books...

"Unique page titles" is one of those no-brainer, if-search-engines-didn't-exist, why-the-heck-doesn't-everyone-do-it, type o' things. Nothing like having 55 bookmarks for "Welcome!" ;)

#15 Black_Knight

Black_Knight

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 9339 posts

Posted 20 July 2005 - 02:44 AM

(though you might still be horribly guilty of thumbing your nose at Jakob Nielsen). I use bookmarks a lot, and there's nothing more irritating than bookmarking four or five pages from the same site and have them ALL show up as "ABC Widget Company ..." in my browser's sidebar.


Actually, Nielsen is the one who agrees with you:

Do not make all page titles start with the same word: they will be hard to differentiate when scanning a list. Move common markers toward the end of the line. For example, the title of this page is Microcontent: Headlines and Subject Lines (Alertbox).

Snippet from http://www.useit.com/alertbox/980906.html



#16 ihics.com

ihics.com

    New To Community

  • Members
  • 2 posts

Posted 20 July 2005 - 06:14 AM

Greetings all,

Using "-" in file names (e.g. what-ever.html or some-other.jpg) may result in 404 errors (file not found) for Macintosh users (I believe it's a reserved character), whereas "_" is safe for Macs.

Just thought that might be worth mentioning :-)

#17 sanity

sanity

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 6889 posts

Posted 20 July 2005 - 04:51 PM

"Unique page titles" is one of those no-brainer, if-search-engines-didn't-exist, why-the-heck-doesn't-everyone-do-it, type o' things.

Good to see you Dan! You're so right. Long before SEO was around I was creating unique page titles for each page, right from day one actually. These days they just have a few more keyphrases. :)

Welcome to Cre8 ihics.com :wave:

Interesting point. Does this happen a lot on Macs?

#18 ihics.com

ihics.com

    New To Community

  • Members
  • 2 posts

Posted 20 July 2005 - 08:35 PM

Hi Sophie,

A quick search with Dogpile reveals that from Mac OS 9 onwards the only illegal character for file and folder names is the colon : so the hyphen "-" problem would only apply to earlier Mac OSs.

I'm not fond of Macs myself, but have graphics industry associates who work with nothing else. On occasion, while developing HTML (HTA) based product catalogues, the use of hyphens in filenames (and/or URLs) has caused problems, both local (run from CD) and on-line.

Just another small transient item to be aware of.

Regards,

Peter

#19 sanity

sanity

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 6889 posts

Posted 20 July 2005 - 09:05 PM

Thanks for the info Peter.

BTW always nice to see another Aussie on board!

#20 projectphp

projectphp

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3935 posts

Posted 20 July 2005 - 09:44 PM

Apparently, the hyphen also caused problems with a certain Netscape point release (4.63 from memory).

That said, those sorts of issues are hard to account for. We all want complete compatibility, but in the extremely odd instances where a point release has problems, all one can do is just accept the problem only affects small numbers of people and move on.

#21 dan_888

dan_888

    New To Community

  • Members
  • 1 posts

Posted 27 July 2005 - 03:15 AM

Hi Guys,

Just thought I would mention Kelkoo (www.kelkoo.com owned by Yahoo) uses underscores and not hyphens in their page names.

Great forum!


