
Cre8asiteforums Internet Marketing
and Conversion Web Design


Weird Directory Structure

3 replies to this topic

#1 bobbb


    Sonic Boom Member

  • Hall Of Fame
  • 3346 posts

Posted 02 May 2012 - 01:29 AM

I recently started to maintain a website that has a weird directory structure. All the older articles are in a directory called index.htm (don't know why).

It puzzles me that Google does not index it, yet it has no problem with the images in the directory index.htg. Is there some rule about directory names? A URL for this looks like: www.DomainName.com/index.htm/PageName.html. Does that look spammy?

I've been checking this for about 3 months. Bing goes through it with no problem. Google actually followed two inbound links and indexed those two pages, so it clearly can index that directory structure.

I have a sitemap.xml which it reads, and all pages are listed in it. Indexing is about 50% done. Maybe it's the number of pages (about 1200), or maybe it's just a time thing; the site has been up 7 months. I figured it might be because those URLs are all near the end of the file, so I moved a few to the front. I even have one of those pages linked from the front page, but Google just keeps visiting all the rest. There is no sitemap.html, but all pages are reachable within 2 clicks of the front page.
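For reference, an entry for one of these pages in the sitemap.xml looks something like this (DomainName.com and PageName.html are placeholders for the real values, and the date is just illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; the odd index.htm "directory" appears in the path -->
  <url>
    <loc>http://www.DomainName.com/index.htm/PageName.html</loc>
    <lastmod>2012-05-01</lastmod>
  </url>
</urlset>
```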

I am getting traffic from a wide variety of search queries, mainly from google.ca and google.com, so they have no problem with the site overall.

#2 Michael_Martinez


    Time Traveler Member

  • 1000 Post Club
  • 1354 posts

Posted 02 May 2012 - 10:52 PM

I have seen some sites that use "index.htm/" as a directory because of their Content Management System. It's dumb, but whoever wrote the code either didn't know what they were doing or didn't care. The search engines definitely don't care, so there is some other reason why you're not seeing the individual articles show up.

Have you looked under the hood? Is the site using relative URLs in the internal navigation? Are they easily resolved? Are the articles being canonicalized to other URLs? Is the site using parameter filtering in Webmaster Tools? How does robots.txt handle this directory? What about on-page meta directives, whether for "robots", "google", or "googlebot"?
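To be concrete about that last item: the on-page directives to look for are ordinary tags in each page's head section. Any of the following (hypothetical examples, with example.com as a placeholder) would keep a page out of the index:

```html
<!-- Either of these meta tags in a page's <head> blocks indexing -->
<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex, nofollow">

<!-- A canonical link pointing at a different URL can also pull the page out -->
<link rel="canonical" href="http://www.example.com/some-other-page.html">
```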

There are a lot of reasons why a page might not show up in one search engine but could appear in another. You have to create a checklist and go down it with a fine-toothed comb.

Or, if you have the option, time, and resources, you could just move the articles to a new directory with a structure/template you design and are confident will work better, without trying to unravel all the code and link pathways.

#3 bobbb


    Sonic Boom Member

  • Hall Of Fame
  • 3346 posts

Posted 03 May 2012 - 01:10 AM

Is the site using relative URLs in the internal navigation?

They are now. One of the first things I did was to standardise this. Some were www, some non-www, and some full http:// URLs. They are now all relative to the root /.
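In other words, the internal links went from a mix of forms to one consistent root-relative form, roughly like this (example.com is a placeholder):

```html
<!-- Before: inconsistent internal links -->
<a href="http://www.example.com/index.htm/PageName.html">Article</a>
<a href="http://example.com/index.htm/PageName.html">Article</a>

<!-- After: all root-relative -->
<a href="/index.htm/PageName.html">Article</a>
```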

No CMS. All pages are .html, .htm, or .shtml, so everything is static.

No meta directives for Google or any other bot.

The robots.txt disallows all but Googlebot, Mediapartners-Google, bingbot, msnbot, msnbot-media, Slurp, and Browsershots
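A robots.txt of that shape looks roughly like this (a sketch, not the actual file): each allowed crawler gets an empty Disallow, and everyone else is blocked.

```
# Allowed crawlers: empty Disallow means no restriction
User-agent: Googlebot
User-agent: Mediapartners-Google
User-agent: bingbot
User-agent: msnbot
User-agent: msnbot-media
User-agent: Slurp
User-agent: Browsershots
Disallow:

# Everyone else is blocked entirely
User-agent: *
Disallow: /
```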

I 301-redirect some duplicate articles to a single version.

I am seriously thinking of renaming the directory and doing a redirect in .htaccess, but I'll be a bit more patient. 6 or 7 months is not long for a site.
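If I go that route, the .htaccess rule would be something like this (assuming mod_rewrite is available, and with /articles/ as a made-up name for the new directory):

```apache
# Hypothetical: move pages out of /index.htm/ and 301 the old URLs
RewriteEngine On
RewriteRule ^index\.htm/(.+)$ /articles/$1 [R=301,L]
```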

#4 bobbb


    Sonic Boom Member

  • Hall Of Fame
  • 3346 posts

Posted 25 May 2012 - 01:02 PM

Well, patience paid off. It was just a question of time.

They (Google) came around yesterday and are starting to hit URLs like /index.htm/PageName.html.
