Cre8asiteforums

Web Site Design, Usability, SEO & Marketing Discussion and Support

bobbb

Weird Directory Structure

I recently started to maintain a website that has a weird directory structure. All the older articles are in a directory called index.htm (don't know why).

 

It puzzles me that Google does not index it. It has no problem with the images in directory index.htg. Is there some rule about directory names? A URL for this would be like: "www .DomainName. com/index.htm/PageName.html". Does that look spammy?

 

I've been checking this for about 3 months. Bing goes through it with no problem. G actually followed two inbound links and indexed those two pages, so it can index that directory structure.

 

I have a sitemap.xml, which it reads, and all pages are listed in it. It has got through about 50% of them so far. Maybe it's the number of pages (about 1200)? Maybe it's just a time thing: the site has been up 7 months. I figured it might be because those URLs are all near the end of the file, so I moved a few to the front. I even have one page linked to from the front page. It just keeps on visiting all the rest. I have no sitemap.html, but all pages are reachable in 2 clicks from the front page.
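
One way to keep track of this is to pull the /index.htm/ URLs out of the sitemap and compare that list against what has been crawled so far. The following is only a rough Python sketch: the sample sitemap, the example.com host and the function name `urls_under` are made up for illustration, and it assumes the file follows the standard sitemaps.org format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap.xml files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical two-entry sitemap; a real one would be read from disk.
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.example.com/about.html</loc></url>"
    "<url><loc>http://www.example.com/index.htm/PageName.html</loc></url>"
    "</urlset>"
)

def urls_under(sitemap_xml, path_prefix):
    """Return the sitemap URLs whose path starts with path_prefix."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip()
            for el in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if el.text]
    return [u for u in locs if urlparse(u).path.startswith(path_prefix)]

if __name__ == "__main__":
    for url in urls_under(SAMPLE_SITEMAP, "/index.htm/"):
        print(url)
```

The resulting list can then be checked by hand against server logs or a site: query to see which of those pages have actually been visited.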

 

I am getting traffic from a wide variety of search queries mainly from google.ca and google.com so they have no problem with the site.


I have seen some sites that use "index.htm/" because of their Content Management System. It's dumb, but whoever wrote the code either didn't know what they were doing or else did not care. The search engines definitely don't care, so there must be some other reason why you're not seeing the individual articles show up.

 

Have you looked under the hood? Is the site using relative URLs in the internal navigation? Are they easily resolved? Are the articles being canonicalized to other URLs? Is the site using parameter filtering in Webmaster Tools? How does robots.txt handle this directory? What about on-page meta directives, either for "robots" or for "google" or "googlebot"?
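
Two of those checks (on-page meta directives and canonicalization) are easy to script. Here is a minimal Python sketch using only the standard library; the class name `IndexingHints`, the helper `indexing_hints` and the sample page are hypothetical, and a real check would feed it the HTML of each article.

```python
from html.parser import HTMLParser

# Meta names mentioned above that can carry indexing directives.
WATCHED_META = {"robots", "google", "googlebot"}

class IndexingHints(HTMLParser):
    """Collect robots-style meta directives and the canonical link."""

    def __init__(self):
        super().__init__()
        self.meta = {}         # e.g. {"robots": "noindex, follow"}
        self.canonical = None  # href of <link rel="canonical">, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        rel = (a.get("rel") or "").lower().split()
        if tag == "meta" and name in WATCHED_META:
            self.meta[name] = (a.get("content") or "").lower()
        elif tag == "link" and "canonical" in rel:
            self.canonical = a.get("href")

def indexing_hints(html):
    """Return (meta directives, canonical href) found in an HTML string."""
    parser = IndexingHints()
    parser.feed(html)
    return parser.meta, parser.canonical

# Hypothetical page used only to show the output shape.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="/index.htm/PageName.html">
</head><body></body></html>"""

if __name__ == "__main__":
    print(indexing_hints(SAMPLE_PAGE))
```

A "noindex" value, or a canonical that points at a different URL, would each be a plausible reason for a page to stay out of one engine's index.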

 

There are a lot of reasons why a page might not show up in one search engine but could appear in another. You have to create a checklist and go down it with a fine-toothed comb.

 

Or, if you have the option and the time and resources, you could just move the articles to a new directory with a structure/template you design and feel confident will work better without trying to unravel all the code and link pathways.

"Is the site using relative URLs in the internal navigation?"

They are now. One of the first things I did was to standardise this. Some were www, some non-www, some http://. They are now all relative to the root (/).
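
For anyone wondering why root-relative links matter with a directory named index.htm, this small Python sketch shows how a crawler would resolve different link styles found on one of these pages. The example.com host and the file names are placeholders, not the real site.

```python
from urllib.parse import urljoin

# Hypothetical article URL inside the oddly named directory.
PAGE = "http://www.example.com/index.htm/PageName.html"

def resolve(href):
    """Resolve a link found on PAGE the way a browser or crawler would."""
    return urljoin(PAGE, href)

if __name__ == "__main__":
    # Root-relative: resolves the same way from any page on the site.
    print(resolve("/index.htg/photo.gif"))
    # Document-relative: resolved against the /index.htm/ directory.
    print(resolve("Other.html"))
    # Parent-relative: climbs out of /index.htm/ to the site root.
    print(resolve("../about.html"))
```

Root-relative links give the same result no matter which directory the linking page sits in, which is why they are the easiest to verify.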

 

No CMS. All pages are html, htm or shtml so everything is static.

 

No meta directives for G or any other.

 

The robots.txt disallows every crawler except Googlebot, Mediapartners-Google, bingbot, msnbot, msnbot-media, Slurp, and Browsershots.
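
A whitelist like this can be sanity-checked offline with Python's standard `urllib.robotparser`. The robots.txt text below is a guess at what such a file might look like, not a copy of the real one, and the URL and the `can_fetch` helper are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: one group allowing the named bots, one blocking the rest.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Mediapartners-Google
User-agent: bingbot
User-agent: msnbot
User-agent: msnbot-media
User-agent: Slurp
User-agent: Browsershots
Disallow:

User-agent: *
Disallow: /
"""

ARTICLE = "http://www.example.com/index.htm/PageName.html"

def can_fetch(robots_txt, agent, url):
    """Return True if the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for agent in ("Googlebot", "bingbot", "SomeOtherBot"):
        print(agent, can_fetch(ROBOTS_TXT, agent, ARTICLE))
```

Note that Python's parser is only an approximation of how each search engine interprets robots.txt, so the engines' own testing tools remain the final word.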

 

I 301-redirect some duplicate articles to a single copy.

 

I am seriously thinking of renaming the directory and doing a redirect in .htaccess, but I will be a bit more patient. 6 or 7 months is not long for a site.


Well, patience paid off. It was just a question of time.

 

They (G) came around yesterday and are starting to hit URLs like /index.htm/PageName.html
