Weird Directory Structure
Posted 02 May 2012 - 01:59 AM
It puzzles me that Google does not index this directory. It has no problem with the images in directory index.htg. Is there some rule about directory names? A URL for this would look like: "www .DomainName. com/index.htm/PageName.html". Does that look spammy?
I've been checking this for about 3 months. Bing goes through it with no problem. G actually followed two inbound links and did index those two pages, so it can index that directory structure.
I have a sitemap.xml, which it reads, and all pages are listed. It has indexed about 50% of them. Maybe it's the number of pages? About 1200. Maybe it's just a time thing; the site has been up 7 months. I figured it may be because those URLs are all near the end of the file, so I moved a few to the front. I even have one page linked to from the front page. Googlebot just keeps on visiting all the rest. I have no sitemap.html, but all pages are reachable in 2 clicks from the front page.
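To see exactly where the affected URLs sit in the sitemap, something like the following rough Python sketch could help. The sample sitemap, the example.com host, and the function name `urls_under` are all made up for illustration; only the sitemaps.org namespace is standard.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Standard namespace used by sitemap.xml files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; a real one would be read from disk or fetched.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about.html</loc></url>
  <url><loc>http://www.example.com/index.htm/PageName.html</loc></url>
</urlset>"""


def urls_under(sitemap_xml, prefix):
    """Return (position, url) pairs for sitemap entries whose path starts with prefix."""
    root = ET.fromstring(sitemap_xml)
    hits = []
    for position, loc in enumerate(root.findall("sm:url/sm:loc", NS), start=1):
        url = (loc.text or "").strip()
        if urlparse(url).path.startswith(prefix):
            hits.append((position, url))
    return hits


if __name__ == "__main__":
    for position, url in urls_under(SAMPLE_SITEMAP, "/index.htm/"):
        print(position, url)
```

The positions make it easy to confirm whether the /index.htm/ entries really are clustered at the end of the file.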
I am getting traffic from a wide variety of search queries, mainly from google.ca and google.com, so they have no problem with the rest of the site.
Posted 02 May 2012 - 11:22 PM
Have you looked under the hood? Is the site using relative URLs in the internal navigation? Are they easily resolved? Are the articles being canonicalized to other URLs? Is the site using parameter filtering in Webmaster Tools? How does robots.txt handle this directory? What about on-page meta directives, either for "robots" or for "google" or "googlebot"?
There are a lot of reasons why a page might not show up in one search engine but could appear in another. You have to create a checklist and go down it with a fine-toothed comb.
Or, if you have the option and the time and resources, you could just move the articles to a new directory with a structure/template you design and feel confident will work better without trying to unravel all the code and link pathways.
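For the checklist route, the on-page items (meta directives and canonical links) can be checked with a short script. This is only a sketch using Python's standard html.parser; the class name, function name, and sample page are invented for illustration, and it looks only at the meta names mentioned above.

```python
from html.parser import HTMLParser


class DirectiveFinder(HTMLParser):
    """Collects robots-style meta directives and any rel=canonical link."""

    WATCHED = {"robots", "google", "googlebot"}

    def __init__(self):
        super().__init__()
        self.directives = {}
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        attr = {key: (value or "") for key, value in attrs}
        if tag == "meta":
            name = attr.get("name", "").lower()
            if name in self.WATCHED:
                self.directives[name] = attr.get("content", "").lower()
        elif tag == "link" and attr.get("rel", "").lower() == "canonical":
            self.canonical = attr.get("href")


def inspect_page(html):
    """Return (meta directives dict, canonical URL or None) for one page."""
    finder = DirectiveFinder()
    finder.feed(html)
    return finder.directives, finder.canonical


if __name__ == "__main__":
    # Hypothetical page source; a real check would read each HTML file.
    sample = (
        '<html><head><meta name="ROBOTS" content="NOINDEX, follow">'
        '<link rel="canonical" href="/index.htm/PageName.html">'
        "</head><body></body></html>"
    )
    print(inspect_page(sample))
```

Running this over every file in the problem directory would show quickly whether any page carries a noindex or points its canonical somewhere else.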
Posted 03 May 2012 - 01:40 AM
Is the site using relative URLs in the internal navigation?
They are now. One of the first things I did was to standardise this. Some links used www, some non-www, and some a full http:// URL. They are now all relative to the root /.
No CMS. All pages are html, htm or shtml so everything is static.
No meta directives for G or any other.
The robots.txt disallows all crawlers except Googlebot, Mediapartners-Google, bingbot, msnbot, msnbot-media, Slurp, and Browsershots.
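A quick way to double-check that a robots.txt like this really lets Googlebot into the directory is Python's standard urllib.robotparser. The robots.txt text below is a guess at what such a file might look like (the real file is not shown here), and `is_allowed` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: allow the named crawlers everywhere, block everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Mediapartners-Google
User-agent: bingbot
User-agent: msnbot
User-agent: msnbot-media
User-agent: Slurp
User-agent: Browsershots
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    path = "/index.htm/PageName.html"
    for agent in ("Googlebot", "bingbot", "SomeOtherBot"):
        print(agent, is_allowed(agent, path))
```

If the real file is pasted in place of the assumed text and Googlebot still comes back as allowed for /index.htm/PageName.html, robots.txt can be crossed off the list.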
I 301-redirect some duplicate articles to a single URL.
I am seriously thinking of renaming the directory and doing a redirect in .htaccess, but I will be a bit more patient. 6 or 7 months is not long for a site.
Posted 25 May 2012 - 01:32 PM
They (G) came around yesterday and are starting to hit URLs like /index.htm/PageName.html