Leading Community for Usability, Search Engine Marketing,
Social Networking, Site Planning & Web Site Development, Since 1998


Error Log Showing Errors Due To Old Pages (Cache?)


15 replies to this topic

#1 Pete

Pete

    Mach 1 Member

  • Members
  • 368 posts
  • Twitter:petethomasmusic

Posted 28 December 2012 - 08:05 AM

I like to check my error log from time to time to tidy up dead links etc.

However, I get a lot of errors which are due to links on very old, now obsolete pages.

Some of this may be due to a fileshared PDF download of my site which happened years ago, and I don't think I can do anything about that, but I'm wondering if sometimes this is caused by people getting cached pages in their browser?

If so, is there a way to stop this, or to uncache those pages?

I doubt it's a big problem, but it would be easier for me to have it so my error log doesn't show so much "guff" and I don't waste time chasing up these dead links which aren't on the current site.

NB: if I change the name of a page, image or directory, I usually make a specific .htaccess redirect.

Edited by Pete, 28 December 2012 - 08:05 AM.


#2 tam

tam

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 1825 posts

Posted 28 December 2012 - 08:48 AM

Check the referers; it may be sites linking to old pages, which you can then contact and ask to update. If they are browser bookmarks then there won't be a referer, but you can set up redirects.
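A rough sketch of that referer check in Python, assuming Apache-style error log lines like the ones quoted later in this thread. The function name `classify_referers` and the `own_host` parameter are made up for illustration, and the host comparison is exact, so a `www.` variant of your own domain would count as external unless you normalise it first.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches the "File does not exist" part of an Apache-style error line.
# The referer part is optional: bookmarks and typed URLs send none.
REFERER = re.compile(
    r"File does not exist: .+?(?:, referer: (?P<referer>\S+))?$"
)

def classify_referers(lines, own_host):
    """Count missing-file errors by referer type: none, internal or external."""
    counts = Counter()
    for line in lines:
        match = REFERER.search(line.strip())
        if not match:
            continue  # not a missing-file error
        referer = match.group("referer")
        if referer is None:
            counts["none"] += 1       # likely a bookmark or a typed URL
        elif urlparse(referer).hostname == own_host:
            counts["internal"] += 1   # a page on your own site
        else:
            counts["external"] += 1   # another site linking to the old URL
    return counts
```

The "external" bucket is the list of sites worth contacting; the "none" bucket is where redirects are the only fix.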

#3 Pete

Pete

    Mach 1 Member

  • Members
  • 368 posts
  • Twitter:petethomasmusic

Posted 28 December 2012 - 09:00 AM

Thanks, yes it didn't occur to me that some could be from bookmarks. Nice to think that people are bookmarking pages.

However, many seem to be referrals from old versions of current pages, e.g. the request is for an internal link that has since been changed.

E.g. an image URL where the image directory now has a new name, but the referral is not from an external site. It's as if somebody is opening the page as it was two years ago.

#4 tam

tam

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 1825 posts

Posted 28 December 2012 - 12:27 PM

Can you track the path back, e.g. look at the referer for the old page the image link was in? It could be something like archive.org, but you should see that in your logs.

Is it the same IP? I can imagine one person having an odd cache on their computer but not lots!
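To answer the "same IP?" question without reading the log by eye, something like the sketch below could help. It assumes the same Apache-style error log format as the lines quoted later in this thread; `clients_per_missing_file` is a hypothetical helper name, not part of any tool.

```python
import re
from collections import defaultdict

# Captures the client IP and the missing path from an Apache-style error line.
LINE = re.compile(
    r"\[client (?P<ip>[^\]]+)\] File does not exist: "
    r"(?P<path>.+?)(?:, referer: \S+)?$"
)

def clients_per_missing_file(lines):
    """Map each missing path to the set of client IPs that requested it."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE.search(line.strip())
        if match:
            seen[match.group("path")].add(match.group("ip"))
    return dict(seen)
```

A path requested by a single IP points to one odd cache, bookmark or crawler; a path requested by many different IPs suggests a live link somewhere.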

#5 Pete

Pete

    Mach 1 Member

  • Members
  • 368 posts
  • Twitter:petethomasmusic

Posted 28 December 2012 - 02:24 PM

I've just found one that really has me stumped.

File does not exist: /data03/cxxxxxx/public_html/taming/pic-sax/xs-06b-baritone.jpg, referer: http://petethomas.co...nstruments.html



That image (xs-06b-baritone.jpg) used to be on that page but is not now; it has been deleted, as has the link to it.

But if I load the current page (with no link to the obsolete image) I still get the error. I double-checked by looking at the live page source and searching: the link is not there, yet I'm getting the error each time I refresh the browser.

Edited by Pete, 28 December 2012 - 03:58 PM.


#6 bobbb

bobbb

    Time Traveler Member

  • 1000 Post Club
  • 1426 posts

Posted 28 December 2012 - 02:42 PM

Another idea is relative links: relative to the root of the site or relative to the existing page.

E.g. on the page /OtherDirectory/ThisPage.html you have <img src="ThisPhoto.jpg">, where the photo is in /OtherDirectory/.

When /OtherDirectory/ThisPage.html is moved to /ThisDirectory/ThisPage.html, the tag should now be <img src="/OtherDirectory/ThisPhoto.jpg">.

If it had been <img src="/OtherDirectory/ThisPhoto.jpg"> in the first place, the move would cause no problem.
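The difference between the two kinds of link can be seen with Python's standard `urljoin`, which resolves a link against the page it appears on in the same way a browser does. The host `example.com` and the helper name `resolve` are placeholders for illustration only.

```python
from urllib.parse import urljoin

def resolve(page_url, src):
    """Resolve an image src against the URL of the page that contains it."""
    return urljoin(page_url, src)

old_page = "http://example.com/OtherDirectory/ThisPage.html"
new_page = "http://example.com/ThisDirectory/ThisPage.html"

# Page-relative link: follows the page, so it breaks after the move.
print(resolve(old_page, "ThisPhoto.jpg"))
# http://example.com/OtherDirectory/ThisPhoto.jpg
print(resolve(new_page, "ThisPhoto.jpg"))
# http://example.com/ThisDirectory/ThisPhoto.jpg

# Root-relative link: same result wherever the page lives.
print(resolve(new_page, "/OtherDirectory/ThisPhoto.jpg"))
# http://example.com/OtherDirectory/ThisPhoto.jpg
```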

I use Xenu to check the links on a site. It works well. Do not let the name Xenu put you off; there is no danger.

I keep a copy of all my site on an internal Apache server and check it there so as to not impact real stats.

#7 tam

tam

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 1825 posts

Posted 28 December 2012 - 02:45 PM

Found it! You're preloading images in your js file: http://petethomas.co...cript/image4.js line 12
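Since ordinary link checkers tend to skip script files, a small script can list the image paths that appear as string literals in a .js file so they can be compared with what is actually on the server. This is only a sketch: the contents of image4.js are not shown in this thread, so the sample below is invented, and the regex only catches quoted paths ending in common image extensions.

```python
import re

# Quoted string literals ending in a common image extension.
IMAGE_LITERAL = re.compile(
    r"""["']([^"']+\.(?:jpe?g|png|gif))["']""", re.IGNORECASE
)

def image_paths_in_js(js_source):
    """Return every image path found as a string literal in the JS source."""
    return IMAGE_LITERAL.findall(js_source)

def missing_images(js_source, existing_paths):
    """Return the referenced image paths that are not in existing_paths."""
    return [p for p in image_paths_in_js(js_source) if p not in existing_paths]

# Invented preload script, for illustration only.
sample_js = """
var a = new Image(); a.src = "pic-sax/old-photo.jpg";
var b = new Image(); b.src = 'pic-sax/current-photo.jpg';
"""
print(missing_images(sample_js, {"pic-sax/current-photo.jpg"}))
# ['pic-sax/old-photo.jpg']
```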

#8 bobbb

bobbb

    Time Traveler Member

  • 1000 Post Club
  • 1426 posts

Posted 28 December 2012 - 02:50 PM

I was just about to post that. Xenu would not have caught that; it does not check .js files.

Edited by bobbb, 28 December 2012 - 02:51 PM.


#9 Pete

Pete

    Mach 1 Member

  • Members
  • 368 posts
  • Twitter:petethomasmusic

Posted 28 December 2012 - 03:04 PM

Brilliant, thanks Tam. That explains that one anyway.

bobbb, I use jeditx to check links and it didn't catch that one either.

#10 Pete

Pete

    Mach 1 Member

  • Members
  • 368 posts
  • Twitter:petethomasmusic

Posted 28 December 2012 - 04:00 PM

But this one is even weirder:

[Fri Dec 28 20:03:45 2012] [error] [client 82.94.176.145] File does not exist: /data03/cxxxxxx/public_html/petethomas/recording-saxophones.html, referer: http://petethomas.co.uk/
[Fri Dec 28 20:02:28 2012] [error] [client 82.94.176.145] File does not exist: /data03/cxxxxxx/public_html/petethomas/recording-saxophones.html, referer: http://petethomas.co.uk/x-sitemap.html


I can load the first referring page and there are no errors, yet there it is in the log.

The second referring page (x-sitemap.html) has not existed for about two years, so maybe that one is a bookmark somewhere. It's the first one I don't understand.

#11 tam

tam

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 1825 posts

Posted 28 December 2012 - 04:29 PM

hmm, I can't find that one either.

What I'd do is make a copy of the index page called index2.html, load it and check you get the error. Presuming you do, start deleting elements and testing as you go until you stop getting the error. E.g. delete all the contents of the head, reload and check for the error; if it's still there, delete the sidebar and recheck, and so on. When the error stops, undo the last delete and narrow it down by deleting smaller chunks, e.g. the header links one at a time or the sidebar sections. That should help you track it down.
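The delete-and-retest idea can be sped up by halving: keep only half the page, test, and carry on with whichever half still produces the error. The sketch below is a variant of the approach described above, not the exact procedure, and it assumes exactly one chunk is responsible. `triggers_error` is a hypothetical callback; in practice it would be you uploading the cut-down copy, loading it and checking the error log.

```python
def find_culprit(chunks, triggers_error):
    """Return the single chunk that causes the error.

    Assumes chunks is non-empty and exactly one chunk is to blame.
    triggers_error(subset) must return True if a page built from only
    that subset still produces the error.
    """
    candidates = list(chunks)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Keep whichever half still reproduces the error.
        candidates = first if triggers_error(first) else second
    return candidates[0]

# Illustration with made-up page sections and a stand-in test.
sections = ["head", "header links", "sidebar", "footer", "preload script"]
print(find_culprit(sections, lambda subset: "preload script" in subset))
# preload script
```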

#12 Pete

Pete

    Mach 1 Member

  • Members
  • 368 posts
  • Twitter:petethomasmusic

Posted 28 December 2012 - 05:35 PM

The problem is I don't get the error when I view that page. I saw it in the error log, then loaded the page myself and it's fine.

#13 tam

tam

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 1825 posts

Posted 28 December 2012 - 06:02 PM

Ahh, sorry, I thought that was you duplicating it.

What else is that IP looking at? Can you see where they are entering the site via?

#14 bobbb

bobbb

    Time Traveler Member

  • 1000 Post Club
  • 1426 posts

Posted 29 December 2012 - 02:24 AM

As previously stated: "What else is that IP looking at" that day and previously?
IP 82.94.176.145 is registered to Wise Guys Internet BV and I see them running a spider at 82.94.179.43

Since the first URL does not have that link in it and the second does not exist then one must presume the referer is spoofed.
That 404 URL does exist on the Wayback Machine as does x-sitemap.html (2010) but clicking the link gets you to an archived version of recording-saxophones.html so I doubt it comes from them.

If this is an isolated occurrence then maybe it is nothing. They may be following an old link from some old page in their data. I still see Bing, Google, and Yandex banging their heads against the wall for a URL that has not existed since at least 2005.

From wise-guys site:
We provide spider and search technology. We spider the web 24/7, 365 days a year, resulting in one of the largest up-to-date web archives of the internet.

Anything in your logs from 216.252.n.n is me.

Edited by bobbb, 29 December 2012 - 02:26 AM.


#15 tam

tam

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 1825 posts

Posted 29 December 2012 - 01:46 PM

I think that's the most likely scenario: if they are a spider, they've got an old list of URLs they are using.

Just one other thought: you haven't uploaded a site map directly via Google/Yahoo/Bing tools that has old links, or have an RSS or XML sitemap that you've forgotten to update? I don't think that would show up in the logs like that though.

If it's just them then I think I'd write it off and just continue to monitor as you already are.

Btw, nice looking website :)

#16 Pete

Pete

    Mach 1 Member

  • Members
  • 368 posts
  • Twitter:petethomasmusic

Posted 29 December 2012 - 05:07 PM

tam, on 29 December 2012 - 01:46 PM, said:

"I think that's the most likely scenario: if they are a spider, they've got an old list of URLs they are using.

Just one other thought: you haven't uploaded a site map directly via Google/Yahoo/Bing tools that has old links, or have an RSS or XML sitemap that you've forgotten to update? I don't think that would show up in the logs like that though."


I just discovered a Greek site which has copied and pasted the HTML from an old version of one of the sites: almost the entire navigation, with images, links and CSS. So that explains some of them. I suppose I can't complain; there are about 60 links back to my site!

I doubt I will ever get all, or even many, of the errors sorted out. The thing is, the error log is for all the sites I have hosted there. Over the years pages have been moved around and redirected, directories reorganised and renamed, etc.

Plus the sites include a store and a vBulletin forum. The forum always seems to throw up lots of errors and I think it would be a nightmare to work out what they all are.

e.g.

client 193.105.210.113] File does not exist: /data03/cxxxxxx/public_html/forum/index.php\t0, referer: http://cafesaxophone.com/index.php%090
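One thing that can be said about this entry: `%09` is the URL encoding of a tab character, and the `\t` in the file path is most likely the log's escaped rendering of that same tab. So the request was probably for "index.php", a tab, then "0", which looks more like a badly parsed link list than a real link on the site. A quick check in Python (the helper name `decode_request_path` is just for illustration):

```python
from urllib.parse import unquote

def decode_request_path(encoded):
    """Decode percent-escapes in a requested path."""
    return unquote(encoded)

decoded = decode_request_path("/index.php%090")
print(repr(decoded))  # '/index.php\t0'
```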


I will just keep chipping away at the ones I can follow, and maybe it will gradually get clearer.

tam, on 29 December 2012 - 01:46 PM, said:

"Btw, nice looking website :)"


Thank you very much, and thanks again for spotting the redundant .js directory, I will look out for more of those.



