Founder & Administrator
Group: Admin - Top Level
Joined: 29-August 02
Posts: 11,644
From: Bucks County, PA
Oct 13 2007, 09:34 PM
We all use robots.txt on our servers to block search engines and bots from accessing files and folders.
Did you know that anyone can run a search to see what you're blocking? Did you know that just because you block search engines from crawling files and folders doesn't mean those files and folders are blocked from people?

Type into the Google search field:

QUOTE
"robots.txt" "disallow:" filetype:txt

You'll see the robots.txt files for Google itself, government sites, Webmasterworld, Craigslist, Microsoft, and many more. Interestingly, it showed 72,900 results, which seems low.

The kicker is that if you take a disallowed file and add it to their URL, you can sometimes get into the file they think is being blocked. (In some cases, not all.)

Ex: http://www.craigslist.org/robots.txt

Remove the "robots.txt" in the URL and enter one of the disallowed files in its place, like the "/sss" in "Disallow: /sss". Google will take you to the page.

Many sites don't show anything, which is good for them. It's interesting, however, to look and see what they block. Some government sites are working hard to block email harvesters and known invasive bots. Some companies specify in detail the search engine bots they don't want coming around.

I went to Google and searched on one of my directory folders that I don't want crawled and indexed. I discovered that I can go right in there, see a list of the files in that folder, click on them and get them. I can also see the robots.txt file in the folder.

How do you protect files and folders from being accessed from a browser when someone has the exact URL?

I found this interesting because there's a misconception that the robots.txt file is "hiding" things you don't want found on your server. I thought this was a good reminder of what exactly it does and doesn't do.

If you don't want nudie pictures of you being found, hide them in the attic.
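To make the point concrete, here is a minimal Python sketch of what anyone can do with a public robots.txt: read the Disallow lines and turn them into full URLs to try in a browser. The sample file contents, the `example.com` address, and the function names are all made up for illustration; nothing here comes from a real site.

```python
from urllib.parse import urljoin


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the non-empty Disallow paths listed in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            # An empty Disallow means "nothing is blocked", so skip it.
            if value and value not in paths:
                paths.append(value)
    return paths


def candidate_urls(site_root: str, robots_txt: str) -> list[str]:
    """Join each disallowed path onto the site root."""
    return [urljoin(site_root, path) for path in disallowed_paths(robots_txt)]


# Hypothetical robots.txt contents, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /private/   # hypothetical folder
Disallow: /stats
Disallow:
"""

if __name__ == "__main__":
    for url in candidate_urls("http://www.example.com/", SAMPLE):
        print(url)
```

Run it against your own robots.txt and you get the same list a curious visitor would. Whether each URL actually opens depends entirely on how the server is configured, not on the robots.txt file.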

Hall of Famer
Group: Hall Of Fame
Joined: 3-November 05
Posts: 3,461
From: CHeeseland
Oct 14 2007, 07:08 AM
Anything you put online without proper password protection is likely to be found. If you're unlucky, it'll even end up in a search engine. A robots.txt file is never a replacement for password protection. Like Pierre said, if you don't want nude photos of yourself showing up online, don't take any.
Imagine the following situation (not made up, I've seen it happen all too often):
- your server has log files / statistics that are publicly visible (because your hoster doesn't lock them down)
- you access your private files
- those accesses are logged and shown in your statistics
- a search engine happens to find the statistics (there are lots of them in the search engine results, thanks to those hosters)
- your robots.txt accidentally gets removed (oops, who would notice...)

It happens every day. Generally, the search engines will fall back to the last-known version of your robots.txt, but if it stays removed for long enough, they'll assume that they can crawl everything.

With password protection in place, your statistics would never be found (those hosters could of course set it up automatically, but that would make things harder and cause lots of complaints), and if your files were password-protected, then even with public statistics those files would never be found.

Apache servers make it really easy to add password protection - see http://www.ilovejackdaniels.com/apache/pas...-with-htaccess/

John
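One rough way to confirm that password protection is really in place is to request the exact URL without credentials and look at the HTTP status that comes back. The sketch below is only an illustration using Python's standard library: the function names are made up, and treating 401 and 403 as "protected" is a simplifying assumption (a 403 can also mean, for example, that directory listings are switched off while the files inside remain reachable).

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a rough access label (simplified assumption)."""
    if status in (401, 403):
        return "protected"  # server asked for credentials or refused access
    if 200 <= status < 300:
        return "public"     # content was served without any login
    return "other"          # missing page, server error, and so on


def check_url(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request without credentials and classify the response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return classify_status(err.code)
```

Calling `check_url` on a folder you believe is locked down should return "protected"; a result of "public" means anyone with the exact URL can get in, whatever your robots.txt says.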

Member
Group: Members
Joined: 20-June 06
Posts: 46
From: Columbus, OH
Oct 17 2007, 03:09 PM
Someone once told me: security through obscurity is just an illusion.
If you want to secure something, block it properly, rather than just hide it a bit.