However, there are a few things which might influence web search as well:
- Google searches inside of "zip" files (and tar.gz, jar, etc.).
That means that if you have content which should not be indexed, it is not safe to just place it within zip files. On the other hand, content you do want found will be indexed even if it only exists inside your zip files. I wonder how it handles password-protected zip files? Or broken zip files (like those used to attack antivirus solutions that unzip files to check them)?
- Google extracts information from your code, like which license it uses and which language it's written in.
Hmmm, where does it get that information from? Probably pattern matching for known license texts.
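
Just to make the guess above concrete, here is a minimal sketch of how a crawler might look inside a zip file and pattern-match for known license texts. Everything in it is my own assumption, not anything Google has documented: the signature phrases, the extension table, and the function names `detect_license`, `guess_language` and `scan_zip` are all hypothetical.

```python
import io
import re
import zipfile

# Hypothetical signature phrases; a real detector would need far more of them.
LICENSE_PATTERNS = {
    "GPL": re.compile(r"GNU\s+General\s+Public\s+License", re.IGNORECASE),
    "MIT": re.compile(
        r"Permission\s+is\s+hereby\s+granted,\s+free\s+of\s+charge", re.IGNORECASE
    ),
    "Apache-2.0": re.compile(r"Apache\s+License,\s+Version\s+2\.0", re.IGNORECASE),
}

# Hypothetical extension-to-language table, also just for illustration.
LANGUAGE_BY_EXTENSION = {".py": "Python", ".c": "C", ".java": "Java", ".pl": "Perl"}


def detect_license(text):
    """Return the first license whose signature phrase appears in text, else None."""
    for name, pattern in LICENSE_PATTERNS.items():
        if pattern.search(text):
            return name
    return None


def guess_language(filename):
    """Guess the language from the file extension, else None."""
    for ext, language in LANGUAGE_BY_EXTENSION.items():
        if filename.lower().endswith(ext):
            return language
    return None


def scan_zip(data):
    """Yield (member name, language, license) for each file in an in-memory zip."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Decode leniently so binary members do not abort the scan.
            text = archive.read(info).decode("utf-8", errors="replace")
            yield info.filename, guess_language(info.filename), detect_license(text)
```

With Python's own `zipfile` module, reading a password-protected member without a password raises `RuntimeError`, and a corrupt archive raises `zipfile.BadZipFile`, so a sketch like this would simply have to skip those. What Google actually does with them, I still don't know.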
How do you rank for code-search? Since your code is usually only linked from very few places within your own site (and hardly ever from the outside directly), I expect the general value of your own site ("PR" if you will) is a strong factor. Within the code it's hard to determine important sections (no headers, no bold, etc.), but perhaps they look at how often a term occurs? How do they determine if a piece of code is relevant for your search term or not?
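
If frequency really is part of it, the simplest version I can imagine looks something like the sketch below. This is purely my own illustration of frequency-based scoring, not how Google ranks anything; `tokenize`, `frequency_score` and `rank` are hypothetical names, and the score is just the share of identifier-like tokens in a file that match the search term.

```python
import re
from collections import Counter


def tokenize(source):
    """Split source code into lowercase identifier-like tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)]


def frequency_score(source, term):
    """Fraction of tokens in source equal to term (0.0 for empty source)."""
    tokens = tokenize(source)
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


def rank(files, term):
    """Return file names sorted by descending frequency score for term."""
    return sorted(
        files, key=lambda name: frequency_score(files[name], term), reverse=True
    )
```

A score like this obviously favors tiny files that repeat the term, which is one reason I suspect the value of the surrounding site has to count for a lot.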
How do you make sure that your "current" code is indexed and perhaps the older versions are removed? How do you keep Google from indexing your "bad examples"?
Fun stuff. Finally something for the geeks among us. Hey - look, someone used my code snippets with my original comments in them! No more easter eggs...