I look at validation in a couple of ways.
The first is to help me catch any stupid errors that I may have made. It's like having a free proofreader.
The second is to help mitigate risk, in case something is wrong enough that it might cause problems with a page getting indexed.
And you know, spidering a site "pretty decently" is fine and good, but I don't think that there's anything wrong with wanting to try to do the best you can on a site. There's nothing wrong with aiming towards accessible pages, or pages that look good in different browsers.
I spent a couple of months in 2005 training someone in SEO who had an incredible amount of education and experience in standards-based HTML coding, and who had been the lead accessibility person on some major federal US government web sites.
What made that a lot of fun was explaining where semantically meaningful uses of standards could make it easier for search engines to understand the contents of pages, and their connections to other pages on the same site.
I recently researched and wrote about how search engines are looking at segmenting pages for handhelds, and it was pretty obvious that having better control over your code, and the way it is presented, would mean the page comes through better on a smaller screen when it's re-served through a proxy service from Google or Nokia, or some other potential point of access.
That breaking down of pages, and segmenting them, may have implications for SEO in a number of ways, because it may give search engines opportunities to understand pages better. For example, it might be preferable to pull snippets from the main content of a page instead of a footer or sidebar, if there is no meta description or it doesn't contain the keywords from the query that the page showed up for. The value of links could be calculated differently depending on whether they appear in headings, footers, sidebars, or main content areas. Duplicate content filtering might be more meaningful if it could focus upon the main content section of a page. And keyword or phrase indexing might also be more meaningful if those words came from a main content area rather than, say, a footer.
Sure, even text files will get indexed. But, there's a chance that the text in an <h2> in a sidebar isn't as important to a page as the text in an <h2> in the main content of a page. Is this something a search engine is presently looking at? I don't know.
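To make that idea concrete, here's a toy sketch of my own, not anything a search engine has confirmed doing, of how the same keyword might be scored differently depending on which region of a page it appears in. The region names, the weights, the use of BeautifulSoup, and the extra credit for <h2> text are all assumptions I've made up for the example.

```python
# Toy illustration (not any search engine's actual algorithm): score a keyword
# differently depending on which region of the page it appears in.
# The region tags and weights below are made-up assumptions for the example.
from bs4 import BeautifulSoup

REGION_WEIGHTS = {"main": 1.0, "nav": 0.2, "aside": 0.3, "footer": 0.1}

HTML = """
<html><body>
  <nav><h2>Widgets</h2><a href="/widgets">Widgets</a></nav>
  <main><h2>Widgets</h2><p>Everything you wanted to know about widgets.</p></main>
  <footer><p>Widgets Inc. All rights reserved.</p></footer>
</body></html>
"""

def keyword_score(html: str, keyword: str) -> float:
    """Sum region-weighted occurrences of a keyword, counting <h2> text twice."""
    soup = BeautifulSoup(html, "html.parser")
    keyword = keyword.lower()
    score = 0.0
    for region, weight in REGION_WEIGHTS.items():
        for block in soup.find_all(region):
            text = block.get_text(" ").lower()
            score += weight * text.count(keyword)
            # The same <h2> counts for more in the main content than in a
            # sidebar or nav, simply because the region weight differs.
            for h2 in block.find_all("h2"):
                score += weight * h2.get_text(" ").lower().count(keyword)
    return score

print(keyword_score(HTML, "widgets"))
```

The point of the sketch is only that clean, semantic markup makes this kind of region-by-region weighting possible at all; a page built as one undifferentiated block of markup gives a search engine much less to work with.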
Here's a blog post I wrote a while back about how Google might handle navigation when trying to take large pages and put them on small screens: Google Identifies Navigation Bars for Small Screens, Snippets, and Indexing
Here's a snippet:
The primary focus of this patent is on identifying navigation bars on a page that can safely be re-written or changed in some manner for display on a smaller screen. An integral part of the process involves actually identifying navigation bars. It's probably important that the patent mentions (briefly) that this identification can be helpful in indexing a page and deciding upon which text to use to provide snippets to searchers, which goes beyond the reauthoring process.
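The patent lays out its own process, but just to give a flavor of how a navigation bar might be spotted programmatically, here's a rough sketch using a simple link-density heuristic. The 0.7 threshold and the choice to check only <div> blocks are my assumptions for the sake of the sketch, not anything taken from the patent.

```python
# Rough, generic heuristic (not the patent's process): flag a block as a likely
# navigation bar when most of its text lives inside links.
from bs4 import BeautifulSoup

def looks_like_nav(block, link_density_threshold: float = 0.7) -> bool:
    """Return True if the ratio of link text to all text in the block is high."""
    total_text = block.get_text(" ", strip=True)
    if not total_text:
        return False
    link_text = " ".join(a.get_text(" ", strip=True) for a in block.find_all("a"))
    return len(link_text) / len(total_text) >= link_density_threshold

HTML = """
<div id="menu"><a href="/">Home</a> <a href="/about">About</a> <a href="/contact">Contact</a></div>
<div id="story">Search engines are forgiving of bad code, but forgiving is not the
same as understanding. <a href="/more">Read more</a></div>
"""

soup = BeautifulSoup(HTML, "html.parser")
for div in soup.find_all("div"):
    print(div.get("id"), looks_like_nav(div))
```

Running it prints True for the menu block and False for the story block, which is roughly the distinction a reauthoring or snippet-selection process would need to make before deciding what to keep, collapse, or ignore.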
Considering the ways in which search engines may want to manipulate the content of a site, and possibly even rewrite parts of it, I want as much control over the code as possible.
So yes, search engine spiders are forgiving of bad code. But, how much control do you want to turn over to them in their ranking and presentation of your site to others?