It's Google Patent Time, It's Google Patent Time..., Take 2-ASA And Suffer Along With Me

Membership Admin & Moderator

Group: Membership Admin & Moderator
Joined: 6-January 07
Posts: 2,189
post Feb 17 2008, 09:05 PM
That SEO by the Sea post set me thinking (again), and it gave me a headache (again).
QUOTE

Computer programmers will sometimes use the term “boilerplate” code to refer to standard stock code that they often insert into programs. Lawyers use legal boilerplate in contracts - often the small print on the back of a contract that doesn’t change regardless of what a contract is about.

It might be a good step for a search engine to ignore boilerplate text when it indexes pages, or uses the content of pages to create query suggestions for someone using a desktop personalized search. Ignoring boilerplate in the same documents could be helpful when using those documents to rerank search results in personalized search.

Google Omits Needless Words (On Your Pages?)

As usual in software patents, the language is so inclusive that it is difficult to pick out the actual tiny reality, but there certainly are some intriguing potentialities.
1. synthesis of desktop, network, and web search plus personalisation.

2. identification and classification of repeat non-content across pages (boilerplate):
* certain words, especially if attached to links, e.g. home, about us.
* certain spatial areas, especially if they include links (e.g. blogroll, nav links), but even with few or no links (e.g. header, footer).
* certain markup, e.g. JavaScript, but possibly also CSS id/class names such as header, footer, nav.
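The repeat-across-pages idea in point 2 can be illustrated with a minimal sketch. It assumes each page has already been split into text blocks, and it treats a block as boilerplate when it appears on at least half of a site's pages. The input shape, the function names, and the 50% threshold are all hypothetical choices for illustration, not anything the patent application specifies.

```python
from collections import Counter


def find_boilerplate(pages, min_share=0.5):
    """Return the set of blocks that appear on at least `min_share` of pages.

    `pages` is a list of pages, each a list of text blocks (hypothetical
    pre-segmented input). The threshold is an illustrative assumption.
    """
    # Count each block at most once per page, so a block repeated within
    # one page doesn't inflate its cross-page frequency.
    counts = Counter(block for page in pages for block in set(page))
    cutoff = min_share * len(pages)
    return {block for block, n in counts.items() if n >= cutoff}


def strip_boilerplate(page, boilerplate):
    """Keep only the blocks of one page that are not boilerplate."""
    return [block for block in page if block not in boilerplate]


if __name__ == "__main__":
    site = [
        ["Home | About us | Contact", "Dutch painters of the Golden Age", "(c) Example"],
        ["Home | About us | Contact", "Vermeer and the camera obscura", "(c) Example"],
        ["Home | About us | Contact", "Rembrandt's etchings", "(c) Example"],
    ]
    shared = find_boilerplate(site)
    print(sorted(shared))
    print(strip_boilerplate(site[1], shared))
```

Here the navigation line and the copyright line repeat on every page and are flagged, while each page's unique block survives.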

Immediately I saw that links inside 'content' (however narrowly defined) may gain even greater value, OR those outside may lose value.

The more I read that patent, the broader it seems. It lists a great many associated patent applications that have yet to be published, and they appear to lead into entirely new realms. Among them are:
* Extracting a Keyword from an Event
* Refreshing a Content Display
* Constructing and Using a User Profile
* Identifying a Named Entity
* Associating a Keyword with a User Interface Area
* Generating a User Interface

And I am still struggling with the terms 'implicit' and 'explicit' as used and referenced.
Thanks, Bill.

Note: His post has been Sphann, Sphenn, Sphinn, Sphonn, Sphunn.

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Feb 17 2008, 09:41 PM
Thanks, iamlost.

Nice synopsis of the patent application.

The key to understanding the language about implicit and explicit searches is recognizing that this patent application fits within a framework of a search engine that combines aspects of web search, desktop search, and personalized search.

Implicit Searches

If you are writing a document, surfing the web, running an application or two, or performing some other tasks that keywords can be associated with, the search engine might follow along and see what you are interested in. It might take some keywords from the paper that you are writing on Dutch painters of the 1900s and perform a search in a desktop search query area, where you can see results from your own computer, the intranet you are working on (maybe the University Library), and the Web.

Since you didn't actually perform those searches yourself, they are considered "implicit searches" because the topics or keywords used were implied from your actions or research.

The boilerplate aspect of this is that boilerplate from Word documents, web pages, and other textual resources might be ignored in collecting those keyword phrases.
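To make the implicit-search idea concrete, here is a minimal sketch, not the patent's method. It assumes the boilerplate blocks are already known (for example, blocks repeated across pages), skips them, and builds an implicit query from the most frequent remaining non-stopword terms. The function name, the tiny stopword list, and the `top_n` parameter are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use far larger ones).
STOPWORDS = {"the", "of", "and", "a", "an", "in", "on", "to", "for", "is", "us"}


def implicit_query(documents, boilerplate, top_n=3):
    """Build an implicit query from recently viewed documents.

    `documents` is a list of documents, each a list of text blocks;
    any block in `boilerplate` is skipped before keywords are counted.
    """
    counts = Counter()
    for doc in documents:
        for block in doc:
            if block in boilerplate:
                continue  # ignore navigation, footers, and other repeated text
            for word in re.findall(r"[a-z']+", block.lower()):
                if word not in STOPWORDS:
                    counts[word] += 1
    # most_common() orders equal counts by first appearance.
    return " ".join(word for word, _ in counts.most_common(top_n))


if __name__ == "__main__":
    recent = [
        ["Home | About us | Contact", "Dutch painters: Vermeer and Rembrandt"],
        ["Home | About us | Contact", "Vermeer painted light; Rembrandt painted shadow"],
    ]
    print(implicit_query(recent, {"Home | About us | Contact"}))
```

Without the boilerplate set, "home", "about", and "contact" would be counted on every page and could crowd the real topic words out of the query.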

Explicit Searches

The keywords collected from those textual resources might also be influential in reranking the search results that you see when you actually do perform a search. Because you are typing the query into a search box, this kind of search is referred to as an explicit search.

Again, boilerplate may be ignored.
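For the explicit case, one simple way to picture reranking is to blend the engine's original score with how many of the user's collected keywords appear in each result. This is a hypothetical scoring scheme chosen for illustration; the function name, the `weight` parameter, and the linear blend are assumptions, not anything the patent application spells out.

```python
def rerank(results, profile_keywords, weight=0.5):
    """Rerank (title, snippet, base_score) results by profile keyword overlap.

    `profile_keywords` would come from boilerplate-free text the user has
    read recently. Score = base_score + weight * (number of matched keywords).
    Tokenization is naive whitespace splitting, for illustration only.
    """
    keywords = {k.lower() for k in profile_keywords}

    def score(result):
        title, snippet, base = result
        words = set((title + " " + snippet).lower().split())
        return base + weight * len(keywords & words)

    # sorted() is stable even with reverse=True, so ties keep their original order.
    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    results = [
        ("Painting supplies", "buy brushes and canvas", 1.0),
        ("Vermeer biography", "the life of the Dutch painter Vermeer", 0.8),
    ]
    for title, _, _ in rerank(results, ["vermeer", "dutch"]):
        print(title)
```

Because the keywords feeding `profile_keywords` were gathered with boilerplate ignored, a word that only ever appeared in someone's navigation bar would not pull unrelated results upward.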

Today's Searches

It's not difficult to imagine that Google could be trying to understand and ignore boilerplate when indexing pages. It may still index boilerplate, but it's possible that it may not rank it as highly as text on pages that it considers "actual content."
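Still indexing boilerplate but not ranking it as highly amounts to weighting rather than deletion. A minimal sketch, assuming each block has already been labelled as content or boilerplate and using an arbitrary 0.2 weight for boilerplate terms (the labels, the function name, and the weight are all assumptions for illustration):

```python
from collections import defaultdict

BOILERPLATE_WEIGHT = 0.2  # hypothetical discount; the real value (if any) is unknown


def weighted_term_scores(blocks):
    """Return term -> weight for one page.

    `blocks` is a list of (text, is_boilerplate) pairs. Every term is still
    indexed, but each occurrence inside boilerplate earns only a fraction of
    the weight an occurrence in the main content earns.
    """
    scores = defaultdict(float)
    for text, is_boilerplate in blocks:
        weight = BOILERPLATE_WEIGHT if is_boilerplate else 1.0
        for term in text.lower().split():
            scores[term] += weight
    return dict(scores)


if __name__ == "__main__":
    page = [("Home Contact Vermeer", True), ("Vermeer painted light", False)]
    print(weighted_term_scores(page))
```

A term like "contact" stays findable, but a page whose main content is about that term would outscore a page that only mentions it in the footer.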

Boilerplate may also be ignored when a search engine does duplicate content detection. I've seen at least one reference to that happening in a Google patent application (which I haven't blogged about yet).
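The duplicate-content point can be shown with word shingles and Jaccard similarity, a common near-duplicate technique (not necessarily the one Google uses). The sketch assumes boilerplate blocks are already identified, and the function names and `k=3` shingle size are illustrative choices. Two product pages that share a long header and footer look fairly similar until that shared text is removed.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(page_a, page_b, boilerplate=(), k=3):
    """Compare two pages (lists of text blocks), optionally skipping boilerplate."""
    text_a = " ".join(b for b in page_a if b not in boilerplate)
    text_b = " ".join(b for b in page_b if b not in boilerplate)
    return jaccard(shingles(text_a, k), shingles(text_b, k))


if __name__ == "__main__":
    header = "welcome to our store home about contact shipping returns"
    footer = "copyright all rights reserved privacy policy terms of use"
    a = [header, "red cotton shirt size medium", footer]
    b = [header, "blue wool socks pack of three", footer]
    print(similarity(a, b))                                 # inflated by shared text
    print(similarity(a, b, boilerplate={header, footer}))   # content only
```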

Star Member

Group: 1000 Post Club
Joined: 29-December 05
Posts: 3,291
From: Novosibirsk, Russia
post Feb 17 2008, 10:33 PM
So, does this mean that links from:
- top navigation
- footer
- blogrolls
- page blocks

will be ignored? It won't make sense in most cases, though I am sure some of them may be used to spam, too.

This post has been edited by A.N.Onym: Feb 17 2008, 10:57 PM

Membership Admin & Moderator

Group: Membership Admin & Moderator
Joined: 6-January 07
Posts: 2,189
post Feb 17 2008, 10:51 PM
Thank you for confirming what I had just about decided im-/ex-plicit (yes, they look like that in my mind) probably mean, and in much simpler, clearer words.
Three cheers for Bill: Huzzah! Huzzah! Huzzah!
QUOTE(A.N.Onym)

So, does this mean that links from:
- top navigation
- footer
- blogrolls
- site blocks

will be ignored?

As with everything algo and patent related - maybe. They certainly describe the potential ability.

And HTML5 is soon to come, with new named elements: article, section, header, footer, nav... HTML markup that specifies content might be just what the SEs ordered.
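If pages adopt those HTML5 elements, separating content from boilerplate becomes largely a markup question. A minimal sketch using Python's standard `html.parser`, assuming that text inside `header`, `footer`, `nav`, and `aside` (plus `script` and `style`, echoing the JavaScript point earlier) is boilerplate. That mapping, and the `ContentExtractor`/`main_content` names, are assumptions for illustration, not something the spec or any search engine guarantees.

```python
from html.parser import HTMLParser

# Assumed boilerplate regions; script/style are skipped because their text isn't prose.
BOILERPLATE_TAGS = {"header", "footer", "nav", "aside", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collect text that is not nested inside any boilerplate element."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many boilerplate elements we are currently inside
        self.content = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.content.append(data.strip())


def main_content(html):
    """Return the page text found outside the assumed boilerplate elements."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.content)


if __name__ == "__main__":
    page = ("<header><h1>Site</h1><nav><a href='/'>Home</a></nav></header>"
            "<article><h2>Vermeer</h2><p>Light and shadow.</p></article>"
            "<footer>Copyright</footer>")
    print(main_content(page))
```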

Untested

Group: Members
Joined: 15-November 06
Posts: 3
post Feb 19 2008, 10:59 AM
I think you have to see this as a proposed theory by Bill. He's stating that this may or may not be happening.

For me, it's definitely not happening. I recently started optimizing a client's website that was getting no hits at all in a very competitive industry. I created a title tag for the home page and some text in a header with a graphic, and then added psychological "selling" content to the main page. The webmaster of the PHP website, which used CSS and include scripts on every page, put my homepage title tag and header content on ALL the pages, much to my chagrin.

When I looked at a week's worth of data, I was surprised to see Google putting together long-tail phrases made up of my repetitive title tags, header content, and the unique page content (that existed from before) on all the pages.

I think it goes back to what M*tt C*tts said a long time ago. If we penalize all the websites out there that don't have W3C compliant code, we'd lose 40% of the Internet. Translation: Google knows about crappy coding, duplicate content, and boilerplate phrases and they "TRY" to adjust to it, but don't let it affect their results. They may in fact put out what they prefer to see on websites in the form of doctrine, but if they act on that doctrine they lose valuable results searchers need.

How many websites out there do you think have duplicate title tags on all pages? Do you think Google will ignore all these pages and all the boilerplate content if it matches perfectly with the long-tail search phrase the searcher uses? I just don't see this happening. It may be happening on blogs, because they would be easy to filter, but it isn't happening on websites.

Moderator/Blog Editor

Group: Site Admin
Joined: 18-January 05
Posts: 5,375
From: Olympia WA, USA
post Feb 19 2008, 11:25 AM
QUOTE(Yura)
So, does this mean that links from:
- top navigation
- footer
- blogrolls
- page blocks

will be ignored? It won't make sense in most cases, though I am sure some of them may be used to spam, too.

Where they are not unique terms, like contact, recent posts, and FAQ, having them off the radar for keyword link text might help designer-SEOs be less distracted by the possibility of leaking PR superpowers via essential navigation elements. For instance, I recently saw a "SEO tip" that recommended taking hx off of sidebar navigation structures, on the theory that sidebar hx could be competing with the hx in an article. One accessibility problem with removing navigation hx is that it takes hx landmarks out of a helpful hierarchy, for what may be a doubtful "SEO" benefit.

As long as a spider still follows a link to the content, and indexes it based on the merits of the content, what's the harm in reducing the benefit of sitewides? Aren't sitewides already not as powerful?
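The trade-off described here, links still followed but sitewides worth less, can be pictured as a simple link-weighting rule. The 0.1 discount, the per-link boilerplate flag, and both function names are hypothetical; nothing in the patent application gives numbers.

```python
from collections import defaultdict

SITEWIDE_DISCOUNT = 0.1  # hypothetical weight for links found in boilerplate regions


def link_credit(links):
    """Sum the credit each target URL receives from one page's links.

    `links` is a list of (target_url, in_boilerplate) pairs. Links in
    navigation, footers, or blogrolls count for a fraction of a content link.
    """
    credit = defaultdict(float)
    for target, in_boilerplate in links:
        credit[target] += SITEWIDE_DISCOUNT if in_boilerplate else 1.0
    return dict(credit)


def crawl_targets(links):
    """Every linked URL is still queued for crawling, boilerplate or not (deduplicated)."""
    return list(dict.fromkeys(target for target, _ in links))


if __name__ == "__main__":
    page_links = [("/contact", True), ("/faq", True), ("/vermeer", False)]
    print(link_credit(page_links))
    print(crawl_targets(page_links))
```

Separating "what gets crawled" from "how much credit passes" is the point: discounting the navigation link to /contact does not stop a spider from reaching and indexing that page on its own merits.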

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Feb 19 2008, 01:40 PM
QUOTE
I think you have to see this as a proposed theory by Bill. He's stating that this may or may not be happening.


Hi SEO Bru

There really wasn't any proposed theory, or theorizing on my part.

The patent application really does describe in detail a search engine that may or may not be developed. I haven't said or claimed that what is described in the patent application is something that is happening now, because the search engine described in that patent application either doesn't exist, or is not available to the public at this time.

The process in the patent application describes the creation of implicit and explicit searches in that search engine, which may come out in the future, using content on pages that someone has looked at recently to gather keywords, either to suggest searches to a user or to rerank search results. It's there that boilerplate may be ignored, or given less weight.

Considering the amount of effort that is going into the development of that search, with at least 50 patent applications and some really bright people at Google working upon it, perhaps it will be something that we see.

What I have said about now is that we should consider and think about some of the assumptions that we make, and some of the ones that the search engineers exhibit in places like this patent application.

I didn't say that Google will no longer index headers and footers and sidebar navigation.

I don't expect them to, but with this patent application taken together with what Google has written about visual segmentation of gaps (understanding the differences between different parts of pages such as headers, footers, sidebars, main content areas, etc.), I would start questioning how long putting title attributes in places like images in headers can remain competitive against people who use the main content areas of pages to optimize for the same terms.

One of the values of looking at patent applications is that they can help us be proactive rather than reactive. What works today may not work tomorrow. It's not necessary to make changes if you don't want to, and I'm not suggesting that you do. But recognizing the potential for change, and being aware of something that may come along in a few months or a few years, may be something that you find helpful.


QUOTE
For instance, I recently saw a "SEO tip" that recommended taking hx off of sidebar navigation structures, on the theory that sidebar hx could be competing with the hx in an article.


I don't like using header elements in navigation either. I want the focus to be upon the unique page content, and not something that appears within a global navigation scheme.

Moderator/Blog Editor

Group: Site Admin
Joined: 18-January 05
Posts: 5,375
From: Olympia WA, USA
post Feb 19 2008, 07:47 PM
A sidebar on sidebars --
QUOTE(Bill)
QUOTE(Elizabeth)
For instance, I recently saw a "SEO tip" that recommended taking hx off of sidebar navigation structures, on the theory that sidebar hx could be competing with the hx in an article.


I don't like using header elements in navigation either. I want the focus to be upon the unique page content, and not something that appears within a global navigation scheme.
Using an hx for the heading of "Recent Posts" and other nav groups lets people who navigate by keystrokes tab through the headings. It's a screen reader thing - you can tab from heading to heading.

Once in the "boilerplate" sidebar nav area, there's no reason AFAIK that it has to be an h2, for instance. H2 is what the WP default theme uses for headings that label a nav group in the sidebar. Skip links get a person to the sidebar nav location, and the hx hierarchy gets them through the list of lists that makes up sidebar nav. I don't *think* it'd matter if it was an h5 or h6, as long as sub-areas had a lower-importance hx than the top level within sidebar nav.

This post has been edited by AbleReach: Feb 19 2008, 07:49 PM

Star Member

Group: 1000 Post Club
Joined: 29-December 05
Posts: 3,291
From: Novosibirsk, Russia
post Feb 19 2008, 10:19 PM
Well, technically, hx tags are meant to identify page sections/topics, and as such, using them to mark page sections such as sidebar blocks seems appropriate.

However, problems arise when we use a heavy hx structure in the main content. When we mark both parts of the main content and parts of the sidebar sections with the same tags, screen reader users can unexpectedly jump from the article to the navigation and vice versa.

If we are to use hx in navigation, how can we prioritize the site blocks? As we can't safely predict whether the last hx tag used in the main content will be h3 or h5, using h3-h5 tags in sidebar block navigation seems inappropriate, both according to the spec (don't skip hx levels) and for the benefit of screen reader users (don't confuse navigation).
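The skipped-level concern is easy to check mechanically. A small sketch, again with the standard-library `html.parser`, lists a page's h1-h6 sequence in document order and flags any heading that drops more than one level below the heading before it. That rule is a common accessibility guideline; treating it as the exact check, and the `HeadingOutline`/`skipped_levels` names, are assumptions for illustration.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Record the level of every h1-h6 element in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.levels.append(int(tag[1]))


def skipped_levels(html):
    """Return (previous_level, level) pairs where a heading skips a level going down."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    levels = parser.levels
    # Moving back up (e.g. h5 -> h2) is fine; only downward jumps of 2+ are flagged.
    return [(prev, cur) for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]


if __name__ == "__main__":
    page = ("<h1>Blog</h1><article><h2>Post</h2><h3>Part</h3></article>"
            "<aside><h5>Recent Posts</h5></aside>")
    print(skipped_levels(page))
```

An article that ends at h3 followed by a sidebar that starts at h5 gets flagged, which is the mismatch described above; the same sidebar after an article ending at h4 would pass.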

That being said, I wasn't against the use of hx tags for sidebar nav; in fact, I had been thinking it was a good idea. On further reflection, thanks to this thread, it doesn't seem to be a very bright idea.

This post has been edited by A.N.Onym: Feb 19 2008, 10:19 PM
Meet our Moderators: cre8pc : projectphp : sanity : Black Phoenix : bwelford : EGOL : Ruud : rustybrick : AbleReach : swainzy : joedolson: eKstreme: dazzlindonna : SEOigloo: iamlost : RisaBB
Cre8asite RSS Feed