
Leading Community for Usability, Search Engine Marketing,
Social Networking, Site Planning & Web Site Development, Since 1998



Do Search Engines Use One Algorithm For All Results?


18 replies to this topic

Poll: Do Search Engines Use One Algorithm For All Results? (11 member(s) have cast votes)

For all searches, do SEs use just the one algorithm?

  1. Yes. (1 vote [9.09%])

  2. No. (8 votes [72.73%])

  3. Unsure. (1 vote [9.09%])

  4. No way to tell. (1 vote [9.09%])


#1 projectphp

projectphp

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3934 posts
  • Twitter:motherwell
  • Facebook:http://www.facebook.com/mmotherwell

Posted 19 May 2006 - 12:23 AM

I see a lot of people say absolute things, and I wonder, is there one algorithm in play for all searches, or is it possible that SEs use different algorithms for different searches?

#2 A.N.Onym

A.N.Onym

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 4001 posts
  • Twitter:http://twitter.com/yuraf
  • Facebook:http://www.facebook.com/yura.filimonov

Posted 19 May 2006 - 12:50 AM

At the basic level, they all aim to show pages that contain the query terms or have them in the linking text. When it comes to more complex matters, such as the importance of each ranking factor and how many factors are used, everything's different.

I bet you knew this before, though ;)

#3 projectphp

projectphp

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3934 posts
  • Twitter:motherwell
  • Facebook:http://www.facebook.com/mmotherwell

Posted 19 May 2006 - 01:02 AM

I wonder if one algorithm makes the most sense.

A search for "do blind people dream" doesn't need the same amount of processing to get a "good" result as would, say, "mortgage". So using one algorithm wouldn't necessarily be the best way forward, in my view.

I also wonder whether the 10 results all come from the same algorithm. That is, if there is more than one ranking algorithm in play, are all 10 results from the same algo, or from several?

I really have no proof or real evidence, but it seems to me the best way to produce a varied SERP, which IMHO is partly the goal, is to have the ten results utilise different algorithms. You could then have algorithms that favoured different things, content, off page, specific areas etc etc.

I just wonder what other people think ;)

#4 A.N.Onym

A.N.Onym

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 4001 posts
  • Twitter:http://twitter.com/yuraf
  • Facebook:http://www.facebook.com/yura.filimonov

Posted 19 May 2006 - 01:08 AM

Aha, it is much clearer now.

I don't think it'd make sense to process some queries with less vigor than others.
If we can provide top quality results for 'mortgage' (at least as much as the search engines allow), why not provide the same quality when searching for less popular queries?

That being said, I've read somewhere that the Big Daddy update changed the structure of the SE algorithm, making it possible to connect different modules, each used for a specific purpose.

#5 joedolson

joedolson

    Eyes Like Hawk Moderator

  • Technical Administrators
  • 2869 posts
  • Twitter:http://twitter.com/joedolson
  • Facebook:http://facebook.com/joedolson

Posted 19 May 2006 - 02:27 AM

I wrote a blog post not too long ago where I touched on this idea (just a little bit, at the very end). My thinking was that search engines could be building on the concepts applied by metasearch engines to blend the results from radically different search algorithms and achieve better relevancy. Whether they actually do this? No idea.

Another thought occurs to me now - are the "vertical creep" results which Google brings in from their images index, etc., perhaps sourced using an alternate algorithm? I've usually thought about them as coming from a different index - but it seems quite reasonable to suppose they're applying a different algorithm as well.

A.N.Onym - I hadn't heard that suggestion about BigDaddy - any chance you can point to an article somewhere?

#6 A.N.Onym

A.N.Onym

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 4001 posts
  • Twitter:http://twitter.com/yuraf
  • Facebook:http://www.facebook.com/yura.filimonov

Posted 19 May 2006 - 02:56 AM

Unfortunately, I don't remember where I read about it.
I'll post the link as soon as I find it.

#7 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 19 May 2006 - 04:19 AM

I read "somewhere" that Google uses over 200 "signals" for ranking. Additionally, they seem to have some sort of manual penalty system - probably site, url and keyword based. (The penalty system seems to have been partially automated with BigDaddy, something I find very disturbing)

With over 200 signals to play with, they don't really need to use different "algorithms" - instead they just play with the weighting factors. ;)

I really have doubts that Google would use different algorithms for different searches. It is not like Google to do so. However, I'm sure that they have signals which react to the general "spammy-ness" of the results and the popularity of keywords.

Additionally, I doubt that Google uses a simple linear weighted sum to calculate the URL "weight" in the results. Instead, I think they use signals and combinations of signals as factors (or inverse factors) for weighting other signals.

An example of what I mean: Let C = "competition" (higher=more), P = sum of onpage-factors
Instead of doing:
weight = .... + wC * C + wP * P
they could be doing:
weight = .... + wP * P / (wC * C)
Which would decrease the weight of the on-page factors for terms which are very competitive. Add to that any number of other mathematical operations, and you'll soon have so many possible variations that they'd need a Google-Datacenter just to store them ;)
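
As a rough illustration of the two formulas above, here is a minimal Python sketch. The function names, the default weights of 1.0 and the sample values are all assumptions made for the example; nothing here reflects how any search engine actually computes its scores. Only the C and P terms are modeled, and the elided "...." terms are left out.

```python
def linear_weight(p, c, w_p=1.0, w_c=1.0):
    """Plain weighted sum: competition and on-page terms are simply added."""
    return w_c * c + w_p * p


def damped_weight(p, c, w_p=1.0, w_c=1.0):
    """On-page term divided by the competition term.

    The more competitive the query (larger c), the less the on-page
    factors contribute. c must be positive to avoid dividing by zero.
    """
    if c <= 0:
        raise ValueError("competition must be positive")
    return w_p * p / (w_c * c)


if __name__ == "__main__":
    # Hypothetical values: same on-page score, rising competition.
    for competition in (1.0, 2.0, 4.0):
        print(competition, linear_weight(2.0, competition), damped_weight(2.0, competition))
```

With P = 2, raising C from 1 to 4 drops the divided term from 2.0 to 0.5, while the linear sum just keeps growing with C.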

Google is a geek-company - they would not go towards a manual selection (this algorithm for that, the other for other searches) - they would always try to have one (probably very complicated) formula for their calculations. Figuring out which weights and which combinations to use is something that they probably do based on a known corpus of sites / keywords and once they're satisfied with internal results they put them live on a datacenter - to judge the user-reports that they get.

Even if you had access to the formula, all the weighting factors and knew how they determined the signals (eg live "pagerank" based on pre-discounted links minus trust of those links), it would be one heck of a job to optimize a site to stay on top. I'd venture to guess that it would be easier NOT knowing the exact formula. Imagine 200 signals that you have to tune....

John

#8 joedolson

joedolson

    Eyes Like Hawk Moderator

  • Technical Administrators
  • 2869 posts
  • Twitter:http://twitter.com/joedolson
  • Facebook:http://facebook.com/joedolson

Posted 19 May 2006 - 04:46 AM

I read "somewhere" that Google uses over 200 "signals" for ranking. Additionally, they seem to have some sort of manual penalty system - probably site, url and keyword based. (The penalty system seems to have been partially automated with BigDaddy, something I find very disturbing)


Matt Cutts has definitely stated on a number of occasions that Google uses over 100 factors (I think I've seen over 200 somewhere as well), and most recently said:

Is PageRank such a big deal? Can you value a link on a page by its PageRank? Matt says:

"There are over 100 factors in ranking. And PageRank is just one of them. It's an important factor, but it's by no means the be-all and end-all."


However, I'd also point out that this discussion isn't JUST about Google's ranking strategy - I think we're trying to look more generally at search engine ranking methodology.

I really have doubts that Google would use different algorithms for different searches. It is not like Google to do so.


I'm curious why you say this - I don't personally see any way in which Google has a particular tendency not to use alternate algorithms. This doesn't necessarily mean that they use search algorithm "A" for one term and algorithm "B" for another - it could mean that they apply different theories to retrieve documents when you're searching Blogs versus Images.

There's also some question whether an algorithm has itself changed if the factors weighted within it have changed. I think (and I'm not terribly mathematically inclined, so correct me, please) that an algorithm is defined entirely in the absence of any conditions:

Algorithm:
A set of ordered steps for solving a problem, such as a mathematical formula or the instructions in a program. The terms algorithm and logic are synonymous. Both refer to a sequence of steps to solve a problem. However, an algorithm implies an expression that solves a complex problem rather than the overall input-process-output logic of typical business programs.


That is, the algorithm remains unchanged regardless of the value of the variables used to weight the conditions. However, what if the weighting of variables is itself achieved using a sub-algorithm? Such that the weighting of a condition is in fact adjusted algorithmically rather than absolutely - perhaps, in this case, it could be argued that different searches actually are using different algorithms, at least in part.

Metasearch engines operate by leveraging the results of other engines (and therefore, their algorithms) and applying their own algorithmic operations to compile the results - these engines, therefore, are explicitly making use of multiple algorithms in each search. If such an engine selects a different set of engines to query depending on your search term, it is therefore using a different algorithm - but also a different index.

Complex subject...

#9 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 19 May 2006 - 05:58 AM

I really have doubts that Google would use different algorithms for different searches. It is not like Google to do so.


I'm curious why you say this


It's just a feeling ;) - based mainly on the things that Matt Cutts hands out on his blog (yes, I know: biased and perhaps misleading on purpose). When you look at topics like 301/302 redirection and domain canonicalization - these types of things could be easy to clean out if they had "exceptions" that were handled separately. But instead, Matt often claims that "the algorithm" will take care of it. When you cross-check afterwards, you see that "the algorithm" has been improved in that regard, but the results still contain lots of items which show the same traits. Again: these items could simply be cleaned up without much fuss if they had exceptions, alternate algorithms, etc. By putting it all into "the" algorithm, they make things like this hard to clean up without affecting other related factors.

It's hard to put into words (=gut feeling, I have no proof), but so many of the things I've seen from Google which handle large amounts of data all seem to be based on a single algorithm. When you start thinking in terms of the total number of queries, pages and sites, it just doesn't make sense to manually switch algorithms depending on some "human factors". If it is done automatically, then IMHO that is still the same algorithm.

A meta-search engine is in my opinion something different: they just put factors on different result-sets, they can't compare and rank the returned results (because they don't have more information on them). Even then it could be argued that they use a single algorithm: "f(x) = fGoogle(x) * wGoogle + fYahoo(x) * wYahoo + .... " - no matter that Google's algorithm might change, from the outside of the black box it is still a single result set based on a query (+ other factors such as time, geolocation, etc etc).
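
The weighted-sum view of a meta-search engine can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the engine names, the per-engine scores and the weights are made up, and real meta-search engines usually see only rank positions from their sources, not raw scores.

```python
def blend(result_sets, weights):
    """Combine per-engine scores into one ranked list.

    result_sets maps an engine name to {url: score}; weights maps an
    engine name to its weight. Implements f(x) = sum of f_engine(x) * w_engine,
    treating a URL an engine did not return as scoring 0 for that engine.
    """
    combined = {}
    for engine, scores in result_sets.items():
        w = weights.get(engine, 0.0)
        for url, score in scores.items():
            combined[url] = combined.get(url, 0.0) + w * score
    # Highest combined score first; ties are broken by URL so output is stable.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical engines and scores, purely for demonstration.
    results = {
        "engine_a": {"x.example": 1.0, "y.example": 0.5},
        "engine_b": {"y.example": 1.0, "z.example": 0.25},
    }
    print(blend(results, {"engine_a": 0.5, "engine_b": 0.5}))
```

From the outside, blend() is one function of the query's result sets, whatever happens inside each engine.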

One thing I've always wanted to do was re-create a search engine like Google. Imagine if you could take those 200 (or 100) signals and simulate them by using only 10 - if you could get the top 10-100 results exactly the same with a simulation. It would be so much fun to play with a simulation like that, but in reality it would take a lot of work to even gather the basic data to recreate it (even if for the simulation you limit yourself to an extremely small subset of sites). <sigh> too many ideas, too little time ;)

John

#10 joedolson

joedolson

    Eyes Like Hawk Moderator

  • Technical Administrators
  • 2869 posts
  • Twitter:http://twitter.com/joedolson
  • Facebook:http://facebook.com/joedolson

Posted 19 May 2006 - 06:12 AM

That's what so many of these conversations come down to - gut feelings. Darned trade secrets! ;) I definitely see where you're coming from - I hadn't paid that much attention to factors such as Matt Cutts' word choice, etc. - but it does point to a certain degree of evidence for a single algorithm.

When you start thinking in terms of the total number of queries, pages, sites it just doesn't make sense to manually switch algorithms depending on some "human factors". If it is done automatically, then IMHO that is still the same algorithm.


Yep. See your point...

A meta-search engine is in my opinion something different: they just put factors on different result-sets, they can't compare and rank the returned results (because they don't have more information on them)


Now what if a meta-search engine started maintaining its own index, in addition to compiling external results? I'm getting off-topic and a bit fantastical now, I think - but do you know of anything out there which might be doing this?

I don't see a lot of value to such an activity - one of the advantages to meta-search is the significantly smaller infrastructure since you don't need to maintain your own indices, engineer a crawler, etc. - not sure what the advantages could be in combining meta-search with "normal" search, except for uniqueness...(maybe).

<sigh> too many ideas, too little time


<sigh>.

#11 FP_Guy

FP_Guy

    Mach 1 Member

  • Members
  • 410 posts
  • Twitter:websthatrock
  • Facebook:http://www.facebook.com/internetpresence

Posted 19 May 2006 - 07:43 AM

I blogged on this just this morning. In a recent Matt Cutts interview, he says that different algorithms are used.

#12 joedolson

joedolson

    Eyes Like Hawk Moderator

  • Technical Administrators
  • 2869 posts
  • Twitter:http://twitter.com/joedolson
  • Facebook:http://facebook.com/joedolson

Posted 19 May 2006 - 07:56 AM

In fact, even at different data centers we have different binaries, different algorithms, different types of data always being tested.


That's true - I read that interview, but I'd forgotten that bit. It doesn't seem exactly the same - this is saying, if I understand correctly, that different data centers may be using different algorithms. It's not necessarily relevant to what has been searched - just the usual random test patterns Google uses for live testing, I think.

Still, he does explicitly say they explore different algorithms simultaneously - now, does he mean different algorithms, or different algorithm conditions?

#13 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 19 May 2006 - 08:02 AM

Michael, I suppose you mean this?

In fact, even at different data centers we have different binaries, different algorithms, different types of data always being tested.

To me that sounds more like "we're always testing different algorithms", not "we're using different algorithms concurrently (on the same datacenter)".

At any rate, it really doesn't matter much - an algorithm that uses an algorithm to choose from different possible paths is still just one algorithm :). I might use different algorithms for driving in the city compared to driving on the highway, but in the end I still only have one "drive car" algorithm (that I constantly work on perfecting ;)).

I recently downloaded Wikipedia; maybe I'll make a search engine for that to play with some factors, heh ;).

Off topic: One way that the search engine could be seen as using different algorithms could be the way "organic" search results are displayed next to Adwords results. It is sometimes interesting to see how Adwords can find more relevant "links" than the normal search engine.


John

#14 projectphp

projectphp

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3934 posts
  • Twitter:motherwell
  • Facebook:http://www.facebook.com/mmotherwell

Posted 19 May 2006 - 08:40 AM

It is quite easy to imagine different algorithms. In Rugby Union, a try used to be worth four points; now it is worth five. Change the scoring, change the game.

Imagine an algorithm that is prejudiced towards a certain type of site. Why should that replace an old, effective algorithm in all instances? Why can't different algorithms be used concurrently?

A lot was made of the comment that "if 40% of searches were better, 40% the same and 20% worse, we would do it", but what if those 20% could be identified? Why not isolate them and use a different algorithm?

Doesn't seem terribly hard to do either, just a simple merge...

#15 joedolson

joedolson

    Eyes Like Hawk Moderator

  • Technical Administrators
  • 2869 posts
  • Twitter:http://twitter.com/joedolson
  • Facebook:http://facebook.com/joedolson

Posted 19 May 2006 - 08:48 AM

Doesn't seem terribly hard to do either, just a simple merge...


But wouldn't that possibly make that other 80% less relevant? It seems that it would require something more complicated than a simple merge (that is, obviously merging is necessary, but how to structure that merge would be fairly complicated) to judge how to incorporate that 20% into the other searches - how do you judge how a ranking in one algorithm should be incorporated into the results from another algorithm?

#16 A.N.Onym

A.N.Onym

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 4001 posts
  • Twitter:http://twitter.com/yuraf
  • Facebook:http://www.facebook.com/yura.filimonov

Posted 19 May 2006 - 08:49 AM

I think that's what they do when they try to fix the algorithm - try to adjust the algo to let the dumped 20% live again.

#17 projectphp

projectphp

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3934 posts
  • Twitter:motherwell
  • Facebook:http://www.facebook.com/mmotherwell

Posted 19 May 2006 - 09:30 AM

[quote](that is, obviously merging is necessary, but how to structure that merge would be fairly complicated)[/quote]
Nah, simple as pie. I don't know if people know this, but all SEs produce a score for each document, and then sort the scores to get the final SERP. There is nothing to make independent scores any harder to merge.

[quote]how do you judge how a ranking in one algorithm should be incorporated into the results from another algorithm?[/quote]
I don't really see why they need "incorporating". If you have an algorithm that produces one set of scores, and another that produces another set, it is pretty trivial to merge the two sets and de-dupe. That is partly how multiple word searches are done.
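
A minimal Python sketch of that merge-and-de-dupe step, for illustration only. It assumes both algorithms emit scores on a comparable scale (which is exactly the point being questioned in this thread) and that a URL found by both keeps its higher score; the function name and sample data are hypothetical.

```python
def merge_scores(scores_a, scores_b):
    """Merge two {url: score} mappings into one ranked, de-duplicated list.

    Assumption: when both algorithms score the same URL, the higher
    score wins. Other policies (sum, average) are equally plausible.
    """
    merged = dict(scores_a)
    for url, score in scores_b.items():
        merged[url] = max(score, merged.get(url, score))
    # Sort by score descending; break ties by URL for a stable order.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical outputs of two different ranking algorithms.
    algo_a = {"a.example": 0.9, "b.example": 0.4}
    algo_b = {"b.example": 0.7, "c.example": 0.5}
    print(merge_scores(algo_a, algo_b))
```

The mechanics are indeed short; the hard part is whether a 0.7 from one algorithm means the same thing as a 0.7 from the other.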

How and when to choose which algorithm has its own problems, but that isn't necessarily a big challenge to solve once we accept that we want to do something like this.

But anyway, back to my question: why one algorithm? I don't see how you really need a lot of processing, or a lot of this extra stuff much beyond what Altavista had, to get a good result for many searches. I also can't imagine that all the proposed additions like, say, a trustrank, are always needed for all searches.

Wouldn't different algorithms solve that issue? I mean, why try to "fix" a one size fits all pair of pants for everyone in the family, when dad and junior can each have their own pair?

Going right back to the Florida update, the -junkword stuff seemed to show that there were "trigger" words, or different ways specific searches were handled, and I wonder if there are different algorithms in play at the same time across different SERPs.

I also wonder if this would negate, in part, a lot of the "the algorithm needs to be fixed" stuff I see. After all, such statements pre-suppose a singular algorithm.

#18 joedolson

joedolson

    Eyes Like Hawk Moderator

  • Technical Administrators
  • 2869 posts
  • Twitter:http://twitter.com/joedolson
  • Facebook:http://facebook.com/joedolson

Posted 19 May 2006 - 10:24 AM

I don't know if people know this, but all SEs produce a score for each document, and then sort the scores to get the final SERP. There is nothing to make independent scores any harder to merge.


I did know that, actually - but if the two algorithms produce an identical score for two different pages, what would determine which ranked higher?

I think this is the problem with merging the lists - one would need to prioritize which algorithm would take precedence in the circumstance of an identical score.

Now, granted, with a complex algorithm utilizing over 100 factors, etc., etc., it does seem quite unlikely that you'd have a whole lot of identical scores.

It's funny how I started out in this conversation defending the idea of multiple algorithms, and now I seem to be on the other side...yet I still think it's entirely practical that these search engines could be implementing something along these lines.

I also wonder if this would negate, in part, a lot of the "the algorithm needs to be fixed" stuff I see. After all, such statements pre-suppose a singular algorithm.


Could be...it's an interesting idea, at any rate!

#19 Black_Knight

Black_Knight

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 9293 posts
  • Twitter:http://twitter.com/#!/Ammon_Johns
  • Facebook:http://www.facebook.com/ammon.johns

Posted 21 May 2006 - 01:24 PM

We often talk about an algorithm as if there were just one. However, we already know that, for example, PageRank is a separate algorithm that takes a lot of resources to compute and is used alongside many other algorithms with some over-arching 'master algorithm' that assigns relative values to each of the algorithms in play.

Danny Sullivan wrote a few articles about Google's 'invisible tabs' that made certain searches behave as though different tab features had been selected. There certainly are vast differences in needs expressed by a search for "pensions" (a broad term that seeks general information about an entire field) and "indexed-linked pensions with cashback facility", which indicates that the searcher already has the basic research done, and is now in the latter stages of the shopping process.

I would certainly imagine that the number of keywords used in a search could be used to alter the algorithm used, so that the short 'generic' searches provided a wider array of broad information, while the longer more refined and specific searches would favour commercial sites a bit more.
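
To make the idea concrete, here is a small Python sketch of switching the weighting by query length. Everything in it is an assumption for illustration: the threshold of three words, the two profile names, the signal names and the weights are invented, and no search engine is known to work this way.

```python
# Hypothetical weight profiles: short queries lean informational,
# longer and more specific queries lean commercial.
PROFILES = {
    "generic": {"informational": 0.8, "commercial": 0.2},
    "specific": {"informational": 0.4, "commercial": 0.6},
}


def pick_profile(query, long_query_threshold=3):
    """Choose a profile name from the number of whitespace-separated terms."""
    n_terms = len(query.split())
    return "specific" if n_terms >= long_query_threshold else "generic"


def score(doc_signals, query):
    """Score one document using the profile selected for this query."""
    weights = PROFILES[pick_profile(query)]
    return sum(w * doc_signals.get(name, 0.0) for name, w in weights.items())


if __name__ == "__main__":
    shop_page = {"informational": 0.2, "commercial": 0.9}
    for q in ("pensions", "indexed-linked pensions with cashback facility"):
        print(q, pick_profile(q), score(shop_page, q))
```

Whether this counts as two algorithms or as one algorithm with a branch in it is the same question the thread has been circling.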

Not only do I imagine that this is easily possible, I think it would produce better results for the user in very many cases, so I'm certain that all the brilliant scientists would have at least tested it out.


