Do Search Engines Use One Algorithm For All Results?
#2
Posted 19 May 2006 - 12:50 AM
I bet you knew this before, though
#3
Posted 19 May 2006 - 01:02 AM
A search for "do blind people dream" doesn't need the same amount of processing to get a "good" result as would, say, "mortgage". So using one algorithm wouldn't necessarily be the best way forward in my view.
I also wonder if the 10 results all use the same algorithm, e.g. if there is more than one ranking algorithm in play, and all 10 results are from the same algo, or there are more than a few.
I really have no proof or real evidence, but it seems to me the best way to produce a varied SERP, which IMHO is partly the goal, is to have the ten results utilise different algorithms. You could then have algorithms that favoured different things, content, off page, specific areas etc etc.
I just wonder what other people think
#4
Posted 19 May 2006 - 01:08 AM
I don't think it'd make sense to process some queries with less vigor than others.
If we can provide top quality results for 'mortgage' (at least as much as the search engines allow), why not provide the same quality when searching for less popular queries?
That being said, I've read somewhere that the Big Daddy update changed the SE algorithm structure, making it possible to connect different modules, each used for a specific purpose.
#5
Posted 19 May 2006 - 02:27 AM
Another thought occurs to me now - are the "vertical creep" results which Google brings in from their images index, etc., perhaps sourced using an alternate algorithm? I've usually thought about them as coming from a different index - but it seems quite reasonable to suppose they're applying a different algorithm as well.
A.N.Onym - I hadn't heard that suggestion about BigDaddy - any chance you can point to an article somewhere?
#7
Posted 19 May 2006 - 04:19 AM
With over 200 signals to play with, they don't really need to use different "algorithms" - instead they just play with the weighting factors.
I really have doubts that Google would use different algorithms for different searches. It is not like Google to do so. However, I'm sure that they have signals which react to the general "spammy-ness" of the results and the popularity of keywords.
Additionally, I doubt that Google will use a simple linear weighted sum to calculate the URL "weight" in the results, instead I think they use signals and combinations of signals as factors (or inverse factors) for weighting other signals.
An example of what I mean: Let C = "competition" (higher=more), P = sum of onpage-factors
Instead of doing:
weight = .... + wC * C + wP * P
they could be doing:
weight = .... + wP * P / (wC * C)
Which would decrease the weight of the on-page factors for terms which are very competitive. Add to that any number of other mathematical operations, and you'll soon have so many possible variations that they'd need a Google-Datacenter just to store them.
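To make the difference between the two formulas concrete, here is a minimal sketch. The function names, the default weights, and the sample values are all made up for illustration, and the "...." part of the formula (every other signal) is left out; nothing here is a known Google formula.

```python
def linear_weight(c, p, w_c=1.0, w_p=1.0):
    """Plain weighted sum: weight = wC * C + wP * P (other terms omitted)."""
    return w_c * c + w_p * p


def ratio_weight(c, p, w_c=1.0, w_p=1.0):
    """On-page score scaled down by competition: weight = wP * P / (wC * C)."""
    if c <= 0:
        raise ValueError("competition C must be positive")
    return w_p * p / (w_c * c)


# Hypothetical values: the same on-page score P at rising competition C.
on_page = 8.0
for competition in (1.0, 4.0, 16.0):
    print(competition,
          linear_weight(competition, on_page),
          ratio_weight(competition, on_page))
```

In the ratio form the same on-page score counts for less and less as the competition value grows, which is the effect described above.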
Google is a geek-company - they would not go towards a manual selection (this algorithm for that, the other for other searches) - they would always try to have one (probably very complicated) formula for their calculations. Figuring out which weights and which combinations to use is something that they probably do based on a known corpus of sites / keywords and once they're satisfied with internal results they put them live on a datacenter - to judge the user-reports that they get.
Even if you had access to the formula, all the weighting factors and knew how they determined the signals (eg live "pagerank" based on pre-discounted links minus trust of those links), it would be one heck of a job to optimize a site to stay on top. I'd venture to guess that it would be easier NOT knowing the exact formula. Imagine 200 signals that you have to tune....
John
#8
Posted 19 May 2006 - 04:46 AM
I read "somewhere" that Google uses over 200 "signals" for ranking. Additionally, they seem to have some sort of manual penalty system - probably site, url and keyword based. (The penalty system seems to have been partially automated with BigDaddy, something I find very disturbing)
Matt Cutts has definitely stated on a number of occasions that Google uses over 100 (I think I've seen over 200 somewhere as well), but most recently states:
Is PageRank such a big deal? Can you value a link on a page by its PageRank? Matt says:
"There are over 100 factors in ranking. And PageRank is just one of them. It's an important factor, but it's by no means the be-all and end-all."
However, I'd also point out that this discussion isn't JUST about Google's ranking strategy - I think we're trying to look more generally at search engine ranking methodology.
I really have doubts that Google would use different algorithms for different searches. It is not like Google to do so.
I'm curious why you say this - I don't personally see any way in which Google has a particular tendency not to use alternate algorithms. This doesn't necessarily mean that they use search algorithm "A" for one term and algorithm "B" for another - it could mean that they apply different theories to retrieve documents when you're searching Blogs versus Images.
There's also some question whether an algorithm has itself changed if the factors weighted within it have changed. I think (and I'm not terribly mathematically inclined, so correct me, please) that an algorithm is fairly defined entirely in absence of any conditions:
Algorithm:
A set of ordered steps for solving a problem, such as a mathematical formula or the instructions in a program. The terms algorithm and logic are synonymous. Both refer to a sequence of steps to solve a problem. However, an algorithm implies an expression that solves a complex problem rather than the overall input-process-output logic of typical business programs.
That is, the algorithm remains unchanged regardless of the value of the variables used to weight the conditions. However, what if the weighting of variables is itself achieved using a sub-algorithm? Such that the weighting of a condition is in fact adjusted algorithmically rather than absolutely - perhaps, in this case, it could be argued that different searches actually are using different algorithms, at least in part.
Metasearch engines operate by leveraging the results of other engines (and therefore, their algorithms) and applying their own algorithmic operations to compile the results - these engines, therefore, are explicitly making use of multiple algorithms in each search. If such an engine selects a different set of engines to query depending on your search term, it is therefore using a different algorithm - but also a different index.
Complex subject...
#9
Posted 19 May 2006 - 05:58 AM
I really have doubts that Google would use different algorithms for different searches. It is not like Google to do so.
I'm curious why you say this
It's just a feeling
It's hard to put into words (=gut feeling, I have no proof), but so many of the things I've seen from Google which handle large amounts of data all seem to be based on a single algorithm. When you start thinking in terms of the total number of queries, pages, sites it just doesn't make sense to manually switch algorithms depending on some "human factors". If it is done automatically, then IMHO that is still the same algorithm.
A meta-search engine is in my opinion something different: they just put factors on different result-sets, they can't compare and rank the returned results (because they don't have more information on them). Even then it could be argued that they use a single algorithm: "f(x) = fGoogle(x) * wGoogle + fYahoo(x) * wYahoo + .... " - no matter that Google's algorithm might change, from the outside of the black box it is still a single result set based on a query (+ other factors such as time, geolocation, etc etc).
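The f(x) formula above could be sketched roughly as follows. This is only an illustration: the engine names, weights, URLs, and per-engine scores are invented, and a real meta-search engine would probably have to derive scores from rank positions, since the source engines do not expose their internal scores.

```python
def combine(engine_scores, engine_weights):
    """Rank URLs by the weighted sum of the scores each engine gave them.

    engine_scores:  {engine_name: {url: score}}   (hypothetical inputs)
    engine_weights: {engine_name: weight}
    """
    combined = {}
    for engine, scores in engine_scores.items():
        weight = engine_weights.get(engine, 0.0)
        for url, score in scores.items():
            combined[url] = combined.get(url, 0.0) + weight * score
    # Highest combined score first; URL as a stable secondary key.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


results = combine(
    {
        "google": {"a.example": 0.9, "b.example": 0.5},
        "yahoo": {"b.example": 0.8, "c.example": 0.4},
    },
    {"google": 0.5, "yahoo": 0.5},
)
for url, score in results:
    print(url, round(score, 2))
```

From the outside this is one function of the query, even though each engine behind it may change its own algorithm at any time.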
One thing I've always wanted to do was re-create a search engine like Google. Imagine if you can take those 200 (or 100) signals and simulate them by using only 10 - if you can get the top 10-100 results exactly the same with a simulation. It would be so much fun to play with a simulation like that, but in reality it would take a lot of work to even gather the basic data to recreate it (even if for the simulation you limit yourself to an extremely small subset of sites). <sigh> too many ideas, too little time
John
#10
Posted 19 May 2006 - 06:12 AM
When you start thinking in terms of the total number of queries, pages, sites it just doesn't make sense to manually switch algorithms depending on some "human factors". If it is done automatically, then IMHO that is still the same algorithm.
Yep. See your point...
A meta-search engine is in my opinion something different: they just put factors on different result-sets, they can't compare and rank the returned results (because they don't have more information on them)
Now what if a meta-search engine started maintaining its own index, in addition to compiling external results? I'm getting off-topic and a bit fantastical now, I think - but do you know of anything out there which might be doing this?
I don't see a lot of value to such an activity - one of the advantages to meta-search is the significantly smaller infrastructure since you don't need to maintain your own indices, engineer a crawler, etc. - not sure what the advantages could be in combining meta-search with "normal" search, except for uniqueness...(maybe).
<sigh> too many ideas, too little time
<sigh>.
#11
Posted 19 May 2006 - 07:43 AM
#12
Posted 19 May 2006 - 07:56 AM
In fact, even at different data centers we have different binaries, different algorithms, different types of data always being tested.
That's true - I read that interview, but I'd forgotten that bit. It doesn't seem exactly the same - this is saying, if I understand correctly, that different data centers may be using different algorithms. It's not necessarily relevant to what has been searched - just the usual random test patterns Google uses for live testing, I think.
Still, he does explicitly say they explore different algorithms simultaneously - now, does he mean different algorithms, or different algorithm conditions?
#13
Posted 19 May 2006 - 08:02 AM
In fact, even at different data centers we have different binaries, different algorithms, different types of data always being tested.
To me that sounds more like "we're always testing different algorithms", not "we're using different algorithms concurrently (on the same datacenter)".
At any rate, it really doesn't matter much - an algorithm that uses an algorithm to choose from different possible paths is still just one algorithm
I recently downloaded the wikipedia, maybe I'll make a search engine for that to play with some factors, heh
John
#14
Posted 19 May 2006 - 08:40 AM
Imagine an algorithm that is prejudiced towards a certain type of site. Why should that replace an old, effective algorithm in all instances? Why can't different algorithms be used concurrently?
A lot was made of the comment that "if 40% of searches were better, 40% the same and 20% worse, we would do it", but what about if those 20% could be identified? Why not isolate them and use a different algorithm?
Doesn't seem terribly hard to do either, just a simple merge...
#15
Posted 19 May 2006 - 08:48 AM
Doesn't seem terribly hard to do either, just a simple merge...
But wouldn't that possibly make that other 80% less relevant? It seems that it would require something more complicated than a simple merge (that is, obviously merging is necessary, but how to structure that merge would be fairly complicated) to judge how to incorporate that 20% into the other searches - how do you judge how a ranking in one algorithm should be incorporated into the results from another algorithm?
#17
Posted 19 May 2006 - 09:30 AM
Nah, simple as pie. I don't know if people know this, but all SEs produce a score for each document, and then sort the scores to get the final SERP. There is nothing to make independent scores any harder to merge.
how do you judge how a ranking in one algorithm should be incorporated into the results from another algorithm?
I don't really see why they need "incorporating". If you have an algorithm that produces one set of scores, and another that produces another set, it is pretty trivial to merge the two sets and de-dupe. That is partly how multiple word searches are done.
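A minimal sketch of the merge-and-de-dupe step described above. It assumes the two algorithms produce scores on a comparable scale, which is exactly the point being questioned in this thread; the function name and sample data are hypothetical.

```python
def merge_scores(scores_a, scores_b):
    """Merge two {url: score} mappings into one ranked list.

    A URL returned by both algorithms is kept once, with its higher score.
    Assumes both score sets are on the same scale (a big assumption).
    """
    merged = dict(scores_a)
    for url, score in scores_b.items():
        if url not in merged or score > merged[url]:
            merged[url] = score
    # Sort by score, best first, to get the final result list.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


serp = merge_scores(
    {"x.example": 0.7, "y.example": 0.2},
    {"y.example": 0.9, "z.example": 0.5},
)
print(serp)
```

Keeping the higher score is only one possible rule for duplicates; averaging or summing would be just as easy to write, and each choice changes the final order.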
How and when to choose what algorithm has its own problems, but that isn't necessarily a big challenge to solve once we accept that we want to do something like this.
But anyway, back to my question: why one algorithm? I don't see how you really need a lot of processing, or a lot of this extra stuff much beyond what Altavista had, to get a good result for many searches. I also can't imagine that all the proposed additions like, say, a trustrank, are always needed for all searches.
Wouldn't different algorithms solve that issue? I mean, why try to "fix" a one size fits all pair of pants for everyone in the family, when dad and junior can each have their own pair?
Going right back to the Florida update, the -junkword stuff seemed to show that there were "trigger" words, or different ways specific searches were handled, and I wonder if there are different algorithms in play at the same time across different SERPs.
I also wonder if this would negate, in part, a lot of the "the algorithm needs to be fixed" stuff I see. After all, such statements pre-suppose a singular algorithm.
#18
Posted 19 May 2006 - 10:24 AM
I don't know if people know this, but all SEs produce a score for each document, and then sort the scores to get the final SERP. There is nothing to make independent scores any harder to merge
I did know that, actually - but if the two algorithms produce an identical score for two different pages, what would determine which ranked higher?
I think this is the problem with merging the lists - one would need to prioritize which algorithm would take precedence in the circumstance of an identical score.
Now, granted, with a complex algorithm utilizing over 100 factors, etc., etc., it does seem quite unlikely that you'd have a whole lot of identical scores.
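One way the precedence question could be handled is to make the algorithm's priority a secondary sort key, so it only matters when two scores are identical. This is a sketch under that assumption; the algorithm names "A" and "B", the page names, and the scores are invented.

```python
def merge_with_precedence(result_sets, precedence):
    """Merge per-algorithm scores; ties go to the higher-priority algorithm.

    result_sets: {algo_name: {url: score}}
    precedence:  list of algo names, earlier entries win ties
    """
    rank_of = {name: i for i, name in enumerate(precedence)}
    best = {}
    for algo, scores in result_sets.items():
        for url, score in scores.items():
            # Tuples compare element by element: score first, then priority.
            candidate = (-score, rank_of[algo], url)
            if url not in best or candidate < best[url]:
                best[url] = candidate
    ordered = sorted(best.values())
    return [(url, -neg_score) for neg_score, _, url in ordered]


sets = {"A": {"p1": 0.8}, "B": {"p2": 0.8, "p3": 0.9}}
print(merge_with_precedence(sets, ["A", "B"]))
print(merge_with_precedence(sets, ["B", "A"]))
```

The two calls differ only in which of the tied pages (p1 or p2) comes first; p3 leads either way because its score is higher.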
It's funny how I started out in this conversation defending the idea of multiple algorithms, and now I seem to be on the other side...yet I still think it's entirely practical that these search engines could be implementing something along these lines.
I also wonder if this would negate, in part, a lot of the "the algorithm needs to be fixed" stuff I see. After all, such statements pre-suppose a singular algorithm.
Could be...it's an interesting idea, at any rate!
#19
Posted 21 May 2006 - 01:24 PM
Danny Sullivan wrote a few articles about Google's 'invisible tabs' that made certain searches behave as though different tab features had been selected. There certainly are vast differences in needs expressed by a search for "pensions" (a broad term that seeks general information about an entire field) and "indexed-linked pensions with cashback facility", which indicates that the searcher already has the basic research done, and is now in the latter stages of the shopping process.
I would certainly imagine that the number of keywords used in a search could be used to alter the algorithm used, so that the short 'generic' searches provided a wider array of broad information, while the longer more refined and specific searches would favour commercial sites a bit more.
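As a rough sketch of that idea, a query's word count could pick between two scoring functions. Everything here is hypothetical: the three-word threshold, the "commercial" flag, the 1.5 bonus, and the toy documents are assumptions for illustration, not a description of any real engine.

```python
def term_matches(doc, terms):
    """Count how often the query terms occur in the document text."""
    words = doc["text"].lower().split()
    return sum(words.count(term) for term in terms)


def broad_score(doc, terms):
    """Scoring for short, generic queries: plain term matching."""
    return float(term_matches(doc, terms))


def specific_score(doc, terms):
    """Scoring for long, specific queries: favour commercial pages a bit."""
    bonus = 1.5 if doc.get("commercial") else 1.0
    return term_matches(doc, terms) * bonus


def choose_scorer(query, long_query_terms=3):
    """Pick a scoring function from the number of words in the query."""
    terms = query.lower().split()
    scorer = specific_score if len(terms) >= long_query_terms else broad_score
    return scorer, terms


def search(query, docs):
    scorer, terms = choose_scorer(query)
    scored = [(scorer(doc, terms), doc["url"]) for doc in docs]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for score, url in ranked if score > 0]


docs = [
    {"url": "info.example", "text": "pensions guide pensions explained", "commercial": False},
    {"url": "shop.example", "text": "index-linked pensions with cashback", "commercial": True},
]
print(search("pensions", docs))
print(search("pensions with cashback", docs))
```

Whether this counts as two algorithms or as one algorithm with a branch in it is the same question raised earlier in the thread.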
Not only do I imagine that this is easily possible, I think it would produce better results for the user in very many cases, so I'm certain that all the brilliant scientists would have at least tested it out.