Cre8asiteforums Internet Marketing
and Conversion Web Design


Wikipedia Founder Plans Open Source Search Engine


13 replies to this topic

#1 swainzy

swainzy

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3322 posts

Posted 28 July 2007 - 12:07 PM

San Jose Mercury News is running this article about Wikipedia's attempt at an open source search engine.


Wales' goal is to make Internet search more accurate by revealing the technology behind it. He said he would release Grub's computer code under an open-source license that allows others to make improvements.


How do you think this will affect searching?

Edited by swainzy, 28 July 2007 - 12:09 PM.


#2 EGOL

EGOL

    Professor

  • Hall Of Fame
  • 5500 posts

Posted 28 July 2007 - 01:03 PM

Have you watched the content of a wikipedia topic? Lots of goals, lots of agendas, some extremely competent get edited by idiots.....

..... let that compete with a company that is highly motivated by performance, assessment and profits.

Edited by EGOL, 28 July 2007 - 01:04 PM.


#3 Ruud

Ruud

    Hall of Fame

  • Hall Of Fame
  • 4887 posts

Posted 28 July 2007 - 01:48 PM

Wales' goal is to make Internet search more accurate by revealing the technology behind it. He said he would release Grub's computer code under an open-source license that allows others to make improvements.

"It's not a good thing that we are getting search results from a handful of very large players and we have no idea how they are generated," Wales said [...]


The software can be improved, yes. But unless the algorithms are hard coded into the programming code itself, there is no way to know what is weighed how and when. And once you publish the algorithms they can be played, opening the open source search engine up to a flood of spam larger than what we have ever seen before.

So far I haven't heard or read anything about this project that makes me go "oh wow, these guys are on the next big thing!". In fact, it all sounds a lot "done" to me.

The problem is that it is doing the same thing somewhere else simply "because".

"It's like getting all your news from one source."


No, it's like getting a telephone number you want from a telephone directory. Unless you come up with something better, there isn't a real need for "just another" telephone directory...

Although most of the current search technology is "dumb" technology, we're doing pretty well with it. Except for a few quite vague and, worse, ambiguous queries, I can't really remember a recent frustrating moment of throwing my hands in the air exclaiming "I can't find it!!".

The main problem, to me, seems to be discovery. That golden nugget of solid documentation on that underlinked page. Maybe if another search engine would only return results from sites under a certain popularity level, that would be something different?

The added community aspect Jimmy Wales seems to rely on a lot doesn't impress me either. Google leverages by far the largest community already: each and every webmaster, each and every web document, each and every link. How is a million links voting for resource X less, or less transparent, than a million community members voting a resource up or down?
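To make the "links as votes" idea above concrete, here is a minimal sketch of a PageRank-style iteration in Python. It is only an illustration of the general, published idea of letting links count as votes, not Google's actual ranking; the function name, the damping value and the tiny link graph in the usage note are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank-style scoring: each link counts as a vote for its target.

    links maps a page name to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank
```

With a hypothetical graph such as `{"a": ["x"], "b": ["x"], "c": ["x", "a"], "x": []}`, page `x` collects the most votes and ends up with the highest score. The votes come from whoever publishes links, which is the point made above about the size of that "community".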

Edited by Ruud, 30 July 2007 - 01:35 PM.


#4 bobbb

bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 2192 posts

Posted 28 July 2007 - 01:58 PM

Have you watched the content of a wikipedia topic? Lots of goals, lots of agendas, some extremely competent get edited by idiots

Ouch. That sounds like a thumbs down to me. There was a thread here about Wikipedia. Not sure some would agree with that. There is a Firefox plugin to exclude Wikipedia results from the SERPs and, as I learnt recently, there is a group of Wikipedia haters out there.

#5 3rdeye5

3rdeye5

    Gravity Master Member

  • Members
  • 154 posts

Posted 29 July 2007 - 01:55 AM

Yet another dilettante issuing a press release that he's gonna beat Google.

I don't think improving Grub is going to make a lot of difference for search results.

The community editing will, but will there be so much difference with DMOZ? I for one very seldom use DMOZ for search. To review every page from the web in the index, by several editors, is an awful lot of work. So it's going to be imperfect, leaving open lots of room for manipulation - which is exactly what he wants to combat.


Ewald

#6 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 29 July 2007 - 06:56 AM

Personally I want Mr. Wales' idea to succeed, and I have no hesitation in lauding his efforts in the open source movement and his attempt to tackle some of the fundamental problems of search technology.

His efforts are concentrated in building a 'different' search engine. Not necessarily a better one at present, but given enough time and interest it is bound to evolve to a better one as well. The Symbiotic Intelligence project expounds upon some of these ideas.

The ideas are borrowed from the mathematics of self-organizing systems. The web and Wikipedia are two such systems. Human interaction is necessary since AI is still not there! (Even Google is reputed to employ upwards of 10000 human editors who are used to refine results and algorithms.) The founders of Wikipedia are well versed in the problems of 'human interaction', the whole shebang, the good, the bad and the ugly!

The Open Source community has given us some gems: the Apache server, PHP, countless CMSs such as WordPress, Drupal etc. They all succeeded because of human interaction. Almost all human interaction follows patterns of self-organization. It has to, since if it doesn't this world will just disappear, and we've almost been there a couple of times. Every time we've been there, human interaction was lop-sided, i.e. it was in the hands of only a few individuals. (By the way, Google also uses human interaction indirectly, i.e. all the webmasters voting for each other with their links! Until SEO came along!)


I for one have registered!

Yannis

PS In my opinion this is a 15-year project!

#7 sharkeo

sharkeo

    Unlurked Energy

  • Members
  • 3 posts

Posted 30 July 2007 - 09:50 AM

I agree with Ruud,

Providing an open source search engine is likely to be abused and would be somewhat difficult to manage. I mean no disrespect to the open source developers who have brought us great gems (yannis :-)), but feel there are always a handful of people who will ruin it for everyone else.

An example that springs to mind is the SEO Competition a while back. This competition resulted in many wiki pages getting spammed with links, so as a result Wiki placed the rel="nofollow" with no exceptions.

I would also consider the user and their search behaviour. Thinking about it, I have almost become immune to performing a number of searches (at least 2) before I find the result I'm looking for. This is often the case unless I am searching for a brand. I do this to ensure the results returned are completely relevant.

If I was to search only once and rely on the results as 100% gospel, I would be somewhat sceptical, especially if the SE is open source.

This is only my opinion, it would be nice to see some competition against Google, but realistically I think in the near future it would be difficult.

#8 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 30 July 2007 - 01:18 PM

Providing an open source search engine is likely to be abused and would be somewhat difficult to manage.


The beauty of a self-organizing system is that it will always find a stable state. When an imbalance is introduced, for example by spamming, within days the hundreds of thousands of users will contribute to re-balance it. Spam is also a problem for search engines such as Google. I keep a collection of websites that have managed to spam the search engines and use them to study weaknesses in Google's algos. (They are very quick to correct spam these days only because they have introduced a way for people to report incidents! The algos still don't catch all the spamming.)

Anyway, I do not believe competition to Google can come overnight, but I believe that at least one can try and experiment with different ways of searching. Google is not the best engine for many languages, for example Chinese, Arabic or Greek, where other engines tend to dominate.

I also have a concern that Google, being a commercial venture, can bias the results in order to serve its own ad-selling policies. How sure are you that Google is giving you unbiased results? If they did, there wouldn't be any SEOs around!

You are right though that we will not see serious competition to Google any time soon!


Yannis

#9 Econman

Econman

    Unlurked Energy

  • Members
  • 3 posts

Posted 30 July 2007 - 03:51 PM

Those who are skeptical of the potential for an open source search engine competing with Google are focusing on the algorithms that determine which sites rank well. That is the area where an open source solution would be least effective.

I can visualize at least two ways an open source solution might be effective in search -- not necessarily good enough to displace Google, but more effective than MSN/Yahoo/Ask in creating a more balanced environment in which Google no longer totally dominates the landscape:

1. Developing an open source of data (all the types of data that Google uses to run their search engine). This data would be available to, and used by, numerous other sites -- both for-profit and non-profit. Each of those sites would focus on a specific vertical niche, or develop its own special "flavor" of search results appealing to a certain audience. Part of their product differentiation would be the way they use the underlying data to rank and filter sites to be displayed in their results.

2. Creating one or more open "communities" that would vote thumbs up or thumbs down on the quality or relevance of sites (e.g. identifying what they perceive to be "spam"). If this were done in the right manner, it would provide useful feedback, yet it wouldn't be subject to abuse, because the resulting data would be just one more input available for use by various search engines (see item 1 above). For this to work well, you would either need multiple communities, and these different communities would develop different reputations for their abilities to correctly identify spam, or you would need to be able to identify individual participants in the community, and different participants would develop different reputations for their abilities to correctly identify spam (e.g. no one would pay any attention to the opinions of participants from India that have been hired by spammers to vote up their spammy sites).
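Item 2 above can be sketched as a reputation-weighted vote. The Python below is a minimal illustration under assumptions, not a description of any existing system: the function name, the reputation values and the rule that unknown voters get zero weight are all hypothetical choices made for the example.

```python
from collections import defaultdict


def weighted_site_scores(votes, reputation, default_reputation=0.0):
    """Combine thumbs-up/down votes, weighting each voter by reputation.

    votes: iterable of (voter, site, vote), where vote is +1 or -1.
    reputation: dict mapping voter -> weight (hypothetical values).
    Voters without an established reputation get default_reputation.
    Returns a dict mapping site -> score between -1.0 and +1.0.
    """
    totals = defaultdict(float)   # sum of weight * vote per site
    weights = defaultdict(float)  # sum of weights per site
    for voter, site, vote in votes:
        w = reputation.get(voter, default_reputation)
        totals[site] += w * vote
        weights[site] += w
    # Sites that only received zero-weight votes get a neutral score.
    return {
        site: (totals[site] / weights[site] if weights[site] > 0 else 0.0)
        for site in totals
    }
```

For example, if two unknown accounts vote a site up and one trusted participant votes it down, the site still ends up with a negative score, because only the trusted vote carries weight. A search engine consuming this data could treat the score as one more input, as described in item 1.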

#10 Ruud

Ruud

    Hall of Fame

  • Hall Of Fame
  • 4887 posts

Posted 30 July 2007 - 06:25 PM

#1 is a very interesting idea and certainly a good business model. If I remember well, A9/Alexa went this way earlier last year?

#2 ... I remain skeptical. If today Google would give full disclosure of their complete ranking mechanisms and open up that ranking to re-ranking through community input, I'm quite sure within hours offshore freelancers in countries like India and Romania would be flooded with requests to switch from comment spam and ad clicks to voting on Google.

Perhaps a truer vision of community driven search is something like Naver?

#11 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 30 July 2007 - 10:47 PM

Perhaps a truer vision of community driven search is something like Naver?

Absolutely, Ruud, and that was one of the points I argued: there is more to the internet than just the English-speaking world. The article about Naver describes perfectly the 'good' side of people.

Tapping a South Korean inclination to help one another on the Web has made Naver.com the undisputed leader of Internet search in the country. It handles more than 77 percent of all Web searches originating in South Korea, thanks largely to content generated, free of charge, by people like Park and Cho.


Econman, your comments that one should not just think about algorithms are valid, and better search engines will come about through a combination of both. Spam and manipulation were, are and will be a problem for all search engines, but they can be contained. For example, meta tags are dead because people used them the wrong way in the past, but a new generation of 'tagging' has arrived. How many people really abuse 'tagging' on a blog?

Yannis

#12 swainzy

swainzy

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3322 posts

Posted 31 July 2007 - 12:19 AM

Here is more on Jimmy Wales' motivation at SFgate.

"What a difference a little principle makes. There are plenty of reasons to criticize Wikipedia. The encyclopedia entries, which are written by the public, often reflect "truthiness" rather than "truth;" and there arise serious questions of measurement and historical judgment when you throw open the doors for everyone to be an expert (it's appalling that Britney Spears' entry, for instance, is three times the size of the one for St. Augustine).

But I gained a lot of respect for Wikipedia after founder Jimmy Wales told me that he had no idea why the Chinese government chooses to block or unblock his site -- after a year and a half, the site was unblocked in China last week, but may not stay that way -- but he has no intentions of doing anything that would undermine Wikipedia's goal, which is "to have a free encyclopedia for every person on the planet, in her own language."


"Of course, Wikipedia is a nonprofit foundation, not a public company such as Yahoo or Google. Wales doesn't have to answer to shareholders"

"Wales said it's helpful that the purpose of Wikipedia "isn't to maximize profit, it's to maximize knowledge," but he thinks his strategy will one day make good economic sense as well. "I think some of the other companies are being shortsighted," he said. "They're damaging their brands by going along with the Chinese government in this way, and when this short period in Chinese history is over, people will remember that they censored."

And there is more to this here.

#13 conficio

conficio

    New To Community

  • Members
  • 2 posts

Posted 02 August 2007 - 10:18 AM

I have no way to read the article (because it requires log-in), but open source search engines do exist. Some of the most popular ones are:

Apache Lucene - indexing technology and efficient storage engine for indexes
Apache Solr - a search engine with an indexer and weighting algorithm that needs a crawler for input (useful for specific data search on an intranet or one site).
Nutch - a full-fledged web crawler and result weighting algorithm, currently working on a new architecture for scalability

One issue with an open source search engine is not the quality of spidering, indexing and results, but the amount of horsepower required to run such a beast for the public (think data centers and bandwidth). Another is the monetization platform, like advertisement auctions, etc.
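As a rough illustration of what the "indexing" in the list above refers to, here is a toy inverted index in Python. It is a sketch of the general technique only and does not use or imitate the Lucene, Solr or Nutch APIs; the function names and the sample documents in the usage note are assumptions made for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> set of document ids.

    docs maps a document id to its text. Tokenizing is just lowercasing
    and splitting on whitespace, which real engines do far more carefully.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start with the documents for the first term, then intersect the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Given hypothetical documents like `{1: "open source search engine", 2: "open source web server", 3: "search engine spam"}`, a query for `open source` matches documents 1 and 2. The lookup itself is cheap; the hard part at web scale is crawling, storing and serving billions of such entries.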

#14 3rdeye5

3rdeye5

    Gravity Master Member

  • Members
  • 154 posts

Posted 02 August 2007 - 11:01 AM

On checking out the Twiceler useragent, I ran into this article by Anna Patterson about how to make your own search engine. I thought it was interesting, and perhaps it will be too for those who are thinking about joining that Wiki search engine project.


Ewald


