Cre8asiteforums Internet Marketing
and Conversion Web Design


Search getting boring? Not for Nutch longer.


10 replies to this topic

#1 Black_Knight

Black_Knight

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 9339 posts

Posted 11 September 2003 - 01:29 PM

http://www.searcheng...cle.php/3071971

An open-source search engine, open to all to innovate and modify - this will surely place the cat among the proverbials. It's not an idle venture either.

Mitch Kapor, who helped found Lotus Development and the Electronic Frontier Foundation and is founder and president of the Open Source Applications Foundation, certainly agrees. He's thrown his weight behind the project by joining Nutch's nonprofit board, as has Tim O'Reilly, the CEO of O'Reilly & Associates. Brewster Kahle, the visionary behind the Internet Archive, has also lent his support. Nutch is moving its servers to Kahle's high-bandwidth location this weekend, a crucial step toward readying the engine for its public debut.



#2 Guest_Lots0_*

Guest_Lots0_*
  • Guests

Posted 11 September 2003 - 01:47 PM

A serious new open source search engine that just may be able to compete directly with the big guys...all I can say is WOW! I can't wait to see it.

#3 Adrian

Adrian

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 5779 posts

Posted 11 September 2003 - 02:40 PM

Hmm, I can't help feeling that an Open Source SE would just allow those so inclined and skilled to go into the code, learn the exact ranking criteria and abuse it more accurately and quickly than is currently possible.

I wish it the best of luck, I can just see it being far too easy to abuse though.

But then, I guess in one sense, if everyone knew the exact details of the ranking criteria, and all made use of the knowledge, everyone's going to be on a level playing field in the end.....

#4 Guest_Lots0_*

Guest_Lots0_*
  • Guests

Posted 11 September 2003 - 04:50 PM

if everyone knew the exact details of the ranking criteria, and all made use of the knowledge, everyone's going to be on a level playing field in the end

I think this is the main point. There will also be SEOs/programmers who help to improve the algo as well; that's what is great about open source.

#5 wanderer

wanderer

    Mach 1 Member

  • Members
  • 306 posts

Posted 11 September 2003 - 10:54 PM

Competition? Maybe. But doesn't open source mean it's available for anyone to use? Isn't that to the benefit of the other search engines?

As was said in the article:

There are a lot of smart people out there that Google can't hire.


Google may not be able to hire them, but it doesn't mean they can't benefit from their work. If those smart people come up with better ways to provide relevant results, whatever they come up with is free for the taking. Google already has the market-share. If they can improve their results by adding open source tweaks to their algorithm, their share won't decrease significantly. It would probably even be in their interest to quietly put some money into it...

Please correct me if I have the whole idea of the open source thing wrong.

#6 Black_Knight

Black_Knight

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 9339 posts

Posted 12 September 2003 - 12:59 AM

You're certainly in the right area there, wanderer, except that while the open source code is free for anyone to take and modify, they must give the modifications and improvements back into the project (making those open too). If your improvement or modification was to 'bolt-on' the entire Google algorithms, then you'd have to give all that to the project!

Linux and Apache are the two greatest icons of what can be achieved by Open Source programming ventures - and indeed of the robustness, stability and thorough testing of the end result. However, there are many more open source projects that simply didn't work out as spectacularly, so it is no magic wand.

However, the biggest thing for the internet at large is this: An open source search engine could be used by any and all commercial sites and intranets too for free. In-house search could really benefit the most from this.

#7 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 12 September 2003 - 05:58 AM

This is excellent news, and may end up leading to better implementation of onsite search at many sites, too.

#8 Grumpus

Grumpus

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 6294 posts

Posted 12 September 2003 - 06:14 AM

I didn't read the entire article, but did check out the Nutch homepage. All I can say is "ugh". My robots.txt file is going to be almost as big as my site with all the different nutch variations trying to crawl me for backfill for their own top ranked sites, for the experimenters. Will be hard to tell who is a player and who is playing.

G.
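
As a rough illustration of the robots.txt concern above: the file need not name every crawler variant if it lists the few crawlers it accepts and then shuts out everything else with a catch-all record. The sketch below uses Python's standard `urllib.robotparser` to check such a file. The file contents, the `can_crawl` helper, the `NutchVariant-123` agent name, and the example.com URLs are all hypothetical, and the approach only helps against crawlers that actually honor the exclusion protocol.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler gets access to everything
# except /private/, and every other user agent is shut out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "NutchVariant-123"):
        allowed = can_crawl(agent, "http://example.com/index.html")
        print(agent, "allowed" if allowed else "blocked")
```

The trade-off is that an allow-list like this also blocks any new, well-behaved crawler until its name is added by hand, which is the "who is a player and who is playing" problem in another form.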

#9 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 12 September 2003 - 06:17 AM

That's a good point, Grumpus.

A really good point.

Maybe it's time for an update to the robots exclusion protocols? The nutch update?

We'll have to keep an eye on long files.

#10 csmithm35

csmithm35

    Ready To Fly Member

  • Members
  • 26 posts

Posted 20 September 2003 - 11:32 AM

I wouldn't worry nutch about Nutch. There have been open source spiders (some very nice ones) on SourceForge for quite a while. Nutch would make a wonderful site search, I'm sure. However, other than that, once the pretty veil of the idealism of the concept is whipped away and the colder, darker aspects of our nature beneath are allowed to reveal themselves, Nutch will be completely ruined.

What colder, darker aspects am I referring to? Look at Google. They have come under criticism for not revealing the more intimate aspects of the PR algorithm -- but can you imagine what would happen if they did? People would bounce for joy and pounce like panthers and begin capitalizing on every minute little variable PR relies on to gauge decent page relevance. In their eagerness to advance their own desires, they'd completely drain any worth from one of the best engines in the world. Much like mining ecological resources -- knowing, while you're mining, that the riches will last only so long. Once you have taken all you can, you'll simply have to move on to the next, leaving behind an empty, lifeless husk.

Now take Nutch for instance -- a virgin laid bare from the onset on the sacrificial altar. How long do you think it will take unethical Vampires to drain every ounce of blood from it? As soon as it looks like taking the time to manipulate their listings would prove valuable they would pounce and leave Nutch cold and lifeless. A beautiful concept, very very commendable, but it simply is not taking the human inclination to capitalize and manipulate into consideration.

lol If that post makes me sound Jaded, I'm not. We need to open our eyes and realize what repercussions the actions of today will have on tomorrow. Until I feel confident most look at Life with an eye on tomorrow, I'll remain realistic about the idealistic concepts of today. I treasure and value idealism, but the only way it can ever truly bear fruit is if it is nurtured by the culture (or Market) it is born in -- and throwing Nutch into a ruthless search engine Industry is like throwing a newborn at a pack of rabid wolves.

-Cory

#11 Black_Knight

Black_Knight

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 9339 posts

Posted 20 September 2003 - 05:01 PM

Playing Devil's Advocate for a moment here, let's assume that many SEOs do get involved... How about if they decide to make the exploits known, and so keep working on blocks? Not to make it impossible to cheat - that can't happen - but to make it so that beating the algorithm requires more effort than is feasible for most - that it would take a year of hard work to exploit the algo, and that in doing so, you'd have had to tweak the site into deserving the position in the first place.

Naturally, that's assuming an unlikely level of altruism. If someone works on the algorithm it is going to be because they want something for it, even if only kudos. Most people volunteering to improve the algorithm will be doing as you say, looking to find the loopholes and not close them.


