Nutch
From Wikipedia, the free encyclopedia
[Image: Nutch web interface search]
Lucene Nutch | |
---|---|
Developer: | Apache Software Foundation |
Latest release: | 0.8.1 / September 24, 2006 |
OS: | Cross-platform |
Use: | Search engine |
License: | Apache License 2.0 |
Website: | http://lucene.apache.org/nutch |
Nutch is an effort to build an open source web search engine that uses Lucene Java for its search and index components. Its fetcher (the "robot" or "web crawler") was written from scratch for the project. Nutch has a highly modular architecture that lets developers create plugins for media-type parsing, data retrieval, querying and clustering. It is written entirely in the Java programming language, but its data is stored in language-independent formats.
In June 2003, the project ran a successful demonstration system covering 100 million pages. In June 2005, Nutch graduated from the Apache Incubator and became a subproject of Lucene. To meet the multi-machine processing needs of the crawl and index tasks, the Nutch project also implemented a MapReduce facility and a distributed file system. These two facilities have since been spun out into their own subproject, Hadoop.
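To make the MapReduce idea concrete, the following is a minimal, single-machine sketch of the map, shuffle and reduce steps applied to a crawl-style task: counting fetched pages per host. It is not the Nutch or Hadoop API. The class `MapReduceSketch`, its method names and the sample `example.org` URLs are all assumptions made up for illustration, and a real deployment would distribute these steps across many machines.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the map/shuffle/reduce pattern.
// This is NOT the Nutch or Hadoop API; every name here is invented for the sketch.
class MapReduceSketch {

    // Map step: turn one fetched URL into a (host, 1) pair.
    static Map.Entry<String, Integer> map(String url) {
        String host = URI.create(url).getHost();
        return Map.entry(host, 1);
    }

    // Shuffle step: group the emitted values by key (here, by host).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum all values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps in sequence and returns page counts per host.
    static Map<String, Integer> pagesPerHost(List<String> urls) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String url : urls) {
            pairs.add(map(url));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input: one URL from the article plus made-up example.org pages.
        List<String> urls = List.of(
                "http://lucene.apache.org/nutch",
                "http://example.org/a.html",
                "http://example.org/b.html");
        System.out.println(pagesPerHost(urls));
    }
}
```

Because each map call looks at a single URL and each reduce call looks at a single host, both steps could in principle run in parallel on different machines, which is the property that makes the pattern attractive for crawl and index workloads.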
Related projects
Search engines built with Nutch
- mozDex
- Krugle
- BusyTonight
- Wikiasari
- MetaMojo.com
- Greener, a search engine for green resources
External links
- Official page of the Nutch project
- Building Nutch: Open Source Search (2004) - ACM Queue vol. 2, no. 2
- An article about Nutch (2003) - Search Engine Watch
- Another article about Nutch (2003) - Tech News World
- Unofficial documentation
- Official page of the Hadoop project