Apache Nutch System

From GM-RKB

An Apache Nutch System is a Lucene-based web search software system that is developed as an Apache Software Foundation project.



References

2012


  • http://en.wikipedia.org/wiki/Nutch#Features
    • QUOTE: Nutch is coded entirely in the Java programming language, but data is written in language-independent formats. It has a highly modular architecture, allowing developers to create plug-ins for media-type parsing, data retrieval, querying and clustering.

      The fetcher ("robot" or "web crawler") has been written from scratch specifically for this project.

2011