Apache projects and related notes



A framework for writing robots: web spiders, and anything else that benefits from automated, fault-tolerant fetching.
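To make "fault-tolerant fetching" concrete, here is a minimal sketch of the general idea in Python, using only the standard library. It is not this framework's API: the names `fetch_with_retries` and `_default_fetch`, the retry count, and the backoff delays are all assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def _default_fetch(url, timeout=10):
    """Hypothetical default fetcher: a plain HTTP GET returning the body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retries(url, fetch=None, max_attempts=3, base_delay=1.0,
                       sleep=time.sleep):
    """Fetch a URL, retrying failed attempts with exponential backoff.

    `fetch` and `sleep` are injectable so the retry logic can be exercised
    without network access or real waiting.
    """
    if fetch is None:
        fetch = _default_fetch
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except (urllib.error.URLError, TimeoutError, ConnectionError) as exc:
            # Assumption: every error of these types is treated as transient.
            # A real robot would likely not retry e.g. an HTTP 404.
            last_error = exc
            if attempt < max_attempts - 1:
                # Wait 1x, 2x, 4x, ... base_delay between attempts.
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(
        f"giving up on {url} after {max_attempts} attempts"
    ) from last_error
```

A spider built this way would call `fetch_with_retries(url)` for each URL in its queue and catch `RuntimeError` to skip pages that stay unreachable, rather than letting one bad host stop the whole crawl.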

It can use Tika[1], a separate project that extracts metadata, and sometimes structured data, from a range of document types. Tika is also used by Nutch and other projects.

See also: