This plugin extends Nutch's HtmlParseFilter interface to extract HTML pages and their content during Nutch's parsing phase. By default, the plugin extracts HTML pages and their text content and saves them in ...
This project makes it easier to use GitHub's fetch polyfill in environments that use HTML imports. Because the browser loads an HTML import only once per URL, pulling in the polyfill this way ensures that its code runs only once.