
The hard part is already solved: you don't even have to crawl the web to build the index. There is already a periodically refreshed index of the web that you can download: commoncrawl.org

Now someone just needs to configure Apache Lucene as a proper Docker image that can consume this index.
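
For anyone wondering what "consume" would roughly involve, here is a toy sketch of the core idea: turn page text into an inverted index and query it. This is not Lucene's API and it does not read real Common Crawl files. The sample pages and the names `tokenize`, `build_index` and `search` are made up for illustration, and a real setup would add ranking, stemming and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample pages standing in for extracted crawl text.
docs = {
    "https://example.com/a": "Common Crawl publishes web crawl data",
    "https://example.com/b": "Apache Lucene is a search library",
    "https://example.com/c": "Search the web crawl with Lucene",
}
index = build_index(docs)
print(sorted(search(index, "lucene search")))
```

Lucene does the same thing at scale with compressed postings and relevance scoring, which is why wiring it up to the downloaded data is the remaining work.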


