About five years ago I was working with Mike Cafarella on Apache Nutch, an open-source web-search engine. Initially we were able to crawl and index on four machines in parallel, but with a lot of manual steps. Inspired by two Google papers, we implemented a distributed filesystem and MapReduce implementation that automated most of these steps. Operation became much simpler, and we were then able to easily run Nutch on twenty machines, with near-linear scaling.
因为inspired by two papers, we immplemented...
一件巨大的事情,开始就是这么简单。
笔记
没有评论:
发表评论