Running Water Never Stagnates: [Softwares]Nutch

Sunday, August 23, 2009

[Softwares]Nutch

I just installed Nutch, an open-source search engine, on my own machine. I also read Doug Cutting's blog:

About five years ago I was working with Mike Cafarella on Apache Nutch, an open-source web-search engine. Initially we were able to crawl and index on four machines in parallel, but with a lot of manual steps. Inspired by two Google papers, we implemented a distributed filesystem and MapReduce implementation that automated most of these steps. Operation became much simpler, and we were then able to easily run Nutch on twenty machines, with near-linear scaling.

It all came from "inspired by two papers, we implemented...".
Something enormous can begin just this simply.
Notes