Difference between GraphLab and Mahout:
Mahout | GraphLab |
---|---|
Mahout is a machine learning framework and an Apache Software Foundation project. | GraphLab takes a quite different approach to parallel collaborative filtering (and, more broadly, machine learning) and is used primarily by academic institutions. |
Mahout has inherent fault tolerance. | GraphLab does not have inherent fault tolerance. |
Mahout looks like the more polished product, especially as it relies on Hadoop for scalability and distribution. | GraphLab excels because it is built from the ground up for iterative algorithms such as those used in collaborative filtering. |
Mahout offers two approaches: online, where recommendations are computed on demand, typically on smaller datasets; and offline, which uses Apache Hadoop to achieve scalability. | GraphLab lacks a production-ready distribution framework. |
For 50000 items, you need N machines with at least 28 GiB of memory each, where N is the number of Hadoop nodes, so the per-node memory requirement becomes an issue. | Costly performance penalties, since the runtime of each phase is decided by the slowest machine. |
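
The point in the last row about the slowest machine can be made concrete with a small sketch. It assumes every phase ends in a barrier, so no machine starts the next phase until all have finished the current one. The function names and the timings are hypothetical and chosen only for illustration; they are not measurements from either system.

```python
def synchronous_runtime(phase_times):
    """Total runtime when every phase ends with a barrier.

    phase_times[p][m] is the time machine m spends in phase p.
    Each phase lasts as long as its slowest machine.
    """
    return sum(max(times) for times in phase_times)


def balanced_runtime(phase_times):
    """Runtime if the same work were spread evenly within each phase."""
    return sum(sum(times) / len(times) for times in phase_times)


# Hypothetical timings in seconds: 3 phases, 4 machines,
# with a single straggler in the second phase.
phase_times = [
    [10, 10, 10, 10],
    [10, 10, 10, 30],
    [10, 10, 10, 10],
]

print(synchronous_runtime(phase_times))  # 50
print(balanced_runtime(phase_times))     # 35.0
```

In this made-up case a single slow machine in one phase adds 15 seconds to the whole job, even though the other three machines sit idle while they wait.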