What is the difference between Cloudera Oryx and Apache Mahout?

Differences between Cloudera Oryx and Apache Mahout

There are 3 broad things an operational ML system eventually needs to do:
- Build models at scale, offline
- Update models in near real time
- Query models in real time
Most tools, such as Mahout or MLlib, only build models at scale.
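
To make the three responsibilities concrete, here is a minimal sketch in Python. The `PopularityModel` class and its method names are hypothetical, chosen only for illustration; they are not taken from Oryx, Mahout, or MLlib, and a real system would use a far richer model than item counts.

```python
from collections import Counter


class PopularityModel:
    """Toy model (hypothetical): recommends the most frequently seen items."""

    def __init__(self):
        self.counts = Counter()

    def build(self, history):
        """1. Build offline: recompute the model from the full event history."""
        self.counts = Counter(history)

    def update(self, item):
        """2. Update in near real time: fold in one new event without a rebuild."""
        self.counts[item] += 1

    def query(self, n=1):
        """3. Query in real time: answer from the in-memory state."""
        return [item for item, _ in self.counts.most_common(n)]


model = PopularityModel()
model.build(["a", "b", "a", "c"])   # batch build over historical data
model.update("b")                   # new events arrive after the build
model.update("b")
top = model.query(1)                # served from memory, no rebuild needed
```

A library that covers only the first step would stop at `build`; the other two methods are what an operational system has to add around it.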

Oryx tries to do all 3 rather than focusing on model building alone.
It is therefore really intended as a complement to any Hadoop-based model build system.
As it happens, it is MapReduce-based for model building and implements its own algorithms rather than using Mahout's, in order to improve on perceived problems.
The project, which is open source, is designed more as 3 complete apps than as a platform for extension.
It implements only:
- ALS for recommendation
- k-means for clustering
- Random decision forests for classification and regression
The major difference is fewer algorithms but complete apps, including incremental update and serving. The algorithms are not really the difference, since Oryx is not a new library.
The next version is built on Spark and Kafka, and becomes more of a generic lambda architecture for ML that happens to include entire apps too.
It is a kind of Summingbird for ML on Spark. At present it has no algorithm implementations at all, which makes it even more different from Mahout or MLlib.
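
The lambda architecture idea can be sketched roughly as three layers that talk only through message topics. Everything below is an illustrative assumption: the in-memory `deque` objects stand in for Kafka topics, the function and class names are hypothetical, and the "model" is just a count per item. It is not Oryx's API.

```python
from collections import Counter, deque

input_topic = deque()    # stand-in for a Kafka topic carrying raw events
update_topic = deque()   # stand-in for a Kafka topic carrying models and deltas
history = []             # stand-in for durable storage of all past input


def batch_layer():
    """Rebuild a complete model from all stored input and publish it."""
    update_topic.append(("MODEL", dict(Counter(history))))


def speed_layer():
    """Persist each new event and publish it as a small incremental delta."""
    while input_topic:
        event = input_topic.popleft()
        history.append(event)
        update_topic.append(("DELTA", event))


class ServingLayer:
    """Holds the current model in memory and answers queries from it."""

    def __init__(self):
        self.counts = Counter()

    def consume(self):
        while update_topic:
            kind, payload = update_topic.popleft()
            if kind == "MODEL":
                self.counts = Counter(payload)   # replace with a fresh batch model
            else:
                self.counts[payload] += 1        # apply an incremental delta

    def query(self, item):
        return self.counts[item]


history.extend(["x", "y", "x"])   # data that existed before the first batch run
batch_layer()                     # offline build publishes a full model
input_topic.append("x")           # a new event arrives afterwards
speed_layer()                     # near-real-time update publishes a delta
serving = ServingLayer()
serving.consume()
x_count = serving.query("x")      # reflects the batch model plus the delta
```

Because the layers share only the topics, the batch layer can be rerun at any time: its next full model simply replaces the serving state, including any deltas already applied.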
