Distributed Machine Learning System

From GM-RKB
A [[Distributed Machine Learning System]] is a [[Distributed Computing System]] for implementing and developing [[machine learning algorithm]]s.
* <B>Context:</B>
** It can range from being a standard [[Distributed ML System]], to being a [[Distributed Reinforcement Learning System]], to being a [[Distributed Deep Learning System]].  
* <B>Example(s):</B>
** [[MLlib]],


== References ==

=== 2014 ===
* ([[2014_AReliableEffectiveTerascaleLine|Agarwal et al., 2014]]) ⇒ [[Alekh Agarwal]], [[Olivier Chapelle]], [[Miroslav Dudík]], and [[John Langford]]. ([[2014]]). &ldquo;[http://www.jmlr.org/papers/volume15/agarwal14a/agarwal14a.pdf A Reliable Effective Terascale Linear Learning System].&rdquo; In: The Journal of Machine Learning Research, 15(1).  
** QUOTE: Perhaps the simplest [[strategy]] when the number of [[example]]s n is too large for a given [[learning algorithm]] is to reduce the data set size by [[subsampling]]. However, this [[strategy]] only works if the [[problem]] is simple enough or the number of [[parameter]]s is very small. The [[setting]] of interest here is when a large number of [[example]]s is really needed to [[learn]] a [[good model]]. [[Distributed algorithm]]s are a natural choice for such scenarios.
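The quoted reasoning — that subsampling only works for simple problems, while learning a good model from a genuinely large number of examples calls for a distributed algorithm — can be illustrated with a minimal data-parallel sketch. This is not the actual system of Agarwal et al.; it simulates worker parallelism sequentially, and all function and variable names are hypothetical:

```python
import numpy as np

def local_gradient(w, X, y):
    """Squared-loss gradient computed on one worker's shard of the examples."""
    return X.T @ (X @ w - y) / len(y)

def distributed_step(w, shards, lr=0.1):
    """One synchronous update: average per-shard gradients (AllReduce-style),
    then apply a single global parameter update."""
    grads = [local_gradient(w, X, y) for X, y in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

# Partition the full example set across 4 simulated workers;
# no worker ever needs to hold all n examples.
shards = [(X[i::4], y[i::4]) for i in range(4)]

w = np.zeros(3)
for _ in range(200):
    w = distributed_step(w, shards)
```

Because the averaged shard gradients equal the full-data gradient here, the learner uses every example without any one node holding the whole data set — the property subsampling gives up.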



Revision as of 18:52, 1 August 2022
