PySpark API

From GM-RKB

A PySpark API is an Apache Spark API that can be called from Python code.
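As a rough illustration of calling Spark from Python, the sketch below starts a local session, builds a small DataFrame, and counts rows per group. It assumes the `pyspark` package and a Java runtime are installed; the function name `count_by_language`, the sample rows, and the application name are made up for this example.

```python
# Hypothetical sample data: (name, language) pairs invented for illustration.
SAMPLE_ROWS = [("alice", "Python"), ("bob", "Scala"), ("carol", "Python")]


def count_by_language(rows):
    """Count rows per language using the PySpark DataFrame API."""
    # Imported here so the module still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    # local[1] runs Spark inside this process with one worker thread.
    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("pyspark-api-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(rows, ["name", "language"])
        grouped = df.groupBy("language").count().collect()
        return {row["language"]: row["count"] for row in grouped}
    finally:
        # Release the local Spark resources even if the query fails.
        spark.stop()
```

Calling `count_by_language(SAMPLE_ROWS)` returns a plain Python dictionary of counts, so the Spark-specific objects stay inside the function.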



References

2017

  • http://datanami.com/2017/05/18/committers-talk-hadoop-3-apache-big-data/
    • QUOTE: While Spark gives the customer all kinds of great capabilities, the Python implementation lacks the code portability that exists when working with Spark through Java or Scala. …

      … By deploying PySpark in a Docker container under YARN, a developer can get the exact PySpark environment they want without requiring administrators to get involved with detailed configurations. It all gets bundled up in a Docker container, and YARN runs it like any other Hadoop job on the cluster.
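The Docker-under-YARN deployment described in the quote can be sketched as a `spark-submit` command. The example below only assembles the argument list. It assumes a Hadoop 3 cluster whose administrators have already enabled the YARN Docker container runtime; the image name `example/pyspark-env:latest`, the script name `job.py`, and the helper `docker_on_yarn_submit_args` are hypothetical.

```python
import shlex


def docker_on_yarn_submit_args(image, script):
    """Build spark-submit arguments that ask YARN to run a job in a Docker image."""
    # Environment variables read by YARN's Docker container runtime.
    runtime_env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    # Set the variables for both the application master and the executors.
    for prefix in ("spark.yarn.appMasterEnv", "spark.executorEnv"):
        for name, value in runtime_env.items():
            args += ["--conf", f"{prefix}.{name}={value}"]
    args.append(script)
    return args


# Hypothetical image and script names, used only for illustration.
command = docker_on_yarn_submit_args("example/pyspark-env:latest", "job.py")
print(shlex.join(command))
```

Keeping the Python interpreter and its libraries in the image is what lets the same PySpark environment run on any node without per-node setup.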
