PySpark API
A PySpark API is a Spark API for Python code.
- Context:
- It can contain PySpark classes and modules, such as the following (see the sketch after this list):
pyspark.SparkContext: Main entry point for Spark functionality.
pyspark.RDD: A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
pyspark.sql module.
pyspark.sql.SQLContext: Main entry point for DataFrame and SQL functionality.
pyspark.sql.DataFrame: A distributed collection of data grouped into named columns.
pyspark.streaming module.
pyspark.streaming.StreamingContext: Main entry point for Spark Streaming functionality.
pyspark.streaming.DStream: A Discretized Stream (DStream), the basic abstraction in Spark Streaming.
pyspark.ml package.
pyspark.mllib package.
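A minimal sketch of these core entry points (the local-mode setup and example data are assumptions, not from the source):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local[2]", "pyspark-api-sketch")    # pyspark.SparkContext: main entry point
sqlContext = SQLContext(sc)                            # pyspark.sql.SQLContext: DataFrame/SQL entry point

rdd = sc.parallelize([1, 2, 3, 4])                     # pyspark.RDD: resilient distributed dataset
print(rdd.map(lambda x: x * x).collect())              # [1, 4, 9, 16]

df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])   # pyspark.sql.DataFrame
df.show()

sc.stop()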
- Counter-Example(s):
- See: Spark SQL, Spark Library.
References
2016
- https://spark.apache.org/docs/0.9.0/python-programming-guide.html
- QUOTE: The Spark Python API (PySpark) exposes the Spark programming model to Python. To learn the basics of Spark, we recommend reading through the Scala programming guide first; it should be easy to follow even if you don’t know Scala. This guide will show how to use the Spark features described there in Python.
There are a few key differences between the Python and Scala APIs:
- Python is dynamically typed, so RDDs can hold objects of multiple types.
- PySpark does not yet support a few API calls, such as lookup and non-text input files, though these will be added in future releases.
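As a sketch of the first point above, a single RDD can hold values of several Python types (the data here is hypothetical):

from pyspark import SparkContext

sc = SparkContext("local", "mixed-type-rdd")
mixed = sc.parallelize([1, "two", 3.0, (4, "four"), {"five": 5}])
print(mixed.map(lambda x: type(x).__name__).collect())   # ['int', 'str', 'float', 'tuple', 'dict']
sc.stop()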
- http://spark.apache.org/docs/latest/api/python/
- QUOTE: Core classes:
pyspark.SparkContext: Main entry point for Spark functionality.
pyspark.RDD: A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
pyspark.streaming.StreamingContext: Main entry point for Spark Streaming functionality.
pyspark.streaming.DStream: A Discretized Stream (DStream), the basic abstraction in Spark Streaming.
pyspark.sql.SQLContext: Main entry point for DataFrame and SQL functionality.
pyspark.sql.DataFrame: A distributed collection of data grouped into named columns.
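A minimal sketch of the streaming entry points listed above (the socket source, host, and port are assumptions):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "streaming-sketch")
ssc = StreamingContext(sc, batchDuration=1)        # pyspark.streaming.StreamingContext

lines = ssc.socketTextStream("localhost", 9999)    # pyspark.streaming.DStream of text lines
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()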
- http://stackoverflow.com/a/37084862
- QUOTE: As of Spark 1.0, you should launch pyspark applications using spark-submit. While pyspark will launch the interactive shell, spark-submit allows you to easily launch a spark job on various cluster managers.
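A sketch of that workflow: a self-contained script (the file name wordcount.py and the word-count logic are hypothetical) launched with spark-submit rather than the interactive pyspark shell:

# Launch with, e.g.: spark-submit --master local[2] wordcount.py <input-path>
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="wordcount")
    counts = (sc.textFile(sys.argv[1])
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    for word, count in counts.collect():
        print(word, count)
    sc.stop()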
2015
- http://spark.apache.org/docs/latest/api/python/index.html
- pyspark package
- Subpackages
- Contents
- pyspark.sql module
- Module Context
- pyspark.sql.types module
- pyspark.sql.functions module
- pyspark.streaming module
- Module contents
- pyspark.streaming.kafka module
- pyspark.ml package
- Module Context
- pyspark.ml.feature module
- pyspark.ml.classification module
- pyspark.mllib package
- pyspark.mllib.classification module
- pyspark.mllib.clustering module
- pyspark.mllib.feature module
- pyspark.mllib.linalg module
- pyspark.mllib.random module
- pyspark.mllib.recommendation module
- pyspark.mllib.regression module
- pyspark.mllib.stat module
- pyspark.mllib.tree module
- pyspark.mllib.util module
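A short sketch combining the pyspark.ml.feature and pyspark.ml.classification modules from the listing above (the column names and toy training data are assumptions):

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

sc = SparkContext("local[2]", "ml-sketch")
sqlContext = SQLContext(sc)

training = sqlContext.createDataFrame(
    [(0, "spark is fast", 1.0), (1, "hadoop mapreduce", 0.0)],
    ["id", "text", "label"])

tokenizer = Tokenizer(inputCol="text", outputCol="words")      # pyspark.ml.feature
hashingTF = HashingTF(inputCol="words", outputCol="features")  # pyspark.ml.feature
lr = LogisticRegression(maxIter=10)                            # pyspark.ml.classification

model = Pipeline(stages=[tokenizer, hashingTF, lr]).fit(training)
model.transform(training).select("text", "prediction").show()

sc.stop()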