Spark Datasets API

From GM-RKB

A Spark Datasets API is a Spark interface for strongly-typed distributed collections (Datasets) that combines the benefits of RDDs (strong typing, the ability to use lambda functions) with the optimized execution engine of Spark SQL.



References

2016

 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.sql.SQLContext
 
 val conf = new SparkConf().setAppName("DatasetExample")
 val sc = new SparkContext(conf)
 val sqlContext = new SQLContext(sc)
 import sqlContext.implicits._
 // ScalaPerson and ScalaData are the example's sample case class and data helper
 val sampleData: Seq[ScalaPerson] = ScalaData.sampleData()
 val dataset = sqlContext.createDataset(sampleData)
 dataset.filter(_.age < 21)

2015

  • http://spark.apache.org/docs/latest/sql-programming-guide.html#datasets
    • QUOTE: A Dataset is a new experimental interface added in Spark 1.6 that tries to provide the benefits of RDDs (strong typing, ability to use powerful lambda functions) with the benefits of Spark SQL’s optimized execution engine. A Dataset can be constructed from JVM objects and then manipulated using functional transformations (map, flatMap, filter, etc.).
      The unified Dataset API can be used both in Scala and Java. Python does not yet have support for the Dataset API, but due to its dynamic nature many of the benefits are already available (i.e. you can access the field of a row by name naturally row.columnName). Full python support will be added in a future release.
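As the quote notes, Datasets are manipulated with the same functional transformations (map, flatMap, filter) used on ordinary Scala collections. The following plain-Scala sketch (no Spark dependency; the `Person` case class and sample data are hypothetical, not from the quoted example) illustrates the shape of those typed transformations:

```scala
// Hypothetical element type; in Spark, a case class lets the Dataset
// encoder derive a typed schema automatically.
case class Person(name: String, age: Int)

object DatasetStyleExample {
  def main(args: Array[String]): Unit = {
    // A Seq stands in for a Dataset here; the transformation calls
    // below have the same shape on a real Spark Dataset.
    val people = Seq(Person("Ana", 34), Person("Ben", 19), Person("Cy", 25))

    // filter with a typed lambda, as in dataset.filter(_.age < 21)
    val minors = people.filter(_.age < 21)

    // map to a projected field, as in dataset.map(_.name)
    val names = minors.map(_.name)

    println(names.mkString(","))  // prints "Ben"
  }
}
```

Because the lambdas are typed against `Person`, field access such as `_.age` is checked at compile time, which is the "strong typing" benefit the quote contrasts with untyped Row-based access.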