pyspark.sql Module
A pyspark.sql Module is a Spark SQL PySpark module.
- Context:
  - It can contain (a minimal import sketch follows this list):
    - a pyspark.sql.types module
    - a pyspark.sql.functions module
- Counter-Example(s):
  - a pyspark.streaming module.
- See: pyspark, s3a, SparkContext.
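A minimal sketch of how these two submodules are typically imported and used together; the application name and sample rows are illustrative assumptions, not part of the source.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F   # the pyspark.sql.functions module
    from pyspark.sql import types as T       # the pyspark.sql.types module

    spark = SparkSession.builder.appName("pyspark-sql-module-demo").getOrCreate()

    # A small DataFrame; pyspark.sql.functions supplies column expressions and
    # pyspark.sql.types supplies the type used in the cast below.
    df = spark.createDataFrame([("Alice", "34"), ("Bob", "29")], ["name", "age"])
    df = df.withColumn("age", F.col("age").cast(T.IntegerType()))
    df.select(F.upper(F.col("name")).alias("name"), F.col("age") + 1).show()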
References
2017
- https://spark.apache.org/docs/2.2.0/sql-programming-guide.html
- QUOTE: All data types of Spark SQL are located in the package of pyspark.sql.types. You can access them by doing
    from pyspark.sql.types import *
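A short sketch of the quoted import in use, defining an explicit schema; the column names and sample data are made up for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.getOrCreate()

    # An explicit schema built from pyspark.sql.types classes.
    schema = StructType([
        StructField("product", StringType(), nullable=False),
        StructField("price", DoubleType(), nullable=True),
    ])
    df = spark.createDataFrame([("book", 12.5), ("pen", 1.2)], schema)
    df.printSchema()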
2017
- http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html
- QUOTE:
class pyspark.sql.SQLContext(sparkContext, sqlContext=None)
Main entry point for Spark SQL functionality. A SQLContext can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read parquet files.
- Parameters:
- sparkContext – The SparkContext backing this SQLContext.
- sqlContext – An optional JVM Scala SQLContext. If set, we do not instantiate a new SQLContext in the JVM, instead we make all calls to this object.
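A sketch of this 2.x-style entry point in use, assuming a local SparkContext; in later Spark versions SparkSession is the preferred entry point, but the calls below follow the quoted API. The view name and sample rows are hypothetical.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext(appName="sqlcontext-demo")   # the SparkContext backing this SQLContext
    sqlContext = SQLContext(sc)

    # Create a DataFrame, register it as a temporary view, and run SQL over it.
    people = sqlContext.createDataFrame([Row(name="Alice", age=34), Row(name="Bob", age=29)])
    people.createOrReplaceTempView("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 30").show()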
2017
    pyspark.sql.SparkSession            Main entry point for DataFrame and SQL functionality.
    pyspark.sql.DataFrame               A distributed collection of data grouped into named columns.
    pyspark.sql.Column                  A column expression in a DataFrame.
    pyspark.sql.Row                     A row of data in a DataFrame.
    pyspark.sql.GroupedData             Aggregation methods, returned by DataFrame.groupBy().
    pyspark.sql.DataFrameNaFunctions    Methods for handling missing data (null values).
    pyspark.sql.DataFrameStatFunctions  Methods for statistics functionality.
    pyspark.sql.functions               List of built-in functions available for DataFrame.
    pyspark.sql.types                   List of data types available.
    pyspark.sql.Window                  For working with window functions.
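A brief sketch, on made-up data, of several of the classes listed above working together: DataFrame, GroupedData via groupBy(), pyspark.sql.functions, and Window.

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    sales = spark.createDataFrame(
        [("east", "book", 10.0), ("east", "pen", 2.0), ("west", "book", 12.0)],
        ["region", "product", "amount"],
    )

    # GroupedData: aggregate per region with a built-in function.
    totals = sales.groupBy("region").agg(F.sum("amount").alias("total"))

    # Window: rank products by amount within each region.
    w = Window.partitionBy("region").orderBy(F.col("amount").desc())
    ranked = sales.withColumn("rank", F.rank().over(w))

    totals.show()
    ranked.show()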