pyspark.sql Module
A pyspark.sql Module is a Spark SQL PySpark module.
- Context:
  - It can contain (a minimal import sketch follows this list):
    - a pyspark.sql.types module
    - a pyspark.sql.functions module
- Counter-Example(s):
  - a pyspark.streaming module.
- See: pyspark, s3a, SparkContext.
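A minimal sketch of how these two submodules are typically imported and used together; the application name and sample rows are illustrative assumptions, not part of the source.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F   # the pyspark.sql.functions module
    from pyspark.sql import types as T       # the pyspark.sql.types module

    spark = SparkSession.builder.appName("pyspark-sql-module-demo").getOrCreate()

    # A small DataFrame; pyspark.sql.functions supplies column expressions and
    # pyspark.sql.types supplies the type used in the cast below.
    df = spark.createDataFrame([("Alice", "34"), ("Bob", "29")], ["name", "age"])
    df = df.withColumn("age", F.col("age").cast(T.IntegerType()))
    df.select(F.upper(F.col("name")).alias("name"), F.col("age") + 1).show()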
References
2017
- https://spark.apache.org/docs/2.2.0/sql-programming-guide.html
- QUOTE: All data types of Spark SQL are located in the package of pyspark.sql.types. You can access them by doing
    from pyspark.sql.types import *
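A short sketch of the quoted import in use, defining an explicit schema; the column names and sample data are made up for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.getOrCreate()

    # An explicit schema built from pyspark.sql.types classes.
    schema = StructType([
        StructField("product", StringType(), nullable=False),
        StructField("price", DoubleType(), nullable=True),
    ])
    df = spark.createDataFrame([("book", 12.5), ("pen", 1.2)], schema)
    df.printSchema()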
2017
- http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html
- QUOTE:
class pyspark.sql.SQLContext(sparkContext, sqlContext=None)
Main entry point for Spark SQL functionality. A SQLContext can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read parquet files.
- Parameters:
- sparkContext – The SparkContext backing this SQLContext.
- sqlContext – An optional JVM Scala SQLContext. If set, we do not instantiate a new SQLContext in the JVM, instead we make all calls to this object.
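A sketch of this 2.x-style entry point in use, assuming a local SparkContext; in later Spark versions SparkSession is the preferred entry point, but the calls below follow the quoted API. The view name and sample rows are hypothetical.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext(appName="sqlcontext-demo")   # the SparkContext backing this SQLContext
    sqlContext = SQLContext(sc)

    # Create a DataFrame, register it as a temporary view, and run SQL over it.
    people = sqlContext.createDataFrame([Row(name="Alice", age=34), Row(name="Bob", age=29)])
    people.createOrReplaceTempView("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 30").show()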
2017
    pyspark.sql.SparkSession            Main entry point for DataFrame and SQL functionality.
    pyspark.sql.DataFrame               A distributed collection of data grouped into named columns.
    pyspark.sql.Column                  A column expression in a DataFrame.
    pyspark.sql.Row                     A row of data in a DataFrame.
    pyspark.sql.GroupedData             Aggregation methods, returned by DataFrame.groupBy().
    pyspark.sql.DataFrameNaFunctions    Methods for handling missing data (null values).
    pyspark.sql.DataFrameStatFunctions  Methods for statistics functionality.
    pyspark.sql.functions               List of built-in functions available for DataFrame.
    pyspark.sql.types                   List of data types available.
    pyspark.sql.Window                  For working with window functions.
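A brief sketch, on made-up data, of several of the classes listed above working together: DataFrame, GroupedData via groupBy(), pyspark.sql.functions, and Window.

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    sales = spark.createDataFrame(
        [("east", "book", 10.0), ("east", "pen", 2.0), ("west", "book", 12.0)],
        ["region", "product", "amount"],
    )

    # GroupedData: aggregate per region with a built-in function.
    totals = sales.groupBy("region").agg(F.sum("amount").alias("total"))

    # Window: rank products by amount within each region.
    w = Window.partitionBy("region").orderBy(F.col("amount").desc())
    ranked = sales.withColumn("rank", F.rank().over(w))

    totals.show()
    ranked.show()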