Spark SQL Module: Difference between revisions

From GM-RKB
(Redirected page to SparkSQL Module)
 
A [[Spark SQL Module]] is a [[Spark module]] that is a [[SQL Module]].
* <B>Context:</B>
** It can be accessed through the [[pyspark.sql Module]], ...
** It builds on the design experience of the [[Shark System]], which it succeeded.
* <B>Example(s):</B>
** [[Spark SQL v2.2]]
* <B>Counter-Example(s):</B>
** [[Spark DataFrame API]].
** [[Spark Datasets API]].
** [[Hive SQL]].
** [[SAS SQL]].
** [[Spark MLlib]].
** [[Presto SQL DBMS]].
* <B>See:</B> [[Spark API]], [[Spark Shark]], [[SchemaRDD]], [[Resilient Distributed Dataset (RDD)]].
----
==References==
 
=== 2017 ===
*  https://github.com/apache/spark/tree/branch-2.2/sql
** QUOTE: [[SparkSQL|This module]] provides support for executing [[relational queri]]es expressed in either [[SQL]] or the [[DataFrame/Dataset API]].  <P> [[Spark SQL]] is broken up into four subprojects:
*** Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
*** Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files.
*** Hive Support (sql/hive) - Includes an extension of SQLContext called HiveContext that allows users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allows users to run queries that include Hive UDFs, UDAFs, and UDTFs.
*** HiveServer and CLI support (sql/hive-thriftserver) - Includes support for the SQL CLI (bin/spark-sql) and a HiveServer2 (for JDBC/ODBC) compatible server.
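The Catalyst bullet above describes a framework for manipulating trees of relational operators and expressions. The toy Python sketch below illustrates only that idea: expressions as immutable trees, and optimization rules that pattern-match nodes and return rewritten trees, applied repeatedly until nothing changes. It is not Catalyst's API; Catalyst is written in Scala, and Armbrust et al. (2015) describe its rules as pattern-matching functions passed to a tree's <code>transform</code> method. All class and function names here are hypothetical.

```python
# Toy illustration of Catalyst-style tree rewriting. NOT Catalyst's real API;
# every class and function name below is hypothetical.
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: Expr
    right: Expr


Expr = Union[Literal, Attribute, Add]
# A rule returns a rewritten node, or None when it does not match.
Rule = Callable[[Expr], Optional[Expr]]


def transform_up(expr: Expr, rule: Rule) -> Expr:
    """Apply `rule` bottom-up: rewrite the children first, then the node itself."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    rewritten = rule(expr)
    return expr if rewritten is None else rewritten


def constant_folding(expr: Expr) -> Optional[Expr]:
    """Add(Literal(a), Literal(b)) -> Literal(a + b)."""
    if isinstance(expr, Add) and isinstance(expr.left, Literal) and isinstance(expr.right, Literal):
        return Literal(expr.left.value + expr.right.value)
    return None


def remove_add_zero(expr: Expr) -> Optional[Expr]:
    """x + 0 -> x and 0 + x -> x."""
    if isinstance(expr, Add):
        if expr.right == Literal(0):
            return expr.left
        if expr.left == Literal(0):
            return expr.right
    return None


def optimize(expr: Expr, rules: List[Rule], max_iterations: int = 10) -> Expr:
    """Run every rule over the tree until it stops changing (a fixed point)."""
    for _ in range(max_iterations):
        new_expr = expr
        for rule in rules:
            new_expr = transform_up(new_expr, rule)
        if new_expr == expr:
            break
        expr = new_expr
    return expr


if __name__ == "__main__":
    # x + (1 + 2) is folded to x + 3.
    print(optimize(Add(Attribute("x"), Add(Literal(1), Literal(2))), [constant_folding]))
```

Because nodes are frozen dataclasses, equality is structural, which is what lets the fixed-point loop detect that no rule changed the tree.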
 
=== 2017 ===
* https://cdn2.hubspot.net/hubfs/488249/assets/atscale-data-sheet.pdf
** QUOTE: ... [[AtScale]] works out-of-the-box with the leading [[SQL-on-Hadoop engine]]s, such as [[Impala DBMS|Impala]], [[SparkSQL]], or [[Hive DBMS|Hive]], and allows them to function as an [[analytics engine]].
 
=== 2016 ===
* http://spark.apache.org/docs/latest/sql-programming-guide.html#sql
** QUOTE: One use of [[Spark SQL]] is to execute [[SQL queri]]es written using either a basic [[SQL syntax]] or [[HiveQL]]. [[Spark SQL]] can also be used to read data from an existing Hive installation. For more on how to configure this feature, please refer to the Hive Tables section. When running SQL from within another programming language the results will be returned as a [[Spark DataFrame|DataFrame]]. You can also [[interact with]] the SQL interface using the command-line or over JDBC/ODBC.
 
=== 2015 ===
* http://spark.apache.org/sql/
** QUOTE: [[Spark SQL]] lets you [[query structured data]] as a [[distributed dataset (RDD)]] in [[Spark]], with integrated [[Spark API|APIs]] in [[pyspark.sql|Python]], [[org.apache.spark.sql.package|Scala]] and [[Java Spark SQL|Java]]. This tight integration makes it easy to run [[SQL queri]]es alongside [[complex analytic algorithm]]s.
* ([[2015_SparkSQLRelationalDataProcessin|Armbrust et al., 2015]]) &rArr; [[Michael Armbrust]], [[Reynold S. Xin]], [[Cheng Lian]], [[Yin Huai]], [[Davies Liu]], [[Joseph K. Bradley]], [[Xiangrui Meng]], [[Tomer Kaftan]], [[Michael J. Franklin]], [[Ali Ghodsi]], and [[Matei Zaharia]]. ([[2015]]). “[https://www.cs.berkeley.edu/~alig/papers/sparksql.pdf Spark SQL: Relational Data Processing in Spark]." In: [[Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data]]. ISBN:978-1-4503-2758-9 [http://dx.doi.org/10.1145/2723372.2742797 doi:10.1145/2723372.2742797]
** QUOTE: [[Spark SQL Module|Spark SQL]] is a new [[module in Apache Spark]] that integrates [[relational processing]] with [[Spark's functional programming API]]. </s> Built on our [[experience]] with [[Shark]], ...
 
=== 2013 ===
* ([[2013_SharkSQLandRichAnalyticsatScale|Xin et al., 2013]]) &rArr; [[Reynold S. Xin]], [[Josh Rosen]], [[Matei Zaharia]], [[Michael J. Franklin]], [[Scott Shenker]], and [[Ion Stoica]]. ([[2013]]). “[https://www.icsi.berkeley.edu/pubs/networking/ICSI_sharksql13.pdf Shark: SQL and Rich Analytics at Scale]." In: [[Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data]]. ISBN:978-1-4503-2037-5 [http://dx.doi.org/10.1145/2463676.2465288 doi:10.1145/2463676.2465288]
 
----
__NOTOC__
[[Category:Concept]]

Latest revision as of 19:41, 23 August 2017
