Data Redistribution Across Partitions Operation
A Data Redistribution Across Partitions Operation is a distributed data structure operation that ...
- Context:
- …
 
- Example(s):
- …
 
- Counter-Example(s):
- See: Read-Only Distributed Data Structure, Data Sharding.
References
2018
- https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-partitions.html
- QUOTE: Depending on how you look at Spark (programmer, devop, admin), an RDD is about the content (developer’s and data scientist’s perspective) or how it gets spread out over a cluster (performance), i.e. how many partitions an RDD represents. A partition (aka split) is a logical chunk of a large distributed data set. Spark manages data using partitions that helps parallelize distributed data processing with minimal network traffic for sending data between executors.
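The idea of spreading a data set's records across partitions can be illustrated with a minimal plain-Python sketch. This is not Spark's API: the function names `hash_repartition` and `coalesce` are hypothetical, and partitions are modeled simply as in-memory lists of (key, value) records. `hash_repartition` redistributes every record by key hash, similar in spirit to Spark's `HashPartitioner`, so all records with the same key land in the same partition. `coalesce` only merges adjacent partitions, loosely mirroring how Spark's `coalesce` can reduce the partition count without a full shuffle.

```python
from typing import Hashable, List, Tuple

# Hypothetical record model: each record is a (key, value) pair.
Record = Tuple[Hashable, object]
Partitions = List[List[Record]]


def hash_repartition(partitions: Partitions, num_partitions: int) -> Partitions:
    """Full redistribution: send each record to the bucket chosen by its key hash."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    out: Partitions = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            # Python's % always yields a non-negative result for a positive divisor.
            out[hash(key) % num_partitions].append((key, value))
    return out


def coalesce(partitions: Partitions, num_partitions: int) -> Partitions:
    """Merge adjacent input partitions into fewer outputs; records are never re-hashed."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    # Merging cannot increase the partition count, so clamp to the current count.
    n = min(num_partitions, len(partitions))
    out: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map input partition i to output i * n // len(partitions), keeping neighbors together.
        out[i * n // len(partitions)].extend(part)
    return out


# Usage: four input partitions holding six records.
data: Partitions = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")], [(4, "e")], [(5, "f")]]
hashed = hash_repartition(data, 3)
# [[(3, 'c')], [(1, 'a'), (1, 'd'), (4, 'e')], [(2, 'b'), (5, 'f')]]
merged = coalesce(data, 2)
# [[(1, 'a'), (2, 'b'), (3, 'c'), (1, 'd')], [(4, 'e'), (5, 'f')]]
```

The example uses integer keys because Python's `hash` of a small integer is the integer itself, so placements are deterministic. String hashes are randomized per process (see `PYTHONHASHSEED`), which would change which bucket a string key lands in from run to run.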