RCFile

From GM-RKB

See: Data File, File Format, TSV File, Hive RCFile.



References

2013

  • http://en.wikipedia.org/wiki/RCFile
    • RCFile (Record Columnar File) is a data placement structure that determines how relational tables are stored on computer clusters. It is designed for data warehouse systems that use the MapReduce framework. The RCFile structure systematically combines several components: a data storage format, a data compression approach, and optimization techniques for data reading. It is designed to meet all four requirements of data placement: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) strong adaptivity to dynamic data access patterns.

      RCFile is the result of basic research carried out jointly by Facebook, Ohio State University, and the Institute of Computing Technology, Chinese Academy of Sciences. A research paper entitled “RCFile: a Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse systems”[1] was published and presented at ICDE 2011. The data placement structure and its implementation presented in the paper have been widely adopted by the open source community, the big data analytics industry, and application users. See the Impacts section of the source article.

  1. Yongqiang He, Rubao Lee, Yin Huai, Zheng Shao, Namit Jain, Xiaodong Zhang, Zhiwei Xu, "RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems", Proceedings of the IEEE International Conference on Data Engineering (ICDE), 2011. http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/abs11-4.html
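
The excerpt above names the components of RCFile but does not show the layout itself. As the name "Record Columnar" suggests, a table is first partitioned horizontally into row groups, and each row group is then stored column by column with every column compressed on its own. The following is a minimal sketch of that idea only, not the actual RCFile on-disk format: the function names `to_row_groups` and `read_column`, the newline-joined text encoding, and the use of `zlib` as the codec are all assumptions made for illustration.

```python
import zlib


def to_row_groups(rows, group_size):
    """Split rows into row groups, then lay out each group column by column.

    Hypothetical helper: each column of a group is serialized as
    newline-joined text and compressed independently with zlib.
    Assumes values contain no newline characters.
    """
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Transpose the chunk so that each tuple holds one column's values.
        columns = list(zip(*chunk))
        groups.append([
            zlib.compress("\n".join(map(str, col)).encode("utf-8"))
            for col in columns
        ])
    return groups


def read_column(groups, index):
    """Read one column across all row groups, decompressing only that column."""
    values = []
    for group in groups:
        text = zlib.decompress(group[index]).decode("utf-8")
        values.extend(text.split("\n"))
    return values


rows = [
    (1, "alice", 3.5),
    (2, "bob", 4.0),
    (3, "carol", 2.25),
    (4, "dave", 5.0),
    (5, "erin", 1.0),
]
groups = to_row_groups(rows, group_size=2)
names = read_column(groups, 1)
```

In this sketch a query that needs only the second column touches one compressed block per row group and never decompresses the others, which is the kind of read optimization the excerpt alludes to. Values come back as strings because the toy encoding discards type information.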