Hive RCFile

From GM-RKB

See: RCFile, Hive System.



References

2013

  • http://hive.apache.org/docs/r0.5.0/api/org/apache/hadoop/hive/ql/io/RCFile.html
    • RCFiles, short for Record Columnar File, are flat files consisting of binary key/value pairs, much like SequenceFile. RCFile stores the columns of a table in a record columnar way. It first partitions rows horizontally into row splits, and then it vertically partitions each row split in a columnar way. RCFile stores the metadata of a row split as the key part of a record, and all the data of the row split as the value part. When writing, RCFile.Writer first holds the records' value bytes in memory, and determines a row split once the raw byte size of the buffered records exceeds a given parameter, Writer.columnsBufferSize, which can be set like: conf.setInt(COLUMNS_BUFFER_SIZE_CONF_STR, 4 * 1024 * 1024).

      RCFile provides the RCFile.Writer and RCFile.Reader classes for writing and reading, respectively.

      RCFile compresses values in a more fine-grained manner than record-level compression. However, it does not yet support compressing the key part. The actual compression algorithm used can be specified by supplying the appropriate CompressionCodec.

      The RCFile.Reader class is used to read and interpret the bytes of an RCFile.
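The layout described in the quoted Javadoc can be illustrated with a small, self-contained Java sketch. The sketch buffers rows, cuts a row split when the buffer overflows, stores metadata as the key and column data as the value, and compresses each column separately. It is a toy in-memory model under stated assumptions, not Hive's on-disk RCFile format or its API. The class ToyRCFile, the record RowSplit, and the bufferSizeBytes parameter (standing in for Writer.columnsBufferSize) are hypothetical. Column values are assumed to be newline-free strings, and java.util.zip's deflate streams stand in for whatever CompressionCodec would actually be configured. As in the Javadoc, only the value part is compressed; the key is left uncompressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Toy model of the RCFile layout described above; NOT Hive's on-disk format or API.
public class ToyRCFile {
    // Key = metadata (row count, compressed length of each column); value = column bytes.
    record RowSplit(int numRows, int[] columnLengths, byte[] value) {}

    private final int numColumns;
    private final int bufferSizeBytes; // hypothetical stand-in for Writer.columnsBufferSize
    private final List<String[]> buffered = new ArrayList<>();
    private int bufferedBytes = 0;
    final List<RowSplit> splits = new ArrayList<>();

    ToyRCFile(int numColumns, int bufferSizeBytes) {
        this.numColumns = numColumns;
        this.bufferSizeBytes = bufferSizeBytes;
    }

    // Holds the row in memory; closes a row split once the buffered raw bytes exceed the limit.
    void append(String[] row) {
        buffered.add(row);
        for (String field : row) bufferedBytes += field.getBytes(StandardCharsets.UTF_8).length;
        if (bufferedBytes > bufferSizeBytes) flush();
    }

    // Partitions the buffered rows vertically: each column is compressed on its own.
    void flush() {
        if (buffered.isEmpty()) return;
        int[] lengths = new int[numColumns];
        ByteArrayOutputStream value = new ByteArrayOutputStream();
        for (int c = 0; c < numColumns; c++) {
            StringBuilder sb = new StringBuilder();
            for (String[] row : buffered) sb.append(row[c]).append('\n'); // assumes no '\n' in values
            byte[] compressed = deflate(sb.toString().getBytes(StandardCharsets.UTF_8));
            lengths[c] = compressed.length;
            value.write(compressed, 0, compressed.length);
        }
        splits.add(new RowSplit(buffered.size(), lengths, value.toByteArray()));
        buffered.clear();
        bufferedBytes = 0;
    }

    // Reads one column: the key's lengths let us skip (and never decompress) other columns.
    List<String> readColumn(int c) {
        List<String> out = new ArrayList<>();
        for (RowSplit split : splits) {
            int offset = 0;
            for (int i = 0; i < c; i++) offset += split.columnLengths()[i];
            byte[] compressed = Arrays.copyOfRange(split.value(), offset, offset + split.columnLengths()[c]);
            String[] values = new String(inflate(compressed), StandardCharsets.UTF_8).split("\n", -1);
            out.addAll(Arrays.asList(values).subList(0, split.numRows()));
        }
        return out;
    }

    static byte[] deflate(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static byte[] inflate(byte[] compressed) {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ToyRCFile file = new ToyRCFile(2, 16); // tiny limit so that more than one split forms
        file.append(new String[] {"1", "alice"});
        file.append(new String[] {"2", "bob"});
        file.append(new String[] {"3", "carol"});
        file.append(new String[] {"4", "dave"}); // buffer reaches 21 bytes > 16: split closed
        file.append(new String[] {"5", "erin"});
        file.flush(); // close the final, partially filled split
        System.out.println(file.splits.size() + " " + file.readColumn(1));
        // prints: 2 [alice, bob, carol, dave, erin]
    }
}
```

In this toy, the first four rows overflow the 16-byte buffer and form one row split, and the fifth row forms a second split on the final flush. Reading column 1 decompresses only that column's bytes in each split, using the per-column lengths stored in the key to locate them. This is the benefit the columnar layout inside each row split is meant to provide.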