Apache HBase Database Framework

From GM-RKB

An Apache HBase Database Framework is a scale-out table store that can support a very high rate of row-level updates over very large databases.



References

2016

  • (Wikipedia, 2016) ⇒ http://wikipedia.org/wiki/Apache_HBase Retrieved:2016-3-2.
    • HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).

      HBase features compression, in-memory operation, and Bloom filters on a per-column basis, as outlined in the original BigTable paper. [1] Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API as well as through REST, Avro, or Thrift gateway APIs. HBase is a column-oriented key-value data store and has been widely adopted because of its lineage with Hadoop and HDFS. HBase runs on top of HDFS and is well suited for fast read and write operations on large datasets, with high throughput and low input/output latency. HBase is not a direct replacement for a classic SQL database; however, the Apache Phoenix project provides a SQL layer for HBase as well as a JDBC driver that can be integrated with various analytics and business intelligence applications. The Apache Trafodion project provides a SQL query engine with ODBC and JDBC drivers and distributed ACID transaction protection across multiple statements, tables, and rows, using HBase as a storage engine. HBase now serves several data-driven websites, [2] including Facebook's Messaging Platform. [3] [4] Unlike traditional relational databases, HBase does not natively support SQL; instead, applications are written in Java, in a manner similar to MapReduce applications.

      In the parlance of Eric Brewer’s CAP Theorem, HBase is a CP-type system: it favors consistency and partition tolerance over availability.

  1. Chang, et al. (2006). Bigtable: A Distributed Storage System for Structured Data
  2. Powered By HBase
  3. The Underlying Technology of Messages
  4. Facebook: Why our 'next-gen' comms ditched MySQL. Retrieved: 17 December 2010
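
The excerpt above mentions Bloom filters as one of the features HBase inherits from the BigTable design. The following is a minimal sketch of the general technique, not HBase's implementation: a Bloom filter answers "might this key be present?" with possible false positives but no false negatives, so a store can skip reading files that definitely do not contain a requested row. The class name `BloomFilter`, the sizing parameters, and the `store_file_filter` variable are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical store file holding three row keys.
store_file_filter = BloomFilter()
for row_key in ("row-001", "row-002", "row-003"):
    store_file_filter.add(row_key)
```

A reader would call `store_file_filter.might_contain(key)` before opening the file and skip the file when the answer is `False`. The trade-off is a small, tunable false-positive rate in exchange for a compact in-memory structure.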

2012

  • http://hbase.apache.org/
    • HBase is the Hadoop database. Think of it as a distributed, scalable, big data store.

      Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
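
The phrase "distributed, versioned, column-oriented store" can be made concrete with a toy in-memory model. This is a sketch under stated assumptions, not the HBase client API: it treats a table as a sparse map from row key to column (written here as `family:qualifier`) to timestamped values, with rows scanned in sorted key order. The class `ToyTable`, its method names, and the sample data are hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """Sparse map: row key -> column ("family:qualifier") -> {timestamp: value}."""

    def __init__(self):
        # Absent cells take no space: only written cells are stored.
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        self.rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (newest overall if None)."""
        versions = self.rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, stop_row):
        """Yield (row, columns) for start_row <= row < stop_row, in sorted key order."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row, dict(self.rows[row])


table = ToyTable()
table.put("user#1", "info:name", "Ada", timestamp=1)
table.put("user#1", "info:name", "Ada L.", timestamp=2)  # newer version of the same cell
table.put("user#2", "info:email", "bob@example.com", timestamp=1)
```

Here `table.get("user#1", "info:name")` returns the newest version, while passing `timestamp=1` reads the older one. Rows need not share columns, which is what makes the model suitable for sparse data.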

      Features:

      • Linear and modular scalability.
      • Strictly consistent reads and writes.
      • Automatic and configurable sharding of tables.
      • Automatic failover support between RegionServers.
      • Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
      • Easy to use Java API for client access.
      • Block cache and Bloom Filters for real-time queries.
      • Query predicate push down via server side Filters.
      • Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options.
      • Extensible jruby-based (JIRB) shell.
      • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX.
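
The sharding and RegionServer features listed above rest on splitting a table into contiguous row-key ranges, each served by one server. The sketch below illustrates that idea only; it is a toy model, not HBase code. It assumes string row keys (HBase itself uses byte arrays), and the class `RegionMap`, the server names `rs1` to `rs4`, and the split points are all hypothetical.

```python
import bisect


class RegionMap:
    """Maps row keys to regions; region i covers [start_keys[i], start_keys[i+1])."""

    def __init__(self, start_keys, servers):
        # The first region must start at the empty key so every row key is covered.
        if start_keys[0] != "" or list(start_keys) != sorted(start_keys):
            raise ValueError("start keys must be sorted and begin with ''")
        if len(start_keys) != len(servers):
            raise ValueError("one server per region is required")
        self.start_keys = list(start_keys)
        self.servers = list(servers)

    def locate(self, row_key):
        """Return (region start key, server) for the region holding row_key."""
        index = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.start_keys[index], self.servers[index]

    def split(self, split_key, new_server):
        """Split the region containing split_key; the upper half goes to new_server."""
        if split_key in self.start_keys:
            raise ValueError("split key is already a region boundary")
        index = bisect.bisect_right(self.start_keys, split_key)
        self.start_keys.insert(index, split_key)
        self.servers.insert(index, new_server)


regions = RegionMap(["", "g", "p"], ["rs1", "rs2", "rs3"])
first = regions.locate("apple")       # falls in the region starting at ""
boundary = regions.locate("g")        # start keys are inclusive
regions.split("t", "rs4")             # region ["p", end) becomes ["p", "t") and ["t", end)
after_split = regions.locate("zebra")
```

Because regions are ranges over sorted keys, a lookup is a binary search and a split only adds one boundary; failover in this model would amount to reassigning a region's entry in `servers`.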