Horizontally Partitioned (Sharded) Database

From GM-RKB
Jump to navigation Jump to search

A Horizontally Partitioned (Sharded) Database is a partitioned database composed of with horizontal structured database shards.



References

2019

  • https://profitpt.com/2018/01/09/supply_chain_optimization_database_sharding_partitioning/
    • Partitioning is a general term used to describe the breaking up of your logical data elements into multiple entities typically for the purpose of performance, availability, or maintainability.

       Sharding is the equivalent of “horizontal partitioning.” When you shard a database, you create replications of the table schema, then divide what data is stored in each shard based on a shard key. For example, you might shard a material database using Material Id as a shard key – you could store ranges 0-20000 in one shard and 20001-40000 in a different shard. When choosing a shard key, you want to consider data-access patterns, query biased and storage space considerations to ensure you are distributing load and space across shards evenly.

      “Vertical partitioning” involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized. For example, a customer might only have one credit account number on file, yet I might choose to put the credit account information into a separate table with a Customer Id reference so that I have the flexibility to move that information into a separate database or different security context, etc. Vertical Partitioning can also be used to help reduce IO. Assume we have a table with a lot of data such as a material master. Only about 5% of the columns are normally queried such as material number and description. Assuming the frequent queries will lead to clustered index scans, we want to minimize the number of IO operations, so we want to make the row as small as possible. Infrequently queried columns such as material characteristics can be vertically partitioned. A partition is a physically separate file that comprises a subset of rows of a logical file which occupies the same CPU, memory and storage node as its peer partitions.

      A shard is a physical computer node comprised of a CPU, memory, and storage. A shard’s schema (and integrity constraints) may be replicated across as many shards as needed. Shards may contain non-partitioned and partitioned tables.

      By using shards and partitions together we now effectively have two keys which we can use to logically chunk out the data. Choosing those keys depends on the query biases of the primary applications reading and writing data.

2015

2015

  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Shard_(database_architecture)#Disadvantages_of_sharding Retrieved:2015-10-2.
    • Sharding a database table before it has been optimized locally causes premature complexity. Sharding should be used only when all other options for optimization are inadequate. The introduced complexity of database sharding causes the following potential problems:
      • Increased complexity of SQL - Increased bugs because the developers have to write more complicated SQL to handle sharding logic.
      • Sharding introduces complexity - The sharding software that partitions, balances, coordinates, and ensures integrity can fail.
      • Single point of failure - Corruption of one shard due to network/hardware/systems problems causes failure of the entire table.
      • Failover servers more complex - Failover servers must themselves have copies of the fleets of database shards.
      • Backups more complex - Database backups of the individual shards must be coordinated with the backups of the other shards.
      • Operational complexity added - Adding/removing indexes, adding/deleting columns, modifying the schema becomes much more difficult.
    • These historical complications of do-it-yourself sharding are now being addressed by independent software vendors who provide autosharding solutions.




Cite error: <ref> tags exist for a group named "notes", but no corresponding <references group="notes"/> tag was found