k-d Tree Data Structure

From GM-RKB
Jump to navigation Jump to search

A k-d Tree Data Structure is a tree-based data structure for organizing points in a k-dimensional space.



References

2018

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/K-d_tree#Informal_description Retrieved:2018-8-5.
    • The k-d tree is a binary tree in which every node is a k-dimensional point. Every non-leaf node can be thought of as implicitly generating a splitting hyperplane that divides the space into two parts, known as half-spaces. Points to the left of this hyperplane are represented by the left subtree of that node and points to the right of the hyperplane are represented by the right subtree. The hyperplane direction is chosen in the following way: every node in the tree is associated with one of the k-dimensions, with the hyperplane perpendicular to that dimension's axis. So, for example, if for a particular split the "x" axis is chosen, all points in the subtree with a smaller "x" value than the node will appear in the left subtree and all points with larger "x" value will be in the right subtree. In such a case, the hyperplane would be set by the x-value of the point, and its normal would be the unit x-axis.

2014

1975

  • (Bentley, 1975) ⇒ Jon Louis Bentley (1975). "Multidimensional binary search trees used for associative searching". Communications of the ACM, 18(9), 509-517. DOI:10.1145/361002.361007.
    • QUOTE: If a file is represented as a k-d tree, then each record in the file is stored as a node in the tree. In addition to the k keys which comprise the record, each node contains two pointers, which are either null or point to another node in the k-d tree. (Note that each pointer can be considered as specifying a subtree.) Associated with each node, though not necessarily stored as a field, is a discriminator, which is an integer between 0 and [math]\displaystyle{ k -1 }[/math], inclusive. Let us define a notation for these items: the k keys of node P will be called [math]\displaystyle{ K_o(P),\;\cdots, K_{k-I}(P) }[/math], the pointers will be LOSON(P) and HISON(P), and the discriminator will be DISC(P). The defining order imposed by a k-d tree is this: For any node P in a k-d tree, let j be DISC(P), then for any node Q in LOSON(P), it is true that [math]\displaystyle{ K_j(Q) \lt K_j(P) }[/math]; likewise, for any node R in HISON(P), it is true that [math]\displaystyle{ K_j(R) \gt K_j(P) }[/math]. (This statement does not take into account the possibility of equality of keys, which will be discussed shortly.)

      All nodes on any given level of the tree have the same discriminator. The root node has discriminator 0, its two sons have discriminator 1, and so on to the k-th level on which the discriminator is k - 1 ; the (k + 1)-th level has discriminator 0, and the cycle repeats. In general, NEXTDISC is a function defined as

      [math]\displaystyle{ NEXTDISC(i) = (i + 1) \mod k }[/math],

      and NEXTDISC(DISC(P)) = DISC(LOSON(P)), and likewise for HISON(P) (if they are non-null). Figure 1 gives an example of records in 2-space stored as nodes in a 2-d tree.

Bentley1975 Fig1a.png Bentley1975 Fig1b.png
Fig.1: Records in 2-space stored as nodes in a 2-d tree (boxes represent range of subtree). Planar graph representation of the same 2-d tree (LOSON's are expressed by left branches, HISON's by right branches, and null sons by boxes.
(...) First, k-d trees are intended primarily for use in a 1-level store, but using secondary storage (such as disk or drum memory) for overflow might be acceptable if there were few additions and deletions from the file. This factor is becoming less of a restriction as memories rapidly decrease in cost and increase in size. Second, it is necessary for k-d trees to have some minimum number of nodes before they become useful. For example, if the deepest node in a 10-d tree is on the 9th level, one of the keys will never have been used as a discriminator as the tree was being built. A good rule of thumb might be to use k-d trees only if [math]\displaystyle{ n \gt 2^{2k} }[/math].