# High-Dimensionality Dataset


A High-Dimensionality Dataset is a structured dataset whose data records have a large number of attributes (dimensions).

**Context:**

- It can range from being a Sparse High-Dimensional Dataset to being a Dense High-Dimensional Dataset.
- It can be the Input to: High-Dimensionality Clustering, Dimensionality Reduction, ...
- It can represent a subset of a High-Dimensional Space.
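The sparse/dense distinction above is mostly about how many attribute values per record are non-zero. The sketch below is an illustration only: it assumes each sparse record is stored as a mapping from attribute index to non-zero value, and the helper names `to_dense` and `density` are hypothetical. There is no fixed density cutoff that makes a dataset "sparse"; the label is a judgment call.

```python
from typing import Dict, List

# Hypothetical record layout: attribute index -> non-zero value.
SparseRecord = Dict[int, float]


def to_dense(sparse: SparseRecord, num_attributes: int) -> List[float]:
    """Expand a sparse record into a dense vector of length num_attributes."""
    dense = [0.0] * num_attributes
    for idx, value in sparse.items():
        dense[idx] = value
    return dense


def density(records: List[SparseRecord], num_attributes: int) -> float:
    """Fraction of (record, attribute) cells that hold a non-zero value."""
    nonzeros = sum(sum(1 for v in r.values() if v != 0) for r in records)
    return nonzeros / (len(records) * num_attributes)


# Example: two bag-of-words style records over an assumed 50,000-term vocabulary.
records = [{3: 1.0, 17: 2.0}, {17: 1.0, 42: 5.0, 49_999: 1.0}]
print(density(records, 50_000))  # 5 non-zeros out of 100,000 cells
```

Storing only the non-zero entries keeps memory proportional to the number of non-zeros rather than to records × attributes, which is why sparse high-dimensional datasets are rarely materialized in dense form.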

**Example(s):**

- a High-Dimensionality Learning Dataset.
- a vectorized image dataset.
- a High-Dimensional Sensory Input.

**Counter-Example(s):**

**See:** Index Data Structure, High-Dimensionality Matrix.

## References

### 1999

- (Agrawal et al., 1999) ⇒ Rakesh Agrawal, Johannes Ernst Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan. (1999). “Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications.” US Patent 6,003,029.
- QUOTE: Emerging data mining applications place special requirements on clustering techniques, such as the ability to handle **high dimensionality**, assimilation of cluster descriptions by users, description minimation, and scalability and usability. Regarding high dimensionality of data clustering, an object typically has **dozens of attributes** in which the domains of the attributes are large. Clusters formed in a high-dimensional data space are not likely to be meaningful clusters because the expected average density of points anywhere in the high-dimensional data space is low. The requirement for **high dimensionality** in a data mining application is conventionally addressed by requiring a user to specify the subspace for cluster analysis.
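The claim that the expected density of points is low everywhere in a high-dimensional space can be made concrete with simple arithmetic. The sketch below assumes, beyond what the quote states, that each attribute's domain is split into a fixed number of equal-width intervals, so the space becomes a grid of `intervals ** dims` cells; the function name `expected_points_per_cell` is hypothetical. The average number of points per cell is then exactly `n_points / intervals ** dims`, whatever the data distribution.

```python
def expected_points_per_cell(n_points: int, intervals: int, dims: int) -> float:
    """Average number of points per grid cell when each of `dims` attributes
    is split into `intervals` equal-width ranges (intervals ** dims cells)."""
    return n_points / (intervals ** dims)


# One million records, 10 intervals per attribute (assumed values).
for d in (1, 2, 5, 10, 20):
    print(d, expected_points_per_cell(1_000_000, 10, d))
```

With five attributes each cell still holds 10 points on average, but by ten attributes the average drops below one point per cell, so almost every cell is empty or nearly so. This is one way to see why full-dimensional clusters are unlikely to be meaningful and why restricting the analysis to a lower-dimensional subspace helps.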


### 1996

- (Berchtold et al., 1996) ⇒ Stefan Berchtold, Daniel A. Keim, and Hans-Peter Kriegel. (1996). “The X-tree: An Index Structure for High-Dimensional Data.” In: Proceedings of VLDB Conference (VLDB 1996).
- QUOTE: In many applications, indexing of **high-dimensional data** has become increasingly important. In multimedia databases, for example, the multimedia objects are usually mapped to feature vectors in some high-dimensional space and queries are processed against a database of those feature vectors [Fal 94]. Similar approaches are taken in many other areas including CAD [MG 93], molecular biology (for the docking of molecules) [SBK 92], string matching and sequence alignment [AGMM 90], etc. Examples of feature vectors are **color histograms** [SH 94], shape descriptors [Jag 91, MG 95], **Fourier vectors** [WW 80], text descriptors [Kuk 92], etc. In some applications, the mapping process does not yield point objects, but extended spatial objects in high-dimensional space [MN 95]. In many of the mentioned applications, the databases are very large and consist of **millions of data objects** with several tens to a few hundreds of dimensions. For querying these databases, it is essential to use appropriate indexing techniques which provide an efficient access to high-dimensional data. The goal of this paper is to demonstrate the limits of currently available index structures, and present a new index structure which considerably improves the performance in indexing high-dimensional data.
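The quote mentions color histograms as one way multimedia objects become high-dimensional feature vectors. The sketch below is one simple, assumed variant (not the scheme of [SH 94] or of the X-tree paper): each RGB channel is quantized into `bins_per_channel` equal-width bins, and pixel counts per joint bin are normalized. The function name `color_histogram` and the pixel representation are hypothetical. With 4 bins per channel the result has 4³ = 64 dimensions, within the "several tens to a few hundreds" range the quote describes.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel assumed to be in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Map an image (a sequence of RGB pixels) to a normalized color-histogram
    feature vector with bins_per_channel ** 3 dimensions."""
    dims = bins_per_channel ** 3
    hist = [0.0] * dims
    width = 256 / bins_per_channel  # width of one bin along a single channel
    for r, g, b in pixels:
        # Quantize each channel, then flatten the 3-D bin index into one index.
        ri, gi, bi = (int(c // width) for c in (r, g, b))
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
    total = len(pixels)
    # Normalize so images of different sizes are comparable.
    return [h / total for h in hist] if total else hist


# A two-pixel "image": one pure red pixel and one pure blue pixel.
vector = color_histogram([(255, 0, 0), (0, 0, 255)], bins_per_channel=4)
print(len(vector))  # number of dimensions
```

A database of such vectors is exactly the kind of high-dimensional dataset that similarity queries (for example, nearest-neighbor search) must run against, which is the workload the quoted index structures target.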
