# Sparse Data Set


A Sparse Data Set is a dataset in which many of the records are sparse records (records for which most attribute values are missing or zero).

**Context:**

- It can range from being a High-Dimensional Sparse Dataset to being a Low-Dimensional Sparse Dataset.
- It can range from being a Small Sparse Dataset to being a Large Sparse Dataset.
- It can be represented by a Sparse Data Structure (such as a hash table).
- It can be analyzed by a Sparse Data Algorithm.
- It can (typically) be a Wide Dataset.
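
The hash-table representation mentioned in the context list can be sketched as follows. This is a minimal illustration, assuming that "sparse" means most entries equal a default value of 0; the helper names `to_sparse`, `to_dense`, and `sparse_dot` are hypothetical and not taken from any library.

```python
def to_sparse(record, default=0):
    """Store only the entries that differ from the default value (assumed 0)."""
    return {index: value for index, value in enumerate(record) if value != default}


def to_dense(sparse, length, default=0):
    """Rebuild the full record, filling absent positions with the default."""
    return [sparse.get(index, default) for index in range(length)]


def sparse_dot(a, b):
    """Dot product that only visits keys stored in the smaller record."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[key] for key, value in a.items() if key in b)


dense_record = [0, 0, 3, 0, 0, 0, 7, 0]
sparse_record = to_sparse(dense_record)  # only 2 of the 8 entries are stored
```

Storage and the cost of `sparse_dot` scale with the number of stored entries rather than with the record length, which is the usual motivation for a sparse data structure on wide datasets.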

**Example(s):**

**Counter-Example(s):**

- Dense Data Set, such as a dense matrix.

**See:** Sparse.

## References

### 2005

- (Zhao & Grishman, 2005) ⇒ Shubin Zhao, Ralph Grishman. (2005). “Extracting Relations with Integrated Information Using Kernel Methods.” In: Proceedings of ACL Conference (ACL 2005).
- QUOTE: ... Some of the kernels are extended to generate high order features. We think a discriminative classifier trained with all the available syntactic features should do better on the sparse data.

### 1999

- (Zaiane, 1999) ⇒ Osmar Zaiane. (1999). “Glossary of Data Mining Terms.” University of Alberta, Computing Science CMPUT-690: Principles of Knowledge Discovery in Databases.
- QUOTE: Sparse: A multi-dimensional data set is sparse if a relatively high percentage of the possible combinations (intersections) of the members from the data set's dimensions contain missing data. The total possible number of intersections can be computed by multiplying together the number of members in each dimension. Data sets containing one percent, .01 percent, or even smaller percentages of the possible data exist and are quite common. The opposite of a sparse cube is a dense cube.
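
The calculation described in this quote can be sketched as follows: multiply the number of members in each dimension to get the total number of possible intersections, then divide the populated count by that total. The function name `density` and the dimension sizes and cell count below are hypothetical values chosen for illustration only.

```python
from math import prod


def density(dimension_sizes, populated_cells):
    """Fraction of possible intersections that actually contain data."""
    total_cells = prod(dimension_sizes)  # product of members per dimension
    if total_cells == 0:
        raise ValueError("every dimension needs at least one member")
    return populated_cells / total_cells


# Hypothetical cube: 100 x 50 x 20 members, 1,000 populated intersections.
sizes = [100, 50, 20]
populated = 1_000
cube_density = density(sizes, populated)  # 1,000 of 100,000 cells are populated
cube_sparsity = 1 - cube_density          # share of cells with missing data
```

With these assumed numbers the cube holds one percent of the possible data, which matches the order of magnitude the quote calls common.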

### 1999b

- (Wagstaff & Cardie, 1999) ⇒ Kiri Wagstaff, Claire Cardie. (1999). “Noun Phrase Coreference as Clustering.” In: Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP 1999).
- QUOTE: Despite a large corpus (150 million words), their approach suffers from sparse data problems, but works well when enough relevant data is available.

### 1980

- (Jelinek & Mercer, 1980) ⇒ F. Jelinek, and R. Mercer. (1980). “Interpolated Estimation of Markov Source Parameters from Sparse Data.” In: E. Gelsema and L. Kanal (Eds.), Pattern Recognition in Practice. North-Holland Publishing Company.