# Minimum Description Length Principle

A minimum description length principle is an algorithmic priciple that the best hypothesis for a data set is the one that leads to the largest data compression performance.

**AKA:**Minimum Description Length, MDL Principle, MDL.**Context:**.- It was introduced in (Rissanen, 1978).
- (Grünwald, 1998).
- We seek to minimize the sum of the length, in bits, of an effective description of the model and the length, in bits, of an effective description of the data when encoded with the help of the model. (Sewell, 2006)

**See:**BIC, Kolmogorov Complexity, Information Theory, Learning Theory; Minimum Message Length

## References

### 2013

- http://en.wikipedia.org/wiki/Minimum_description_length
- The
**minimum description length (MDL) principle**is a formalization of Occam's Razor in which the best hypothesis for a given set of data is the one that leads to the best compression of the data. MDL was introduced by Jorma Rissanen in 1978. It is an important concept in information theory and learning theory.

- The

- http://en.wikipedia.org/wiki/Minimum_description_length#Overview
- Any set of data can be represented by a string of symbols from a finite (say, binary) alphabet.
[The MDL Principle] is based on the following insight: any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally." (Grünwald, 1998)

To select the hypothesis that captures the most regularity in the data, scientists look for the hypothesis with which the best compression can be achieved. In order to do this, a code is fixed to compress the data, most generally with a (Turing-complete) computer language. A program to output the data is written in that language; thus the program effectively represents the data. The length of the shortest program that outputs the data is called the Kolmogorov complexity of the data. This is the central idea of Ray Solomonoff's idealized theory of inductive inference.

- Any set of data can be represented by a string of symbols from a finite (say, binary) alphabet.

### 2011

- (Rissanen, 2011) ⇒ Jorma Rissanen. (2011). “Minimum Description Length Principle.” In: (Sammut & Webb, 2011) p.666

### 2008

- http://www.modelselection.org/mdl/
- QUOTE: Minimum description length (MDL) (Rissanen 1978) is a technique from algorithmic information theory which dictates that the best hypothesis for a given set of data is the one that leads to the largest compression of the data. We seek to minimize the sum of the length, in bits, of an effective description of the model and the length, in bits, of an effective description of the data when encoded with the help of the model. Sewell (2006)
- QUOTE: "Minimum description length (MDL) (Rissanen 1978) uses an information theoretic measure. Kolmogorov complexity of a dataset is defined as the shortest description of the data. If the data is simple, it has a short complexity; for example, if it is a sequence of “0”s, we can just write “0” and the length of the sequence. If the data is completely random, then we cannot have any description of the data shorter than the data itself. If a model is appropriate for the data, then it has a good fit to the data, and instead of the data, we can send/store the model description. Out of all the models that describe the data, we want to have the simplest model so that it lends itself to the shortest description. So we again have a trade-off between how simple the model is and how well it explains the data." Alpaydin, 2004

### 2004

- (Grunwald, 2004) ⇒ Peter Grunwald. (2004). “A tutorial introduction to the minimum description length principle.” In: arXiv http://arxiv.org/abs/math/0406077

### 1998

- (Grünwald, 1998) ⇒ Peter Grünwald. (1998). “The Minimum Description Length Principle and Reasoning under Uncertainty.
- ABSTRACT: To be able to forecast future events, science wants to infer general laws and principles from particular instances. This process of inductive inference is a central theme in statistics, pattern recognition and the branch of Artificial Intelligence called `machine learning'. The Minimum Description Length (MDL) Principle is a relatively recent method for inductive inference. It has its roots in information theory and theoretical computer science (Kolmogorov complexity) rather than statistics. The fundamental idea behind the MDL Principle is that any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally. The more regularities there are in the data, the more we can compress it. This leads to the view (which is just a version of Occam's famous razor) that the more we can compress a given set of data, the more we can say we have learned about the data.
- QUOTE: The fundamental idea behind the MDL Principle is that any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally.

### 1978

- (Rissanen, 1978) ⇒ Jorma Rissanen. (1978). “Modeling by Shortest Data Descriptions.” doi:10.1016/0005-1098(78)90005-5
- CITED BY: ~3,466 http://scholar.google.com/scholar?cites=9687374827428579733
- ABTRACT: The number of digits it takes to write down an observed sequence x1, …, xN of a time series depends on the model with its parameters that one assumes to have generated the observed data. Accordingly, by finding the model which minimizes the description length one obtains estimates of both the integer-valued structure parameters and the real-valued system parameters.
- KEYWORDS: Modeling; parameter estimation; identification; statistics; stochastic systems

### 1964

- (Solomonoff, 1964a) ⇒ Ray Solomonoff. (1964). “A Formal Theory of Inductive Inference. Part I.” In: Information and Control, 7(1).