Minimum Description Length Principle

AKA: Minimum Description Length, MDL Principle, MDL.
Context:.
- It was introduced in (Rissanen, 1978).
- (Grünwald, 1998).
- We seek to minimize the sum of the length, in bits, of an effective description of the model and the length, in bits, of an effective description of the data when encoded with the help of the model. (Sewell, 2006)
See: BIC, Kolmogorov Complexity, Information Theory, Learning Theory; Minimum Message Length.

References

http://en.wikipedia.org/wiki/Minimum_description_length
- The minimum description length (MDL) principle is a formalization of Occam's Razor in which the best hypothesis for a given set of data is the one that leads to the best compression of the data. MDL was introduced by Jorma Rissanen in 1978. It is an important concept in information theory and learning theory.

(Rissanen, 2011) ⇒ Jorma Rissanen. (2011). “Minimum Description Length Principle.” In: (Sammut & Webb, 2011) p.666

http://www.modelselection.org/mdl/
- QUOTE: Minimum description length (MDL) (Rissanen 1978) is a technique from algorithmic information theory which dictates that the best hypothesis for a given set of data is the one that leads to the largest compression of the data. We seek to minimize the sum of the length, in bits, of an effective description of the model and the length, in bits, of an effective description of the data when encoded with the help of the model. Sewell (2006)
- QUOTE: "Minimum description length (MDL) (Rissanen 1978) uses an information theoretic measure. Kolmogorov complexity of a dataset is defined as the shortest description of the data. If the data is simple, it has a short complexity; for example, if it is a sequence of “0”s, we can just write “0” and the length of the sequence. If the data is completely random, then we cannot have any description of the data shorter than the data itself. If a model is appropriate for the data, then it has a good fit to the data, and instead of the data, we can send/store the model description. Out of all the models that describe the data, we want to have the simplest model so that it lends itself to the shortest description. So we again have a trade-off between how simple the model is and how well it explains the data." Alpaydin, 2004

(Grünwald, 1998) ⇒ Peter Grünwald. (1998). “The Minimum Description Length Principle and Reasoning under Uncertainty.
- ABSTRACT: To be able to forecast future events, science wants to infer general laws and principles from particular instances. This process of inductive inference is a central theme in statistics, pattern recognition and the branch of Artificial Intelligence called `machine learning'. The Minimum Description Length (MDL) Principle is a relatively recent method for inductive inference. It has its roots in information theory and theoretical computer science (Kolmogorov complexity) rather than statistics. The fundamental idea behind the MDL Principle is that any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally. The more regularities there are in the data, the more we can compress it. This leads to the view (which is just a version of Occam's famous razor) that the more we can compress a given set of data, the more we can say we have learned about the data.
- QUOTE: The fundamental idea behind the MDL Principle is that any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally.

(Rissanen, 1978) ⇒ Jorma Rissanen. (1978). “Modeling by Shortest Data Descriptions.” doi:10.1016/0005-1098(78)90005-5
- CITED BY: ~3,466 http://scholar.google.com/scholar?cites=9687374827428579733
- ABTRACT: The number of digits it takes to write down an observed sequence x1, …, xN of a time series depends on the model with its parameters that one assumes to have generated the observed data. Accordingly, by finding the model which minimizes the description length one obtains estimates of both the integer-valued structure parameters and the real-valued system parameters.
- KEYWORDS: Modeling; parameter estimation; identification; statistics; stochastic systems