Growth Function


A Growth Function is a Statistical Learning function that measures the richness of a Set Family (or Hypothesis Space) by giving, for each sample size n, the maximum number of distinct subsets (equivalently, labelings) that the family can induce on n points.



References

2019a

  1. Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025. This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968. The translation was reproduced as: Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.
  2. Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012). Foundations of Machine Learning. USA, Massachusetts: MIT Press. ISBN 9780262018258, especially Section 3.2.
  3. Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – from Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.

2019b

  • (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Shattered_set#Shatter_coefficient Retrieved:2019-12-6.
    • To quantify the richness of a collection C of sets, we use the concept of shattering coefficients (also known as the growth function). For a collection C of sets [math]\displaystyle{ s \subset \Omega }[/math] , [math]\displaystyle{ \Omega }[/math] being any space, often a sample space, and [math]\displaystyle{ x_1,x_2,\dots,x_n \in \Omega }[/math] being any set of n points, we define

      the nth shattering coefficient of C as: [math]\displaystyle{ S_C(n) = \max_{x_1,x_2,\dots,x_n \in \Omega } \operatorname{card} \{\, \{x_1,x_2,\dots,x_n\}\cap s : s\in C \,\} }[/math] where [math]\displaystyle{ \operatorname{card} }[/math] denotes the cardinality of the set. [math]\displaystyle{ S_C(n) }[/math] is the largest number of subsets of any set A of n points that can be formed by intersecting A with the sets in collection C.

      Here are some facts about [math]\displaystyle{ S_C(n) }[/math] :

      1. [math]\displaystyle{ S_C(n)\leq 2^n }[/math] for all n because [math]\displaystyle{ \{s\cap A|s\in C\}\subseteq P(A) }[/math] for any [math]\displaystyle{ A\subseteq \Omega }[/math] .
      2. If [math]\displaystyle{ S_C(n)=2^n }[/math], then there is a set of cardinality n that can be shattered by C.
      3. If [math]\displaystyle{ S_C(N)\lt 2^N }[/math] for some [math]\displaystyle{ N\gt 1 }[/math] then [math]\displaystyle{ S_C(n)\lt 2^n }[/math] for all [math]\displaystyle{ n\geq N }[/math] .
    • The third property means that if C cannot shatter any set of cardinality N then it cannot shatter sets of larger cardinalities.
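
A brute-force illustration of the definition quoted above (an added sketch, not part of the Wikipedia article): the Python snippet below represents a hypothetical collection C of half-lines [math]\displaystyle{ (-\infty, a] }[/math] as indicator functions and counts the distinct subsets they induce on a set of n points. For this collection every set of n distinct points yields n + 1 induced subsets, so the printed count equals [math]\displaystyle{ S_C(n) }[/math], which is strictly below [math]\displaystyle{ 2^n }[/math] once [math]\displaystyle{ n \geq 2 }[/math], consistent with facts 1 and 3 above.

    def induced_subsets(points, collection):
        """Return the distinct sets {x1,...,xn} ∩ s for s in the collection,
        where each member s is given as an indicator function on Ω."""
        return {frozenset(x for x in points if s(x)) for s in collection}

    # Assumed example: C = half-lines (-inf, a], one per threshold a.
    # Thresholds interleaving the points realize every subset that
    # half-lines can induce on them (the empty set plus each "prefix").
    points = [0.5, 1.5, 2.5, 3.5]
    thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]
    C = [(lambda a: (lambda x: x <= a))(a) for a in thresholds]

    print(len(induced_subsets(points, C)))  # 5 = n + 1, well below 2**4 = 16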


2018b

  • (Rebeschini, 2018) ⇒ Patrick Rebeschini (2018). "VC Dimension. Covering and Packing Numbers - Lecture 4". In: Algorithmic Foundations of Learning. University of Oxford, Michaelmas Term 2018.
    • QUOTE: Definition 4.2 The growth function of [math]\displaystyle{ \mathcal{A} }[/math] is defined as follows, for any integer [math]\displaystyle{ n \geq 1 }[/math]:

      [math]\displaystyle{ \tau_{\mathcal{A}}(n) := \underset{x\in \mathcal{X}^n}{\sup} |\mathcal{A} \circ x| }[/math]

      The quantity [math]\displaystyle{ \tau_{\mathcal{A}}(n) }[/math] is the maximal cardinality of the set of distinct labelings of [math]\displaystyle{ n }[/math] points in [math]\displaystyle{ \mathcal{X} }[/math] that can be obtained using classifiers from [math]\displaystyle{ \mathcal{A} }[/math]. As the growth function of [math]\displaystyle{ \mathcal{A} }[/math] is finite, an immediate application of Massart's lemma, Lemma 2.9, shows that [math]\displaystyle{ \tau_{\mathcal{A}}(n) }[/math] can be used to upper bound the Rademacher complexity.
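
A reading note added here (not part of the quoted lecture): writing [math]\displaystyle{ \operatorname{Rad}_n(\mathcal{A}) }[/math] for the Rademacher complexity of [math]\displaystyle{ \mathcal{A} }[/math] on [math]\displaystyle{ n }[/math] points, and under the usual convention that each labeling vector lies in [math]\displaystyle{ \{-1,+1\}^n }[/math] (so its Euclidean norm is at most [math]\displaystyle{ \sqrt{n} }[/math]), Massart's finite-class lemma gives

      [math]\displaystyle{ \operatorname{Rad}_n(\mathcal{A}) \;\leq\; \sqrt{\frac{2 \log \tau_{\mathcal{A}}(n)}{n}}, }[/math]

so whenever [math]\displaystyle{ \tau_{\mathcal{A}}(n) }[/math] grows only polynomially in [math]\displaystyle{ n }[/math], the right-hand side, and hence the Rademacher complexity, tends to zero.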

2018c

  • (Ma et al., 2018) ⇒ Tengyu Ma (Lecturer), Bo Liu, Zhaozhuo Xu (October 22, 2018). "CS229T/STATS231: Statistical Learning Theory - Lecture #9". In: CS229T/STATS231: Statistical Learning Theory, Stanford University, Autumn 2018-2019.
    • QUOTE: Here [math]\displaystyle{ \mathcal{H} }[/math] is the hypothesis set with binary outcomes and [math]\displaystyle{ \mathcal{F} }[/math] is the family of loss functions associated to [math]\displaystyle{ \mathcal{H} }[/math]. Let

      [math]\displaystyle{ Q_{z_1,\cdots,z_n} = \{ \left(f(z_1), \cdots , f(z_n) \right) \in \R^n : f \in \mathcal{F}\} }[/math]

      be all possible outputs from [math]\displaystyle{ \mathcal{F} }[/math] on the input sequence [math]\displaystyle{ z_1, z_2, \cdots , z_n }[/math], where [math]\displaystyle{ z_i = (x_i, y_i) }[/math].

      (...)

      Definition 2 (Growth Function). The growth function, a.k.a. the shattering coefficient, is

      [math]\displaystyle{ S(\mathcal{F}, n) = \underset{z_1,\cdots,z_n}{\max} |Q_{z_1,\cdots,z_n}| }[/math] with [math]\displaystyle{ Q }[/math] defined as above.
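
A small added sketch (not from the quoted notes): the snippet below enumerates [math]\displaystyle{ Q_{z_1,\cdots,z_n} }[/math] for a hypothetical finite family [math]\displaystyle{ \mathcal{F} }[/math] of 0-1 loss functions induced by threshold classifiers, and maximizes its size over a few candidate input sequences. Because only the supplied sequences are tried, the result is a lower bound on [math]\displaystyle{ S(\mathcal{F}, n) }[/math] rather than the exact maximum.

    from itertools import product

    # Assumed toy setup: hypotheses h_a(x) = 1[x >= a] for thresholds a,
    # and F the associated family of 0-1 losses z = (x, y) -> 1[h_a(x) != y].
    def loss_fn(a):
        return lambda z: int((z[0] >= a) != z[1])

    thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]
    F = [loss_fn(a) for a in thresholds]

    def Q(F, zs):
        """All distinct output vectors (f(z_1), ..., f(z_n)) for f in F."""
        return {tuple(f(z) for z in zs) for f in F}

    def growth_function_lower_bound(F, candidate_sequences):
        """max |Q_{z_1,...,z_n}| over the supplied candidate sequences only."""
        return max(len(Q(F, zs)) for zs in candidate_sequences)

    # n = 3 labeled points; try every labeling of three fixed x-values.
    xs = [0.5, 1.5, 2.5]
    candidates = [list(zip(xs, ys)) for ys in product([0, 1], repeat=3)]
    print(growth_function_lower_bound(F, candidates))  # 4 = n + 1 distinct loss patterns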

2014

  • (Scott et al., 2014) ⇒ Clayton Scott (Lecturer), Srinagesh Sharma, Scott Reed, and Petter Nilsson (2014). "Vapnik-Chervonenkis Theory". In: EECS 598: Statistical Learning Theory, Winter 2014.
    • QUOTE: Let [math]\displaystyle{ \mathcal{H} \subseteq \{0, 1\}^{\mathcal{X}} }[/math]. For [math]\displaystyle{ x_1, x_2, \cdots , x_n \in \mathcal{X} }[/math] denote

      [math]\displaystyle{ N_{\mathcal{H}}(x_1, \cdots , x_n) := |\{(h(x_1), \cdots , h(x_n)) : h \in \mathcal{H}\}| }[/math]

      Clearly [math]\displaystyle{ N_{\mathcal{H}}(x_1, \cdots, x_n) \leq 2^n }[/math]. The nth shatter coefficient is defined as

      [math]\displaystyle{ S_{\mathcal{H}}(n) := \displaystyle \underset{x_1,\cdots,x_n \in \mathcal{X}}{\max} N_{\mathcal{H}}(x_1, \cdots , x_n) }[/math]

      If [math]\displaystyle{ S_{\mathcal{H}}(n) = 2^n }[/math], then [math]\displaystyle{ \exists\; x_1, \cdots , x_n }[/math] such that

      [math]\displaystyle{ N_{\mathcal{H}}(x_1, \cdots , x_n) = 2^n }[/math]

      and we say that [math]\displaystyle{ \mathcal{H} }[/math] shatters [math]\displaystyle{ x_1, \cdots , x_n }[/math].

      Note. The shatter coefficient is sometimes called the growth function in the literature.
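
A standard worked example in this notation (added here, not part of the quoted lecture): let [math]\displaystyle{ \mathcal{H} }[/math] be the indicator functions of closed intervals on the real line. Any two distinct points [math]\displaystyle{ x_1 \lt x_2 }[/math] are shattered, since suitable intervals realize all four labelings, so [math]\displaystyle{ S_{\mathcal{H}}(2) = 4 = 2^2 }[/math]. For three points [math]\displaystyle{ x_1 \lt x_2 \lt x_3 }[/math] the labeling [math]\displaystyle{ (1, 0, 1) }[/math] cannot be realized, so [math]\displaystyle{ S_{\mathcal{H}}(3) = 7 \lt 2^3 }[/math] and no set of three or more points is shattered.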

1971

  • (Vapnik & Chervonenkis, 1971) ⇒ Vladimir N. Vapnik, and A. Ya. Chervonenkis (1971 - 2015). "On The Uniform Convergence Of Relative Frequencies Of Events To Their Probabilities". In: Theory Probab. Appl., 16(2) (1971). DOI:10.1137/1116025. Translated and Republished In: Measures of Complexity. Springer, Cham (2015). DOI:10.1007/978-3-319-21852-6_3.
    • QUOTE: Let [math]\displaystyle{ X_r= x_1,\cdots, x_r }[/math] be a finite sample of elements in [math]\displaystyle{ X }[/math]. Each set [math]\displaystyle{ A }[/math] in [math]\displaystyle{ S }[/math] determines in this sample a subsample [math]\displaystyle{ X_r^A = x_{i_1},\cdots, x_{i_k} }[/math] consisting of those terms of the sample [math]\displaystyle{ X_r }[/math] which belong to [math]\displaystyle{ A }[/math]. We shall say that the set [math]\displaystyle{ A }[/math] induces the subsample [math]\displaystyle{ X_r^A }[/math] in the sample [math]\displaystyle{ X_r }[/math]. We denote the set of all different subsamples induced by the sets of [math]\displaystyle{ S }[/math] in the sample [math]\displaystyle{ X_r }[/math] by [math]\displaystyle{ S(x_1,\cdots, x_r) }[/math] or [math]\displaystyle{ S(X_r) }[/math]. The number of different subsamples of the sample [math]\displaystyle{ X_r }[/math] induced by the sets in [math]\displaystyle{ S }[/math] will be termed the index of the system [math]\displaystyle{ S }[/math] with respect to the sample [math]\displaystyle{ x_1,\cdots, x_r }[/math] and will be denoted by [math]\displaystyle{ \bigtriangleup^S(x_1, \cdots, x_r) }[/math]. Obviously, [math]\displaystyle{ \bigtriangleup^S(x_1,\cdots, x_r) }[/math] is always at most [math]\displaystyle{ 2^r }[/math]. The function

      [math]\displaystyle{ m^S(r) = \max \bigtriangleup^S(x_1,\cdots, x_r) }[/math],

      where the maximum is taken over all samples of size [math]\displaystyle{ r }[/math], will be called the growth function.
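
As a short added example in the notation of this quote (not part of the original paper): if [math]\displaystyle{ S }[/math] is the system of all closed intervals on the real line, then for [math]\displaystyle{ r }[/math] distinct sample points the induced subsamples are exactly the empty subsample and the contiguous runs of the ordered points, giving [math]\displaystyle{ \bigtriangleup^S(x_1,\cdots, x_r) = \frac{r(r+1)}{2} + 1 }[/math] and hence [math]\displaystyle{ m^S(r) = \frac{r(r+1)}{2} + 1 }[/math], a polynomial rate of growth rather than [math]\displaystyle{ 2^r }[/math].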