Yule–Simon distribution

This MedLibrary.org supplementary page on Yule–Simon distribution is provided directly from the open source Wikipedia as a service to our readers. Please see the note below on authorship of this content, as well as the Wikipedia usage guidelines.

Yule–Simon
Probability mass function: [Figure: Yule–Simon PMF on a log-log scale. (Note that the function is only defined at integer values of k. The connecting lines do not indicate continuity.)]
Cumulative distribution function: [Figure: Yule–Simon CDF. (Note that the function is only defined at integer values of k. The connecting lines do not indicate continuity.)]
Parameters: \rho > 0, shape (real)
Support: k \in \{1, 2, \dots\}
Probability mass function (pmf): \rho\,\mathrm{B}(k, \rho+1)
Cumulative distribution function (cdf): 1 - k\,\mathrm{B}(k, \rho+1)
Mean: \frac{\rho}{\rho-1} for \rho > 1
Mode: 1
Variance: \frac{\rho^2}{(\rho-1)^2\,(\rho-2)} for \rho > 2
Skewness: \frac{(\rho+1)^2\,\sqrt{\rho-2}}{(\rho-3)\,\rho} for \rho > 3
Excess kurtosis: \rho+3+\frac{11\rho^3-49\rho-22}{(\rho-4)\,(\rho-3)\,\rho} for \rho > 4
Moment-generating function (mgf): \frac{\rho}{\rho+1}\;{}_2F_1(1,1; \rho+2; e^t)\,e^t
Characteristic function: \frac{\rho}{\rho+1}\;{}_2F_1(1,1; \rho+2; e^{i t})\,e^{i t}
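The closed-form cdf and the mean and variance listed above can be checked numerically. The following is a minimal Python sketch, not taken from the source; the names log_beta, yule_simon_pmf, yule_simon_cdf and truncated_moments are hypothetical. It evaluates the beta function in log space with math.lgamma, compares a running sum of the pmf with 1 - k B(k, ρ+1), and approximates the moments by truncating the series at a finite k. The truncation is an assumption that is only accurate when ρ is comfortably larger than the order of the moment, because the pmf tail decays like k^{-(ρ+1)}.

```python
import math

def log_beta(a, b):
    """log B(a, b) = log Γ(a) + log Γ(b) - log Γ(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def yule_simon_pmf(k, rho):
    """f(k; rho) = rho * B(k, rho + 1) for integer k >= 1 and rho > 0."""
    return rho * math.exp(log_beta(k, rho + 1))

def yule_simon_cdf(k, rho):
    """Closed form P(K <= k) = 1 - k * B(k, rho + 1)."""
    return 1.0 - k * math.exp(log_beta(k, rho + 1))

def truncated_moments(rho, n_terms=100_000):
    """Approximate (mean, variance) by summing k*f and k^2*f over k = 1..n_terms."""
    m1 = m2 = 0.0
    for k in range(1, n_terms + 1):
        p = yule_simon_pmf(k, rho)
        m1 += k * p
        m2 += k * k * p
    return m1, m2 - m1 * m1

if __name__ == "__main__":
    rho = 5.0
    running = 0.0
    for k in range(1, 51):
        running += yule_simon_pmf(k, rho)
        # the running sum of the pmf should agree with the closed-form cdf
        assert math.isclose(running, yule_simon_cdf(k, rho), rel_tol=1e-12)
    mean, var = truncated_moments(rho)
    print(mean, rho / (rho - 1))                       # both close to 1.25
    print(var, rho**2 / ((rho - 1) ** 2 * (rho - 2)))  # both close to 25/48
```

With ρ = 5 the neglected tail beyond k = 100 000 is far below double-precision noise for both moments; for ρ close to 2 the variance sum would need many more terms.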

In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert Simon. Simon originally called it the Yule distribution.[1]

The probability mass function of the Yule–Simon (ρ) distribution is

f(k;\rho) = \rho\,\mathrm{B}(k, \rho+1),

for integer k \geq 1 and real ρ > 0, where B is the beta function. Equivalently, the pmf can be written in terms of the falling factorial as


f(k;\rho) = \frac{\rho\,\Gamma(\rho+1)}{(k+\rho)^{\underline{\rho+1}}},

where Γ is the gamma function. Thus, if ρ is an integer,


f(k;\rho) = \frac{\rho\,\rho!\,(k-1)!}{(k+\rho)!}.
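For integer ρ the three expressions for the pmf can be compared directly. The minimal Python sketch below uses hypothetical function names, not from the source. It evaluates the beta-function form in floating point and the falling-factorial and factorial forms exactly with fractions.Fraction. The falling-factorial helper only covers integer ρ, where (k+ρ)^{\underline{ρ+1}} is the finite product (k+ρ)(k+ρ-1)...k.

```python
import math
from fractions import Fraction

def pmf_beta(k, rho):
    """rho * B(k, rho + 1), with the beta function evaluated via math.lgamma."""
    log_b = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_b)

def pmf_falling_factorial(k, rho):
    """rho * Γ(rho + 1) / (k + rho) falling (rho + 1), exact; integer rho only."""
    # (k + rho)(k + rho - 1)...(k): a product of rho + 1 factors
    falling = math.prod(k + rho - j for j in range(rho + 1))
    return Fraction(rho * math.factorial(rho), falling)

def pmf_integer_rho(k, rho):
    """rho * rho! * (k - 1)! / (k + rho)!, exact; integer rho only."""
    return Fraction(rho * math.factorial(rho) * math.factorial(k - 1),
                    math.factorial(k + rho))

for rho in (1, 2, 3, 5):
    for k in range(1, 30):
        exact = pmf_integer_rho(k, rho)
        assert exact == pmf_falling_factorial(k, rho)
        assert math.isclose(float(exact), pmf_beta(k, rho), rel_tol=1e-12)

# For rho = 1 the pmf reduces to 1 / (k (k + 1)).
print([str(pmf_integer_rho(k, 1)) for k in range(1, 5)])  # ['1/2', '1/6', '1/12', '1/20']
```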

The probability mass function f has the property that for sufficiently large k we have


f(k;\rho) \approx \frac{\rho\,\Gamma(\rho+1)}{k^{\rho+1}} \propto \frac{1}{k^{\rho+1}}.
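How quickly the power-law approximation takes hold can be seen from the ratio of the exact pmf to ρΓ(ρ+1)/k^{ρ+1}. The factors ρΓ(ρ+1) cancel, leaving Γ(k) k^{ρ+1}/Γ(k+ρ+1), which increases toward 1 as k grows. The Python sketch below (the helper name tail_ratio is hypothetical) evaluates it in log space so that large k does not overflow.

```python
import math

def tail_ratio(k, rho):
    """f(k; rho) divided by the approximation rho * Γ(rho + 1) / k**(rho + 1).

    The rho * Γ(rho + 1) factors cancel, leaving Γ(k) * k**(rho + 1) / Γ(k + rho + 1),
    which is computed in log space to avoid overflow for large k.
    """
    log_r = math.lgamma(k) + (rho + 1) * math.log(k) - math.lgamma(k + rho + 1)
    return math.exp(log_r)

for k in (10, 100, 1_000, 10_000, 100_000):
    print(k, tail_ratio(k, 2.0))
```

For ρ = 2 the ratio at k = 10 is 10^3/(10·11·12) ≈ 0.758, and for large k the gap to 1 shrinks roughly like ρ(ρ+1)/(2k), so the approximation is poor for small k when ρ is large.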

This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: f(k;ρ) can be used to model, for example, the relative frequency of the kth most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of k.


Occurrence

The Yule–Simon distribution arose originally as the limiting distribution of a particular stochastic process studied by Yule as a model for the distribution of biological taxa and subtaxa.[2] Simon dubbed this process the "Yule process", but it is more commonly known today as a preferential attachment process. The preferential attachment process is an urn process in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number of balls the urn already contains.
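The urn process can be simulated directly. The Python sketch below assumes Simon's formulation of the process: each new ball opens a new urn with a fixed probability alpha and otherwise joins an existing urn with probability proportional to its current size. Under that assumption the fraction of urns holding k balls approaches the Yule–Simon(ρ) pmf with ρ = 1/(1 - alpha). The parameter name alpha and the function names are illustrative, not from the source, and this alpha is unrelated to the α of the generalized distribution.

```python
import math
import random
from collections import Counter

def simon_urn_sizes(n_balls, alpha, seed=None):
    """Simulate Simon's urn process and return the final urn sizes.

    Each ball after the first opens a new urn with probability alpha; otherwise it
    joins an existing urn chosen with probability proportional to that urn's size.
    """
    rng = random.Random(seed)
    sizes = [1]   # the first ball opens urn 0
    owner = [0]   # owner[j] is the index of the urn holding ball j
    for _ in range(n_balls - 1):
        if rng.random() < alpha:
            sizes.append(1)
            owner.append(len(sizes) - 1)
        else:
            # copying the urn of a uniformly chosen ball picks urn i with
            # probability sizes[i] / (number of balls placed so far)
            urn = owner[rng.randrange(len(owner))]
            sizes[urn] += 1
            owner.append(urn)
    return sizes

def yule_simon_pmf(k, rho):
    # rho * B(k, rho + 1) via log-gamma
    return rho * math.exp(math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1))

alpha = 0.5
sizes = simon_urn_sizes(200_000, alpha, seed=1)
counts = Counter(sizes)
rho = 1.0 / (1.0 - alpha)   # rho = 2 for alpha = 0.5
for k in range(1, 6):
    print(k, counts[k] / len(sizes), yule_simon_pmf(k, rho))
```

Storing the owner of every ball makes the size-proportional choice a single uniform draw, at the cost of memory linear in the number of balls.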

The distribution also arises as a continuous mixture of geometric distributions. Specifically, assume that W follows an exponential distribution with scale 1/ρ (equivalently, rate ρ):

W \sim \mathrm{Exponential}(\rho)
h(w;\rho) = \rho\,\exp(-\rho\,w)

Then, conditional on W, the variable K follows a geometric distribution with success probability exp(-W), and marginally K is Yule–Simon(ρ) distributed:

K \mid W \sim \mathrm{Geometric}(\exp(-W))

The pmf of a geometric distribution is

g(k; p) = p\,(1-p)^{k-1}

for k\in\{1,2,\dots\}. The Yule–Simon pmf is then the following exponential-geometric mixture distribution:

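f(k;\rho) = \int_0^{\infty} g(k;\exp(-w))\,h(w;\rho)\,dw = \rho \int_0^1 u^{\rho}\,(1-u)^{k-1}\,du = \rho\,\mathrm{B}(k, \rho+1),

where the middle step substitutes u = \exp(-w), which turns the mixture integral into a beta integral.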
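The mixture representation also gives a simple sampler: draw W from the exponential distribution, then draw K from the geometric distribution with success probability exp(-W). The Python sketch below is an illustration under these assumptions, and the function name is hypothetical. It relies on random.expovariate, whose argument is the rate ρ, and samples the geometric variable by inversion, using P(K > k) = (1 - p)^k.

```python
import math
import random
from collections import Counter

def sample_yule_simon(rho, rng):
    """One draw of K via W ~ Exponential(rate rho), K | W ~ Geometric(exp(-W))."""
    # Cap W so that p = exp(-W) never underflows to 0.0; P(W > 700) = exp(-700 * rho).
    w = min(rng.expovariate(rho), 700.0)
    if w == 0.0:
        return 1                          # p = 1: success on the first trial
    p = math.exp(-w)
    # log(1 - p), computed accurately both for p near 0 and for p near 1
    log_q = math.log1p(-p) if p < 0.5 else math.log(-math.expm1(-w))
    u = 1.0 - rng.random()                # uniform on (0, 1], so log(u) is finite
    # inversion: K - 1 = floor(log(u) / log(1 - p)) gives P(K > k) = (1 - p)**k
    return 1 + math.floor(math.log(u) / log_q)

rng = random.Random(2024)
rho, n = 2.5, 100_000
draws = Counter(sample_yule_simon(rho, rng) for _ in range(n))
for k in (1, 2, 3):
    exact = rho * math.exp(math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1))
    print(k, draws[k] / n, exact)
```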

Generalizations

The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon(ρ, α) distribution is defined as


f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}}\;\mathrm{B}_{1-\alpha}(k, \rho+1),

with 0 \leq \alpha < 1. For α = 0 the ordinary Yule–Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
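The generalized pmf needs the lower incomplete beta function B_x(a, b), the integral of t^{a-1}(1-t)^{b-1} from 0 to x. The Python sketch below is only an illustration, and its helper names are hypothetical. It computes B_x by integrating the binomial series of (1-t)^{b-1} term by term, which converges geometrically in x = 1 - α. That choice is an assumption made to stay within the standard library. The series becomes slow as α approaches 0 and can lose precision for large ρ; a library routine such as SciPy's scipy.special.betainc (which returns the regularized function, so it must be multiplied by scipy.special.beta) would be a safer choice there.

```python
import math

def incomplete_beta(x, a, b, tol=1e-17, max_terms=100_000):
    """B_x(a, b) for 0 <= x < 1, by term-wise integration of
    (1 - t)**(b - 1) = sum_n (-1)**n * C(b - 1, n) * t**n."""
    total = 0.0
    coeff = 1.0                        # (-1)**n * C(b - 1, n) for n = 0
    for n in range(max_terms):
        term = coeff * x ** (a + n) / (a + n)
        total += term
        # once n exceeds b the terms keep one sign and shrink, so stop when negligible
        if n > b and abs(term) <= tol * abs(total):
            break
        coeff *= (n - (b - 1)) / (n + 1)
    return total

def generalized_yule_simon_pmf(k, rho, alpha):
    """f(k; rho, alpha) = rho / (1 - alpha**rho) * B_{1 - alpha}(k, rho + 1)."""
    if alpha == 0.0:
        # special case: the ordinary Yule–Simon(rho) pmf with the complete beta function
        return rho * math.exp(math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1))
    return rho / (1.0 - alpha ** rho) * incomplete_beta(1.0 - alpha, k, rho + 1)

rho, alpha = 1.5, 0.3
total = sum(generalized_yule_simon_pmf(k, rho, alpha) for k in range(1, 401))
print(total)   # close to 1; the omitted tail beyond k = 400 is of order 0.7**400
for k in (1, 10, 50):
    # the ratio to the ordinary pmf eventually decays like (1 - alpha)**k times
    # a power of k: the exponential cutoff in the upper tail
    print(k, generalized_yule_simon_pmf(k, rho, alpha) / generalized_yule_simon_pmf(k, rho, 0.0))
```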



Bibliography

  • Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)

References

  1. Simon, H. A. (1955). "On a class of skew distribution functions". Biometrika 42: 425–440.
  2. Yule, G. U. (1925). "A Mathematical Theory of Evolution, based on the Conclusions of Dr. J. C. Willis, F.R.S.". Philosophical Transactions of the Royal Society of London, Ser. B 213: 21–87.

Wikipedia content modification information:

  • This page was last modified on 13 August 2008, at 22:34.

Wikipedia Authorship and Review

Wikipedia content provided here is not reviewed directly by MedLibrary.org. Wikipedia content is authored by an open community of volunteers and is not produced by or in any way affiliated with MedLibrary.org.

Wikipedia Usage Guidelines

This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article on "Yule–Simon distribution".

All Wikipedia text is available under the terms of the GNU Free Documentation License. (See Copyrights for details). Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc.