Entropy in Statistical Inference and Prediction

Original by Chris Hillman (Last modified by Chris Hillman 2 Feb 2001.)

Ronald A. Fisher introduced a notion of information into the theory of statistical inference in 1925. This Fisher information is now understood to be closely related to Shannon's notion of entropy. In 1951, Solomon Kullback and Richard Leibler introduced the divergence between two probability distributions (this quantity is also called the relative entropy, cross entropy, discrimination, Kullback-Leibler entropy, etc.) and found another connection between Shannon's entropy and the theory of statistical inference.
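To make the two connections in the paragraph above concrete, here is a minimal Python sketch for discrete distributions, using natural logarithms so results are in nats. It checks two standard facts. First, the divergence of a distribution p from the uniform distribution u on n outcomes equals log n - H(p), so minimizing divergence from u is the same as maximizing entropy. Second, for a one-parameter family the divergence between two nearby members is approximately (1/2) I(theta) h^2, where I(theta) is the Fisher information. The Bernoulli family, the sample distribution, and the helper names are illustrative choices, not taken from the original text.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions.

    Terms with p_i == 0 contribute 0 (the convention 0 log 0 = 0).
    Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def shannon_entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def bernoulli_kl(a, b):
    """D(Bernoulli(a) || Bernoulli(b))."""
    return kl_divergence([a, 1 - a], [b, 1 - b])

def bernoulli_fisher_information(theta):
    """Fisher information of one Bernoulli(theta) observation: 1 / (theta (1 - theta))."""
    return 1.0 / (theta * (1.0 - theta))

# Fact 1: divergence from the uniform distribution on n outcomes is log n - H(p).
p = [0.5, 0.25, 0.125, 0.125]
u = [0.25] * 4
print(kl_divergence(p, u), math.log(4) - shannon_entropy(p))

# Fact 2 (local): D(p_theta || p_{theta+h}) is approximately (1/2) I(theta) h^2 for small h.
theta, h = 0.3, 1e-3
print(bernoulli_kl(theta, theta + h), 0.5 * bernoulli_fisher_information(theta) * h**2)
```

Each printed line shows two numbers that agree (exactly, up to rounding, for the first line; to leading order in h for the second). The second fact is the sense in which Fisher information measures how quickly the divergence grows as the parameter moves.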

For a modern development along these lines, see the following paper:

In 1957, Edwin Jaynes introduced the fundamental Principle of Maximal Entropy. A little later, this was subsumed by the more general Principle of Minimal Divergence. Over the last several decades, Jaynes and his followers have attempted to develop a Bayesian theory of probability as a "degree of belief", based upon the Principle of Maximal Entropy. This program remains highly controversial among probabilists; the philosophical issues involved are thorny and subtle.
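As an illustration of how the two principles work in practice, here is a minimal Python sketch of the familiar "loaded die" problem: find the distribution on the faces 1 to 6 whose mean is 4.5 and which is otherwise as noncommittal as possible. Minimizing the divergence D(p || q) from a prior q subject to a mean constraint gives a distribution of the exponential (Gibbs) form p_i proportional to q_i exp(lambda x_i); with a uniform prior this is exactly the maximum-entropy solution. The function names, the bisection bracket for lambda, and the target mean are assumptions made for this example.

```python
import math

def tilted(prior, values, lam):
    """Return the distribution p_i proportional to prior_i * exp(lam * values_i)."""
    logs = [math.log(q) + lam * x for q, x in zip(prior, values)]
    top = max(logs)  # subtract the max before exponentiating to avoid overflow
    weights = [math.exp(l - top) for l in logs]
    z = sum(weights)
    return [w / z for w in weights]

def min_divergence(prior, values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Minimize D(p || prior) subject to sum_i p_i * values_i == target_mean.

    The minimizer has the form tilted(prior, values, lam). Its mean increases
    monotonically in lam (the derivative is a variance), so lam is found by
    bisection. Assumes target_mean lies strictly between min(values) and
    max(values) and that the bracket [lo, hi] is wide enough.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        p = tilted(prior, values, mid)
        mean = sum(pi * x for pi, x in zip(p, values))
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    return tilted(prior, values, 0.5 * (lo + hi))

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

faces = [1, 2, 3, 4, 5, 6]
uniform = [1 / 6] * 6

# Maximum entropy is minimal divergence from the uniform prior.
p_maxent = min_divergence(uniform, faces, 4.5)
print([round(x, 4) for x in p_maxent], entropy(p_maxent))

# Minimal divergence from a hypothetical non-uniform prior, same kind of constraint.
q = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
p_mindiv = min_divergence(q, faces, 3.0)
print([round(x, 4) for x in p_mindiv])
```

The probabilities rise geometrically from face 1 to face 6, which is the signature of the exponential form. Swapping in a different prior q changes the answer only through the factor q_i, which is one way to see why the minimal-divergence principle generalizes the maximum-entropy one.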

Here are some places you can learn more about the Principle of Maximal Entropy and its many applications:

Recently, a theory of statistical manifolds (part of what is now called information geometry) has been developed, in which entropy appears as a geometric quantity related to curvature. Some idea of how this works can be gained from the following expository paper:

Meanwhile, a whole literature on the important problem of estimating entropies from noisy data has arisen. The papers of David Wolf (Physics, University of Texas at Austin) discuss Bayesian estimators of various entropies.
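For readers who want to see what a Bayesian entropy estimator looks like, here is a minimal Python sketch of one standard choice: the posterior mean of the Shannon entropy when the unknown bin probabilities are given a symmetric Dirichlet(a) prior. The posterior is Dirichlet(n_i + a), and for a Dirichlet(alpha) distribution the expected entropy has the closed form psi(A + 1) - sum_i (alpha_i / A) psi(alpha_i + 1), where A = sum_i alpha_i and psi is the digamma function. This is an illustration under those assumptions and is not claimed to be the particular estimator developed in Wolf's papers. The prior parameter `a`, the helper names, and the sample counts are hypothetical, and psi is approximated by hand because Python's standard library has no digamma function.

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0, via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic expansion: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def plugin_entropy(counts):
    """Naive plug-in estimate: the entropy of the empirical frequencies, in nats."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def dirichlet_posterior_mean_entropy(counts, a=1.0):
    """Posterior mean of Shannon entropy (nats) under a symmetric Dirichlet(a) prior."""
    alphas = [c + a for c in counts]  # posterior Dirichlet parameters n_i + a
    total = sum(alphas)
    return digamma(total + 1) - sum(al / total * digamma(al + 1) for al in alphas)

# Hypothetical small sample: 10 observations spread over 8 bins.
counts = [3, 2, 2, 1, 1, 1, 0, 0]
print(plugin_entropy(counts), dirichlet_posterior_mean_entropy(counts, a=1.0))
```

The plug-in estimate is biased downward for small samples, since unseen bins contribute nothing. For these counts the Dirichlet posterior mean comes out larger, and the two estimates converge as the counts grow; with scarce data the result depends noticeably on the choice of `a`.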

Further Reading
