Original by Chris Hillman
(Last modified by Chris Hillman 2 Feb 2001.)
Ronald A. Fisher
introduced a notion of information into the theory of
statistical inference in 1925. This Fisher information
is now understood to be closely related to Shannon's notion of entropy.
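As a concrete illustration (an example assumed here, not from the original page), the Fisher information of a Bernoulli(p) model works out analytically to 1/(p(1-p)); a minimal numerical check of that formula might look like this:

```python
import math

def bernoulli_log_likelihood(p, x):
    """Log-likelihood of one Bernoulli observation x in {0, 1}."""
    return x * math.log(p) + (1 - x) * math.log(1 - p)

def fisher_information(p, h=1e-5):
    """Fisher information I(p) = E[(d/dp log f(x; p))^2], computed with a
    central-difference approximation of the score function.
    For the Bernoulli model the exact answer is 1 / (p * (1 - p))."""
    total = 0.0
    for x, prob in ((0, 1 - p), (1, p)):
        score = (bernoulli_log_likelihood(p + h, x)
                 - bernoulli_log_likelihood(p - h, x)) / (2 * h)
        total += prob * score * score
    return total

p = 0.3
print(fisher_information(p))   # ~ 4.7619
print(1 / (p * (1 - p)))       # exact value for comparison
```

The information is largest near p = 0 and p = 1, where a single observation is most informative about the parameter.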
In 1951, Solomon Kullback introduced the divergence between two probability distributions
(this quantity is also called the cross entropy, discrimination, Kullback-Leibler divergence, etc.)
and found another connection between Shannon's entropy and the theory of statistical inference.
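Kullback's divergence D(P||Q) has a short definition for finite distributions; here is a minimal sketch (an illustration assumed here, following the standard definition, with the usual conventions for zero probabilities):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in bits, for two discrete
    distributions given as lists of probabilities over the same alphabet.

    Conventions: terms with p[i] == 0 contribute nothing; D is infinite
    if p[i] > 0 somewhere q[i] == 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total

# D(P || Q) is zero exactly when the two distributions coincide,
# and strictly positive otherwise.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))   # 0.0
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
```

Note that the divergence is not symmetric in its two arguments, which is why it is a "divergence" rather than a distance.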
For a modern development along these lines, see the following papers:
-
Prediction and Information Theory,
by John Kieffer (Electrical Engineering, University of Minnesota).
A nice expository paper offering a good short introduction to the
intimate connection between these two topics.
-
`Universal Prediction of Individual Sequences',
by Meir Feder, Neri Merhav and Michael Gutman,
IEEE Trans. Information Theory 38 (1992): 1258--1270,
won the 1994 Information Theory Society Prize.
In a second, more informal paper,
`Reflections on "Universal Prediction of Individual Sequences"',
the three authors describe how they came to write the first paper.
Briefly, their starting point was Shannon's own intuition that the entropy of an information source
measures how well its behavior (e.g. the next symbol in a sequence it produces) can be predicted.
Combined with the existence of universal encoders (e.g. the Lempel-Ziv algorithm; see the talk
by Wyner listed above), this suggests that there should be an algorithm for predicting the next
symbol in a sequence which is guaranteed to become as accurate as desired, for any information
source, provided you are willing to wait long enough (for the algorithm to "train itself", if you will).
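As a toy illustration of this "train itself" idea (emphatically not the authors' algorithm, which handles individual sequences and arbitrary sources), even a naive order-0 frequency predictor learns to match the bias of a memoryless source; the function name and setup below are assumptions for the sketch:

```python
from collections import Counter

def predict_sequence(seq, alphabet):
    """Order-0 frequency predictor: before seeing each symbol, guess the
    symbol that has occurred most often so far (ties broken by alphabet
    order). Returns the fraction of correct guesses."""
    counts = Counter({a: 0 for a in alphabet})
    correct = 0
    for symbol in seq:
        guess = max(counts, key=counts.get)
        if guess == symbol:
            correct += 1
        counts[symbol] += 1
    return correct / len(seq)

# On a heavily biased source, the hit rate approaches the source's bias:
# here 75% of the symbols are 'a', and the predictor learns to guess 'a'.
seq = "aaab" * 250
print(predict_sequence(seq, "ab"))
```

Universal predictors in the sense of Feder, Merhav and Gutman do much better: they track arbitrary finite-state predictors, not just the best constant guess.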
In 1957, Edwin Jaynes introduced the fundamental Principle of Maximal Entropy.
A little later, this was subsumed by the more general Principle of Minimal Divergence.
Over the last several decades,
Jaynes and his followers
have attempted to develop a Bayesian theory of probability as a "degree of belief",
based upon the Principle of Maximal Entropy.
This program remains highly controversial among probabilists; the philosophical issues
involved are thorny and subtle.
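A classic concrete instance of the principle is Jaynes's dice problem: among all distributions on the faces 1..6 with a prescribed mean, choose the one of largest entropy. The solution has the Gibbs form p_i proportional to exp(-lam*i); the little solver below (an assumed illustration, finding lam by bisection, not code from any of the cited sources) makes this computable:

```python
import math

def maxent_dice(mean, faces=6, tol=1e-12):
    """Maximum-entropy distribution on faces 1..n with a prescribed mean.
    The maximizer has the Gibbs form p_i ~ exp(-lam * i); since the mean
    is a decreasing function of lam, we can find lam by bisection."""
    def mean_for(lam):
        w = [math.exp(-lam * i) for i in range(1, faces + 1)]
        z = sum(w)
        return sum(i * wi for i, wi in zip(range(1, faces + 1), w)) / z
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_for(mid) > mean:   # need a larger lam to pull the mean down
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(-lam * i) for i in range(1, faces + 1)]
    z = sum(w)
    return [wi / z for wi in w]

# Mean 3.5 recovers the uniform distribution (lam = 0);
# mean 4.5 tilts the weights toward the high faces.
print(maxent_dice(3.5))
print(maxent_dice(4.5))
```

The point of the principle is that this tilted distribution is the least committal one consistent with the mean constraint: any other choice would encode information you do not actually have.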
Here are some places you can learn more about the Principle of Maximal Entropy and
its many applications:
-
Maximal Entropy and Bayesian Probability Theory,
a collection of expository sketches and tutorials by someone in the CEMS research group
in the Chemical Science and Technology Division, Los Alamos National Laboratory.
See especially the excellent tutorial on
Inverse Problems and Surprisal Analysis in Physics.
-
Probability Theory: The Logic of Science.
The substantially complete draft of Edwin Jaynes's enormous book on probability theory
as a science of plausible inference.
If you have the slightest interest in probability,
the philosophy of mathematics or information theory,
you should take a look at this fascinating and provocative book!
See particularly Chapter 11 (Entropy Principle), Chapter 27
(Communication Theory) and Chapter 29 (Statistical Mechanics).
Written for undergraduates, but some chapters are fairly demanding.
(The book is downloadable chapter by chapter as postscript files,
with illustrations included.)
Recently a theory of statistical manifolds has been developed,
in which entropy appears as a geometric quantity related to curvature.
Some idea of how this works can be gained from the following expository paper:
-
From Euclid to Entropy
by Carlos Rodriguez
(Statistics, SUNY Albany).
Did you know that statistical inference and discrimination are related to
the cross-ratio studied in projective geometry?
I didn't!
In this paper, Rodriguez explains at the undergraduate level why
it is reasonable to expect at least a "spiritual relation".
Meanwhile, a whole literature on the important problem of estimating entropies from noisy data
has arisen. The papers of
David Wolf (Physics, University of Texas at Austin)
discuss Bayesian estimators of various entropies.
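For contrast with such Bayesian estimators, the baseline they improve on is the naive plug-in estimate, which is biased low for small samples; the standard first-order Miller-Madow correction partly compensates. A minimal sketch (an illustration assumed here, not Wolf's method):

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Naive plug-in entropy estimate (in bits) from observed frequencies.
    Known to underestimate the true entropy for small samples."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def miller_madow(samples):
    """Plug-in estimate plus the first-order Miller-Madow bias correction,
    (k - 1) / (2n) nats, converted to bits; k = number of observed symbols."""
    k = len(set(samples))
    n = len(samples)
    return plugin_entropy(samples) + (k - 1) / (2 * n * math.log(2))

data = ["a", "a", "b", "b", "c", "c", "d", "d"]
print(plugin_entropy(data))   # 2.0 bits for four equiprobable symbols
print(miller_madow(data))
```

Bayesian estimators go further by placing a prior over the unknown distribution and averaging the entropy over the posterior, which matters most when the sample is small relative to the alphabet.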
Further Reading
- Theory of statistical inference and information,
by Igor Vajda, Kluwer Academic Publishers, 1989.
- Informed assessments: an introduction to information, entropy, and statistics,
by Alan Jessop, Ellis Horwood, 1995.