Ronald A. Fisher introduced a notion of information into the theory of statistical inference in 1925.

For a modern development along these lines, see the following paper:

**Prediction and Information Theory,** by John Kieffer (Electrical Engineering, University of Minnesota), is a nice expository paper offering a good short introduction to the intimate connection between these two topics.

**Universal Prediction of Individual Sequences,** by Meir Feder, Neri Merhav and Michael Gutman, IEEE Trans. Information Theory 38 (1992): 1258--1270, won the 1994 Information Theory Society Prize. In a second, more informal paper, Reflections on "Universal Prediction of Individual Sequences", the three authors describe how they came to write the first paper. Briefly, their starting point was the observation that the intuition (due to Shannon himself) that the entropy of an information source measures how well its behavior (e.g. the next symbol in a sequence it produces) can be predicted, together with the existence of universal encoders (e.g. the Lempel-Ziv algorithm; see the talk by Wyner listed above), suggests that there should be an algorithm for predicting the next symbol in a sequence which is guaranteed to become as accurate as desired, for any information source, provided you are willing to wait long enough (for the algorithm to "train itself", if you will).
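To make the idea of such a self-training predictor concrete, here is a toy sketch in Python (not the Feder-Merhav-Gutman algorithm itself; the function name and the simple back-off scheme are my own illustrative choices). It predicts the next symbol by counting how often each symbol followed the current context earlier in the sequence, falling back to shorter contexts when the long one has never been seen:

```python
from collections import Counter

def predict_next(sequence, order=3):
    """Predict the next symbol of `sequence` by counting how often each
    symbol followed the current length-`order` context, backing off to
    shorter contexts (down to the bare symbol frequencies) when the
    longer context has never occurred before."""
    for k in range(min(order, len(sequence)), -1, -1):
        context = sequence[len(sequence) - k:]
        counts = Counter()
        for i in range(len(sequence) - k):
            if sequence[i:i + k] == context:
                counts[sequence[i + k]] += 1  # symbol that followed this context
        if counts:
            return counts.most_common(1)[0][0]
    return None  # empty sequence: nothing to go on

print(predict_next("abababab"))  # on a periodic source, prediction is easy
```

On highly regular sequences a counter like this becomes accurate almost immediately; the substance of the Feder-Merhav-Gutman paper is that suitably constructed universal predictors approach the best achievable prediction performance for *any* individual sequence, given enough data.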

In 1957, Edwin Jaynes introduced the fundamental *Principle of Maximal Entropy*.
A little later, this was subsumed by the more general *Principle of Minimal Divergence*.
Over the last several decades,
Jaynes and his followers
have attempted to develop a Bayesian theory of probability as a "degree of belief",
based upon the Principle of Maximal Entropy.
This program remains highly controversial among probabilists; the philosophical issues
involved are thorny and subtle.
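The principle itself is easy to state: among all distributions satisfying given constraints, choose the one of maximal entropy. Here is a minimal sketch, in Python, of Jaynes's classic die example: find the maximum-entropy distribution on the faces 1..6 with a prescribed mean. The maximizing distribution is exponential in the face value, $p_i \propto e^{\lambda i}$, and the sketch below finds the multiplier $\lambda$ by bisection (the solver details are my own illustrative choice):

```python
import math

def maxent_die(mean, faces=range(1, 7)):
    """Maximum-entropy distribution over die faces with prescribed mean.
    The maximizer has the form p_i proportional to exp(lam * i); we find
    the Lagrange multiplier lam by bisection on the constraint E[i] = mean."""
    def expected(lam):
        w = [math.exp(lam * f) for f in faces]
        z = sum(w)                      # the partition function
        return sum(f * wi for f, wi in zip(faces, w)) / z
    lo, hi = -10.0, 10.0                # expected() is increasing in lam
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected(mid) < mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(lam * f) for f in faces]
    z = sum(w)
    return [wi / z for wi in w]

# Mean 3.5 imposes no real constraint, so maximal entropy gives the
# uniform distribution; mean 4.5 tilts probability toward high faces.
print(maxent_die(3.5))
print(maxent_die(4.5))
```

Observing a mean of 3.5 recovers the uniform die, as it should; a mean of 4.5 yields a distribution exponentially weighted toward the high faces. The Principle of Minimal Divergence generalizes this: instead of maximizing entropy one minimizes the Kullback-Leibler divergence from a prior distribution, and maximal entropy is the special case of a uniform prior.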

Here are some places you can learn more about the Principle of Maximal Entropy and its many applications:

**Maximal Entropy and Bayesian Probability Theory,** a collection of expository sketches and tutorials by someone in the CEMS research group in the Chemical Science and Technology Division, Los Alamos National Laboratory. See especially the excellent tutorial on Inverse Problems and Surprisal Analysis in Physics.

**Probability Theory: The Logic of Science.** The substantially complete draft of Edwin Jaynes's enormous book on probability theory as a science of plausible inference. *If you have the slightest interest in probability, the philosophy of mathematics or information theory, you should take a look at this fascinating and provocative book!* See particularly Chapter 11 (Entropy Principle), Chapter 27 (Communication Theory) and Chapter 29 (Statistical Mechanics). Written for undergraduates, but some chapters are fairly demanding. (The book is downloadable chapter by chapter as postscript files, with illustrations included.)

Recently a theory of *statistical manifolds* has been developed,
in which entropy appears as a geometric quantity related to curvature.
Some idea of how this works can be gained from the following expository paper:

**From Euclid to Entropy,** by Carlos Rodriguez (Statistics, SUNY Albany). Did you know that statistical inference and discrimination are related to the cross-ratio studied in projective geometry? I didn't! In this paper, Rodriguez explains at the undergraduate level why it is reasonable to expect at least a "spiritual relation".
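One concrete piece of the statistical-manifold picture is the Fisher information metric, which equips a parametric family of distributions with a Riemannian geometry and arises as the second-order behavior of the Kullback-Leibler divergence. As a small illustration (my own, using the Bernoulli family, the simplest one-parameter example), the sketch below estimates the Fisher information numerically from the divergence and compares it with the known closed form $1/(p(1-p))$:

```python
import math

def fisher_information_bernoulli(p, eps=1e-5):
    """Numerically estimate the Fisher information of the Bernoulli family
    at parameter p as the second derivative, in q at q = p, of the
    Kullback-Leibler divergence D(p || q).  Analytically this equals
    1/(p(1-p)), the metric coefficient of the Bernoulli statistical manifold."""
    def kl(p, q):
        return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    # central second difference of q -> D(p || q) at q = p (note kl(p, p) == 0)
    return (kl(p, p + eps) - 2 * kl(p, p) + kl(p, p - eps)) / eps**2

print(fisher_information_bernoulli(0.5))  # closed form gives 1/(0.5 * 0.5) = 4
```

The divergence is locally quadratic in the parameter displacement, with the Fisher information as its Hessian; this is the sense in which entropy-like quantities induce the geometry of a statistical manifold.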

**Theory of statistical inference and information**, by Igor Vajda, Kluwer Academic Publishers, 1989.

**Informed assessments: an introduction to information, entropy, and statistics,**by Alan Jessop, Ellis Horwood, 1995.
