Original by Chris Hillman
(Last modified by Chris Hillman 2 Feb 2001.)
In 1948, motivated by the problem of efficiently transmitting information over a noisy
channel, Claude Shannon introduced a revolutionary new probabilistic way of thinking about communication
and simultaneously created the first truly mathematical theory of entropy.
His ideas created a sensation and were rapidly developed along two main lines:
information theory, which employs probability and ergodic theory
to study the statistical characteristics of data and communication systems,
and coding theory, which uses mainly algebraic and geometric tools to construct efficient
codes for various situations.
While the methods of the two fields are different, they are spiritually so closely related
that it would be misleading to give them two separate pages on this website.
Thanks to Lucent Technologies (formerly Bell
Labs, where Shannon worked for many years), the full text of Shannon's
classic 1948 paper,
A Mathematical Theory of Communication,
is now available electronically.
This paper is very readable, and Parts I and II are still in many ways the
best introduction to the modern concept of entropy.
Here are some truly fabulous expository papers which I think give an excellent
overview of this whole area:
Typical Sequences and All That: Entropy, Pattern Matching, and Data Compression,
the 1994 Shannon Lecture by the late Aaron D. Wyner
(formerly of the Mathematics of Communication Department, Bell Labs).
Focuses on the contrast between
the traditional interpretation of the entropy of a stationary ergodic source in terms
of the number of typical sequences of a given length and the interpretation in terms
of the recurrence of blocks of symbols ("patterns") in a single typical sequence.
The second interpretation leads directly to practical methods for three problems:
- estimating the entropy of an unknown information source from a finite amount of data,
- compressing a finite sequence produced by an unknown information source,
- deciding whether a given finite sequence could plausibly have been produced by a given source.
Advanced undergraduate to beginning graduate level.
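Wyner's pattern-matching viewpoint can be made concrete with a toy experiment: in an incremental (LZ78-style) parsing of a sequence of length n into c distinct phrases, the quantity c*log2(c)/n estimates the entropy rate of a stationary ergodic source. Here is a minimal sketch in Python (the function names are my own, not from the lecture):

```python
import math

def lz78_phrase_count(s):
    """Count distinct phrases in an incremental (LZ78-style) parsing of s."""
    phrases = set()
    w = ""
    for ch in s:
        w += ch
        if w not in phrases:
            phrases.add(w)   # new phrase ends here; start the next one
            w = ""
    if w:
        phrases.add(w)       # final (possibly repeated) partial phrase
    return len(phrases)

def entropy_estimate(s):
    """Estimate the entropy rate in bits/symbol as c*log2(c)/len(s)."""
    c = lz78_phrase_count(s)
    return c * math.log2(c) / len(s)
```

On a highly repetitive string such as "ab" * 500 the estimate is small, while on a pseudorandom binary string of the same length it approaches 1 bit/symbol, in line with Wyner's discussion.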
A Short Course in Information Theory.
David J. C. MacKay
(Cavendish Laboratory, Cambridge).
An excellent overview of the most fundamental ideas in classical
information theory, at the advanced undergraduate or beginning graduate level.
Telegraphic in style, but quite readable and well illustrated.
His treatment of Shannon's Noiseless and Noisy Coding Theorems
is particularly interesting.
Shannon Theory: Present and Future.
A fascinating panel discussion sponsored by the
Information Theory Society of the IEEE, featuring:
- Richard Blahut (University of Illinois),
- Imre Csiszár (Hungarian Academy of Sciences),
- David Forney (Motorola),
- Prakash Narayan (University of Maryland),
- Mark Pinsker (Russian Academy of Sciences),
- Sergio Verdú (Princeton University).
If you've forgotten what a logarithm is, you might want to start instead with the
Primer on Information Theory
by Thomas D. Schneider
(Laboratory of Molecular Biology, NIH),
which gives a very gentle introduction to Shannon's discrete entropy.
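For orientation: Shannon's discrete entropy of a distribution p_1, ..., p_n is H = -sum p_i log2 p_i, measured in bits. A minimal sketch in Python of the quantity Schneider's primer introduces:

```python
import math

def shannon_entropy(probs):
    """Discrete Shannon entropy H(p) = -sum p_i log2 p_i, in bits (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A fair coin gives 1 bit of entropy, a certain outcome gives 0, and a uniform distribution on four outcomes gives 2 bits.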
Here are some expository papers giving a nice overview of coding theory:
For the serious student of coding theory, here are some longer expository works,
including some book length textbooks:
Performance and Complexity,
by David Forney (Motorola).
The 1995 Shannon Lecture. A good overview of the complex history of coding theory,
including references to the most recent techniques.
LZW Data Compression,
by Mark Nelson (Greenleaf Software).
An expanded version of an expository article which originally appeared in
Dr. Dobb's Journal.
Nelson has several other articles on coding theory available on his website.
His articles, unlike most of the others cited here, are definitely firmly
grounded in the world of "real code for real programmers"!
Achieving the Shannon Limit: a Progress Report
John C. Kieffer
(Information Theory Research Group, Electrical Engineering,
University of Minnesota)
is a sketch of the culmination of fifty years of research, which
has finally resulted in a spectacular breakthrough: the
development of new coding techniques (turbo codes) that
approach the theoretical performance limits first derived by Shannon.
The Algebraic Theory of Convolutional Codes,
by Robert J. McEliece
(Electrical Engineering, Caltech).
One chapter in the Handbook of Coding Theory.
Theory of Codes.
Lecture notes from a course taught by Dominique Perrin
(Université Marne-la-Vallée and Institut Gaspard Monge).
Available chapter by chapter and as a single 4 MB PostScript file.
Information Theory and Coding 314.
Lecture notes from a course given at the
Centre for Intelligent Information Processing Systems
(Dept. of Electrical & Electronic Engineering,
The University of Western Australia).
Includes brief overviews of Markov sources, communication channels, mutual information,
and using the method of Lagrange multipliers to compute the channel capacity, among other things.
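The Lagrange-multiplier computation of channel capacity also has a standard iterative counterpart, the Blahut-Arimoto algorithm, which maximizes the mutual information over input distributions numerically. A sketch of that algorithm in Python (my own illustration, not taken from the course notes):

```python
import math

def channel_capacity(W, iters=500):
    """Blahut-Arimoto: capacity in bits of a discrete memoryless channel.

    W[x][y] is the transition probability P(y|x).
    """
    m, n = len(W), len(W[0])
    p = [1.0 / m] * m                                   # input distribution, start uniform
    for _ in range(iters):
        q = [sum(p[x] * W[x][y] for x in range(m)) for y in range(n)]   # induced output dist
        # multiplicative update: p(x) proportional to p(x) * exp(D(W(.|x) || q))
        r = [p[x] * math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                                 for y in range(n) if W[x][y] > 0))
             for x in range(m)]
        Z = sum(r)
        p = [v / Z for v in r]
    q = [sum(p[x] * W[x][y] for x in range(m)) for y in range(n)]
    # mutual information I(p; W) at the final input distribution, in bits
    return sum(p[x] * W[x][y] * math.log2(W[x][y] / q[y])
               for x in range(m) for y in range(n) if W[x][y] > 0)
```

For a binary symmetric channel with crossover probability 0.1 this agrees with the closed-form answer 1 - H2(0.1), about 0.531 bits, which is exactly what the Lagrange-multiplier calculation yields.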
Lectures on Source Coding,
course notes by
John C. Kieffer
(Information Theory Research Group, Electrical Engineering, University of Minnesota).
Covers prefix codes, Huffman codes, codes with memory, Lempel-Ziv codes,
arithmetic codes, run length codes, scalar quantizer design, vector quantizer design,
rate distortion theory, differential coding, trellis codes, and transform coders.
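To give a flavor of the first topics on that list, here is a minimal Huffman-code construction in Python (an illustrative sketch; the function name and tie-breaking scheme are my own):

```python
import heapq
from itertools import count

def huffman(freqs):
    """Build a Huffman prefix code: dict mapping symbol -> bitstring."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two least-probable subtrees...
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}   # ...are merged, prefixing
        merged.update({s: "1" + b for s, b in c2.items()})  # 0/1 to their codewords
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]
```

For the dyadic distribution {a: 1/2, b: 1/4, c: 1/4} this yields codeword lengths 1, 2, 2, so the expected length of 1.5 bits/symbol equals the source entropy.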
Approximately 200 books on information and coding theory have been published since Shannon's 1948 paper appeared.
See the list of textbooks
in this area maintained by
Werner Heise (Minister of Mathematics, Free Republic of Laputa,
a little known breakaway region of Germany--- travel there at your own risk!).
There is also a list of suggested reading in this field
maintained at Purdue (Statistics, Electrical and Computer Engineering).
Some of the most recent textbooks (plus one classic) include the following:
- Elements of Information Theory,
by Thomas M. Cover and Joy A. Thomas, Wiley, 1991.
A readable and very enjoyable survey at the undergraduate level.
Covers the equipartition theorem, Shannon's coding theorems, maximal entropy,
discrimination and the theory of types (see
Entropy in Statistical Inference and Prediction), and much more.
Highly recommended as a first textbook!
- Communication theory,
by Charles M. Goldie and Richard G. E. Pinch,
Cambridge University Press, 1991.
London Mathematical Society Student Texts Vol. 20.
A good undergraduate level survey, more concise but less comprehensive than Cover & Thomas.
- Principles and practice of information theory,
by Richard E. Blahut, Addison-Wesley, 1987.
Undergraduate level. Covers the connection between Shannon's entropy and decision theory.
- Entropy optimization principles with applications,
by J. N. Kapur and H. K. Kesavan, Academic Press, 1992.
A very readable introduction to the Principle of Maximal Entropy and its many uses,
with a decidedly practical orientation.
- Maximum entropy solutions to scientific problems,
by Robert M. Bevensee, Prentice Hall, 1993.
Another recent textbook.
- Information theory with applications,
by Silviu Guiasu, McGraw-Hill, 1977.
Still the best source I know for applications of entropy to classification theory.
Many of the later chapters can be read independently of the earlier chapters (which
are seriously outdated and written in an unattractive notation).
- The Data Compression Book,
by Mark Nelson and Jean-loup Gailly, M & T Books, 1995.
Covers Shannon-Fano and Huffman coding, Adaptive Huffman coding,
Arithmetic coding, the JPEG compression algorithm, etc.
Includes data compression code.
- Handbook of Coding Theory,
edited by R. Brualdi, W. C. Huffman, and V. Pless.
Amsterdam: Elsevier Science Publishers, 1996.
- Coding and information theory,
by Steven Roman, Springer-Verlag, 1992.
State-of-the-art coding theorems.
A graduate level text written from the standpoint of probability theory.
- Entropy and information theory,
by Robert M. Gray, Springer-Verlag, 1990.
Another up-to-date graduate level textbook.
Entropy on the World Wide Web.