In 1948, motivated by the problem of efficiently transmitting information over a noisy communication channel, Claude Shannon introduced a revolutionary probabilistic way of thinking about communication and simultaneously created the first truly mathematical theory of entropy. His ideas created a sensation and were rapidly developed along two main lines:

Thanks to Emre Telatar and Lucent Technologies (formerly Bell Labs, where Shannon worked for many years), the full text of Shannon's classic 1948 paper, A Mathematical Theory of Communication, is now available electronically. The paper is very readable, and Parts I and II are still in many ways the best introduction to the modern concept of entropy. *Highest recommendation!*

Here are some truly fabulous expository papers which I think give an excellent overview of this whole area:

**Typical Sequences and All That: Entropy, Pattern Matching, and Data Compression,** the 1994 Shannon Lecture by the late Aaron D. Wyner (formerly of the Mathematics of Communication Department, Bell Labs). Focuses on the contrast between the traditional interpretation of the entropy of a stationary ergodic source in terms of the number of typical sequences of a given length and the interpretation in terms of the recurrence of blocks of symbols ("patterns") in a single typical sequence. The second interpretation leads directly to practical methods for three problems:

- estimating the entropy of an unknown information source from a finite amount of data,
- compressing a finite sequence produced by an unknown information source,
- telling whether a given finite sequence could have reliably been produced by a given source.

*Highly recommended!*
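As a toy illustration (not from Wyner's lecture itself) of the second interpretation, entropy can be estimated from a single long sequence by Lempel-Ziv incremental parsing: if the parse of a sequence of length n has c phrases, then for a stationary ergodic source (c log2 c)/n converges to the entropy rate. A minimal sketch, with function names of my own choosing:

```python
import math

def lz78_phrases(s):
    """Count phrases in the Lempel-Ziv incremental parse of s:
    repeatedly extend the current phrase until it is new, then record it."""
    seen = set()
    count = 0
    w = ""
    for ch in s:
        w += ch
        if w not in seen:
            seen.add(w)
            count += 1
            w = ""
    if w:                     # partial phrase left over at the end
        count += 1
    return count

def lz_entropy_estimate(s):
    """Crude entropy-rate estimate in bits per symbol: c * log2(c) / n."""
    c = lz78_phrases(s)
    return c * math.log2(c) / len(s)
```

A highly compressible sequence parses into few phrases and gets a low estimate; an incompressible one parses into many short phrases and gets an estimate near its true entropy rate. The estimate is biased upward for short sequences.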

**A Short Course in Information Theory,** by David J. C. MacKay (Cavendish Laboratory, Cambridge). An excellent overview of the most fundamental ideas in classical information theory, at the advanced undergraduate or beginning graduate level. Telegraphic in style, but quite readable and well illustrated. His treatment of Shannon's Noiseless and Noisy Coding Theorems is particularly interesting. *Highly recommended!*

**Shannon Theory: Present and Future.** A fascinating panel discussion by

- Richard Blahut (University of Illinois),
- Imre Csiszár (Hungarian Academy of Sciences),
- David Forney (Motorola),
- Prakash Narayan (University of Maryland),
- Mark Pinsker (Russian Academy of Sciences),
- Sergio Verdú (Princeton University).

If you've forgotten what a logarithm is, you might want to start instead with the Primer on Information Theory by Thomas D. Schneider (Laboratory of Molecular Biology, NIH), which gives a very gentle introduction to Shannon's discrete entropy.
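For the impatient, Shannon's discrete entropy is just a one-liner. A minimal sketch (my own, not from Schneider's primer):

```python
import math

def entropy(probs):
    """Shannon's discrete entropy in bits: H = -sum_i p_i * log2(p_i).
    Terms with p_i = 0 contribute nothing, by convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# entropy([0.5, 0.5]) -> 1.0     a fair coin carries one bit per toss
# entropy([0.9, 0.1]) -> ~0.469  a biased coin is more predictable
# entropy([1.0])      -> 0.0     a certain outcome carries no information
```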

Here are some expository papers giving a nice overview of coding theory:

**Performance and Complexity,** by David Forney (Motorola). The 1995 Shannon Lecture. A good overview of the complex history of coding theory, including references to the most recent techniques.

**LZW Data Compression,** by Mark Nelson (Greenleaf Software). An expanded version of an expository article which originally appeared in *Dr. Dobb's Journal*. Nelson has several other articles on coding theory available on this page. His articles, unlike most of the others cited here, are firmly grounded in the world of "real code for real programmers"!
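In that spirit, here is a toy LZW codec (a sketch of my own, not Nelson's code; it assumes a 256-entry initial byte dictionary and unbounded dictionary growth, where real implementations cap the code width):

```python
def lzw_compress(data):
    """LZW: emit dictionary indices for the longest already-seen strings."""
    dictionary = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                          # keep extending the match
        else:
            out.append(dictionary[w])       # emit code for longest match
            dictionary[wc] = len(dictionary)
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes):
    """Rebuild the dictionary on the fly from the code stream alone."""
    dictionary = {i: bytes([i]) for i in range(256)}
    it = iter(codes)
    w = dictionary[next(it)]
    out = [w]
    for k in it:
        if k in dictionary:
            entry = dictionary[k]
        else:                               # the classic KwKwK corner case
            entry = w + w[:1]
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return b"".join(out)
```

On repetitive input the code stream is shorter than the input even before the indices are bit-packed.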

**Achieving the Shannon Limit: a Progress Report,** by John C. Kieffer (Information Theory Research Group, Electrical Engineering, University of Minnesota). A sketch of the culmination of fifty years of research, which has finally resulted in a spectacular breakthrough: new coding techniques (*turbo codes*) that approach the theoretical best performance first derived by Shannon in 1948.

**The Algebraic Theory of Convolutional Codes,** by Robert J. McEliece (Electrical Engineering, Caltech). One chapter in the Handbook of Coding Theory.

**Theory of Codes,** lecture notes from a course taught by Jean Berstel and Dominique Perrin (Université Marne-la-Vallée and Institut Gaspard Monge). Available chapter by chapter and as a single 4 MB PostScript file.

**Information Theory and Coding 314,** lecture notes from a course taught by Roberto Togneri (Centre for Intelligent Information Processing Systems, Dept. of Electrical & Electronic Engineering, The University of Western Australia). Includes brief overviews of Markov sources, communication channels, mutual information, and using the method of Lagrange multipliers to compute the channel capacity, among other things. *Highly recommended!*
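The Lagrange-multiplier approach to channel capacity that Togneri's notes mention leads, when the stationarity conditions are iterated, to the well-known Blahut-Arimoto algorithm. A minimal sketch (my own, not taken from the notes; it assumes every output symbol is reachable):

```python
import math

def blahut_arimoto(P, iters=100):
    """Capacity in bits per channel use of a discrete memoryless channel.
    P[x][y] = probability of output y given input x; each row sums to 1."""
    m, n = len(P), len(P[0])
    r = [1.0 / m] * m                       # input distribution, start uniform
    for _ in range(iters):
        # posterior q(x|y) proportional to r(x) * P(y|x)
        q = [[r[x] * P[x][y] for y in range(n)] for x in range(m)]
        for y in range(n):
            col = sum(q[x][y] for x in range(m))
            for x in range(m):
                q[x][y] /= col
        # multiplicative update of r from the Lagrangian stationarity condition
        new = [math.exp(sum(P[x][y] * math.log(q[x][y])
                            for y in range(n) if P[x][y] > 0))
               for x in range(m)]
        tot = sum(new)
        r = [v / tot for v in new]
    # mutual information I(X;Y) in bits at the final input distribution
    py = [sum(r[x] * P[x][y] for x in range(m)) for y in range(n)]
    return sum(r[x] * P[x][y] * math.log2(P[x][y] / py[y])
               for x in range(m) for y in range(n) if P[x][y] > 0)
```

For a binary symmetric channel with crossover probability p this recovers the closed form C = 1 - H(p).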

**Lectures on Source Coding,** course notes by John C. Kieffer (Information Theory Research Group, Electrical Engineering, University of Minnesota). Covers prefix codes, Huffman codes, codes with memory, Lempel-Ziv codes, arithmetic codes, run length codes, scalar quantizer design, vector quantizer design, rate distortion theory, differential coding, trellis codes, and transform coders.

Approximately 200 books on information and coding theory have been published since Shannon's seminal paper. See the list of textbooks in this area maintained by Werner Heise (Minister of Mathematics, Free Republic of Laputa, a little-known breakaway region of Germany --- travel there at your own risk!).

Some of the most recent textbooks (plus one classic) include the following:

**Elements of Information Theory,** by Thomas M. Cover and Joy A. Thomas, Wiley, 1991. A readable and very enjoyable survey at the undergraduate level. Covers the equipartition theorem, Shannon's coding theorems, maximal entropy, discrimination and the theory of types (see Entropy in Statistical Inference and Prediction), and much more. *Highly recommended as a first textbook!*

**Communication theory,** by Charles M. Goldie and Richard G. E. Pinch, Cambridge University Press, 1991. London Mathematical Society Student Texts Vol. 20. A good undergraduate-level survey, more concise but less comprehensive than Cover & Thomas.

**Principles and practice of information theory,** by Richard E. Blahut, Addison-Wesley, 1987. Undergraduate level. Covers the connection between Shannon's entropy and decision theory.

**Entropy optimization principles with applications,** by J. N. Kapur and H. K. Kesavan, Academic Press, 1992. A very readable introduction to the Principle of Maximal Entropy and its many uses, with a decidedly practical orientation.

**Maximum entropy solutions to scientific problems,** by Robert M. Bevensee, Prentice Hall, 1993. Another recent textbook.

**Information theory with applications,** by Silviu Guiasu, McGraw-Hill, 1977. Still the best source I know for applications of entropy to classification theory. Many of the later chapters can be read independently of the earlier chapters (which are seriously outdated and written in an unattractive notation).

**The Data Compression Book,** by Mark Nelson and Jean-loup Gailly, M & T Books, 1995. Covers Shannon-Fano and Huffman coding, adaptive Huffman coding, arithmetic coding, the JPEG compression algorithm, etc. Includes data compression code.
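As a quick taste of the kind of material such books cover, here is a minimal Huffman code builder (a sketch of my own, not from the book; it propagates bit labels to whole subtrees instead of building an explicit tree, and a single-symbol alphabet would get the empty code):

```python
import heapq

def huffman_codes(freq):
    """Build a Huffman code for a symbol -> weight mapping.
    Each heap entry: (subtree weight, tiebreaker, symbols in that subtree)."""
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    codes = {s: "" for s in freq}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)   # two lightest subtrees
        w2, _, syms2 = heapq.heappop(heap)
        for s in syms1:                      # merging prepends one bit
            codes[s] = "0" + codes[s]
        for s in syms2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (w1 + w2, tie, syms1 + syms2))
        tie += 1
    return codes
```

Frequent symbols end up with short codewords, rare ones with long codewords, and the resulting code is prefix-free.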

**Handbook of Coding Theory,** edited by R. Brualdi, W. C. Huffman, and V. Pless. Amsterdam: Elsevier Science Publishers, 1996.

**Coding and information theory,** by Steven Roman, Springer-Verlag, 1992. State-of-the-art coding theorems. A graduate-level text written from the standpoint of probability theory.

**Entropy and information theory,** by Robert M. Gray, Springer-Verlag, 1990. Another up-to-date graduate-level textbook.
