Information Theory: A Tutorial Introduction by James V. Stone, Sebtel Press 2016
Richly illustrated with accessible examples, such as the everyday game of '20 questions.' Online MATLAB and Python computer programs provide hands-on experience. Written in an informal style. An ideal primer for novices who wish to learn the essential principles and applications of information theory.
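As a taste of the '20 questions' idea (a minimal Python sketch of my own, not one of the book's online programs): each yes/no question can at best halve the remaining possibilities, so k questions distinguish up to 2^k items, and a uniform guess over N items carries log2(N) bits of uncertainty.

    import math

    # Entropy of a uniform distribution over n equally likely possibilities:
    # H = log2(n) bits, i.e. the minimum average number of yes/no questions.
    def uniform_entropy_bits(n):
        return math.log2(n)

    print(uniform_entropy_bits(2**20))      # 20.0 bits: 20 questions cover 1,048,576 items
    print(uniform_entropy_bits(1_000_000))  # ~19.93 bits, so 20 questions still suffice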
Ergodic Theory and Information by Patrick Billingsley, J Wiley 1965
Review by Arshag Hajian
Far from modern but very reader-friendly. Loose, free, somewhat controversial style. Uses many simple familiar examples and interesting side discussions to motivate notions and techniques. Coverage of the usual topics in information theory is very limited. Useful for a beginner in the field.
Elements of Information Theory by Thomas M. Cover & Joy A. Thomas, J Wiley, 2nd ed. 2006, 748 pp.
Clear, thought-provoking instruction. Engaging mix of mathematics, physics, statistics, and information theory.
Covers all the essential topics in information theory: entropy, data compression, channel capacity, rate distortion, network information theory, and hypothesis testing (two of these are illustrated in the sketch after this entry). Each chapter ends with problem sets, a summary, and historical notes.
The ideal textbook for upper-level undergraduate and graduate courses in electrical engineering, statistics, and telecommunications.
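To make two of the listed topics concrete (an illustrative Python sketch, not code from the book): the binary entropy function H(p) and the textbook-standard capacity C = 1 - H(p) of a binary symmetric channel with crossover probability p.

    import math

    def binary_entropy(p):
        """Entropy H(p) in bits of a binary source with bias p."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def bsc_capacity(p):
        """Capacity in bits per use of a binary symmetric channel."""
        return 1.0 - binary_entropy(p)

    print(binary_entropy(0.5))  # 1.0 bit: a fair coin is maximally uncertain
    print(bsc_capacity(0.11))   # ~0.5 bits per channel use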
Information Theory, Part I: An Introduction to the Fundamental Concepts by Arieh Ben-Naim, World Scientific 2017, 350 pp.
"...Written for those using the concepts of information theory to study problems considered outside of usual realm of information theory."
Unlike many books, which refer to Shannon's measure of information (SMI) as "entropy," this book makes a clear distinction between the SMI and entropy. In the last chapter, entropy is derived as a special case of SMI (a small sketch of the idea follows this entry).
Friendly, simple language, full of practical examples. Emphasis is on the concepts and their meaning rather than on the mathematical details.
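A minimal sketch of the distinction (my illustration in Python; the constants are standard physics, not notation from the text): SMI is defined for any probability distribution, whereas thermodynamic entropy corresponds to the SMI of the equilibrium distribution rescaled by Boltzmann's constant.

    import math

    def smi(probs):
        """Shannon's measure of information (SMI) in bits: -sum p*log2(p)."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # SMI applies to any distribution, e.g. a loaded die:
    print(smi([0.5, 0.2, 0.1, 0.1, 0.05, 0.05]))

    # Thermodynamic entropy as a special case: the SMI of the equilibrium
    # (uniform) distribution over W microstates is log2(W), and rescaling
    # by k_B * ln(2) recovers Boltzmann's S = k_B * ln(W) in J/K.
    k_B = 1.380649e-23   # Boltzmann's constant, J/K
    W = 10**20           # hypothetical number of accessible microstates
    print(k_B * math.log(2) * math.log2(W))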
Entropy and Information Theory by Robert M. Gray, Springer, 1st ed. 1990, 2nd ed. 2011
Not an introduction but a thorough, formal development of Shannon's mathematical theory of communication.
Its practical goal is the theory of probabilistic information measures and their application to coding theorems for information sources and noisy channels, but most of the book is devoted to the underlying tools and methods from random variables, random processes, and dynamical systems. Examples are entropy, mutual information, conditional entropy, conditional information, and discrimination or relative entropy, along with the limiting normalized versions of these quantities, such as entropy rate and information rate. It is the only up-to-date treatment of traditional information theory that emphasizes ergodic theory.
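Two of these quantities in miniature (an illustrative Python sketch, unrelated to the book's formal development): relative entropy D(p||q), and the entropy rate of a two-state Markov chain, which is the stationary-weighted average of the per-state transition entropies.

    import math

    def binary_entropy(p):
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def relative_entropy(p, q):
        """D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    # Entropy rate of a two-state Markov chain with a = P(0->1), b = P(1->0):
    # the stationary distribution is (pi0, pi1) = (b, a) / (a + b).
    def markov_entropy_rate(a, b):
        pi0, pi1 = b / (a + b), a / (a + b)
        return pi0 * binary_entropy(a) + pi1 * binary_entropy(b)

    print(relative_entropy([0.5, 0.5], [0.9, 0.1]))  # ~0.74 bits
    print(markov_entropy_rate(0.1, 0.3))             # bits per step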
Acknowledgement: Mathematics SE, "A good textbook to learn about entropy and information theory"