I was re-reading von Neumann's tome, in which he recapitulates his views on his own invention, the density matrix. I was also re-reading Dirac's treatment of it, added in the second edition of his Principles, and the explanation of it in Landau--Lifschitz (second ed.). Landau, incidentally, had invented it independently, too.
Von Neumann's rationale is that if our information about the state of a system is incomplete, then all we know are the probabilities of the results of measurements made on it. As usual, by "the state" he means a specified method of preparing the system. But this time, the method is not specified as completely as possible: it need not always produce the same pure state. Note well: von Neumann does not assert that a macroscopic system cannot be in a pure state. Never in his life did he assert this, as far as I know. The whole point of what he is doing is to suppose that our knowledge of its state is incomplete.
Next he states some physically reasonable axioms for what laws those probabilities ought to obey. (Technically, he prefers to speak of expectation values of observables rather than of the probabilities of the results of measuring those observables, but the two are equivalent by the standard arguments he had already developed for projection-valued measures.)
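A minimal NumPy sketch of that equivalence, using a hypothetical qubit example rather than anything from von Neumann: the probability of the result $q_i$ is the expectation value of the spectral projector $P_i$, and conversely $\langle Q\rangle = \sum_i q_i \Pr(q_i)$. The observable, the state and the helper name `expectation` are illustrative assumptions.

```python
import numpy as np

def expectation(A, state):
    """<state|A|state> for a normalised state vector."""
    return np.vdot(state, A @ state).real

# Hypothetical example observable Q: Pauli sigma_x (eigenvalues -1 and +1).
Q = np.array([[0, 1], [1, 0]], dtype=complex)
# Hypothetical pure state |psi> = |0> (spin up along z).
psi = np.array([1, 0], dtype=complex)

# Spectral decomposition Q = sum_i q_i P_i. The eigenvalues here are
# non-degenerate, so each P_i projects onto a single eigenvector.
eigvals, eigvecs = np.linalg.eigh(Q)
projectors = [np.outer(v, v.conj()) for v in eigvecs.T]

# Probabilities from expectations: Prob(result = q_i) = <P_i>.
probs = [expectation(P, psi) for P in projectors]

# Expectation from probabilities: <Q> = sum_i q_i Prob(q_i).
mean_from_probs = sum(q * p for q, p in zip(eigvals, probs))

print(np.isclose(mean_from_probs, expectation(Q, psi)))  # True
```

For a degenerate eigenvalue one would sum the projectors of its eigenvectors to get $P_i$; the sketch avoids that case for brevity.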
Then he proves the theorem that there exists a matrix (or operator) $U$ such that the expectation value of an observable $Q$ is given by $\operatorname{Tr}(UQ)$. He proves uniqueness, too: a given assignment of expectation values to observables determines $U$ uniquely. He also characterises exactly which $U$ arise in this way (the self-adjoint, positive semidefinite operators of trace one). Such a $U$ he calls a density matrix (or operator).
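A small NumPy sketch of the trace formula and the characterisation, for a hypothetical qubit example. The helper names (`expectation`, `is_density_matrix`, `reconstruct`) are illustrative, not von Neumann's. The last step illustrates uniqueness in the qubit case: the expectations of the three Pauli matrices already pin down $U$.

```python
import numpy as np

# Identity and Pauli matrices: together a basis for 2x2 self-adjoint operators.
I2 = np.eye(2, dtype=complex)
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

def expectation(U, Q):
    """Expectation value of the observable Q in the state described by U: Tr(UQ)."""
    return np.trace(U @ Q).real

def is_density_matrix(U, tol=1e-12):
    """Self-adjoint, positive semidefinite, trace one."""
    if not np.allclose(U, U.conj().T, atol=tol):
        return False
    if np.linalg.eigvalsh(U).min() < -tol:
        return False
    return abs(np.trace(U) - 1) < tol

def reconstruct(ex, ey, ez):
    """Qubit case only: U = (I + <sx> sx + <sy> sy + <sz> sz) / 2."""
    return (I2 + ex * sigma_x + ey * sigma_y + ez * sigma_z) / 2

# Hypothetical example: a partially polarised qubit.
U = np.array([[0.75, 0.0], [0.0, 0.25]], dtype=complex)
bad = np.array([[1.5, 0.0], [0.0, -0.5]], dtype=complex)  # trace one, but not positive

print(is_density_matrix(U), is_density_matrix(bad))       # True False
print(expectation(U, sigma_z), expectation(U, sigma_x))   # 0.5 0.0
U_back = reconstruct(*(expectation(U, s) for s in (sigma_x, sigma_y, sigma_z)))
print(np.allclose(U_back, U))                              # True
```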
Thus, the density matrix represents what we know about a system when our knowledge is incomplete.
He also motivates the density matrix by supposing that, for example, we knew the probabilities that the system was in one or another pure state. He calls such a probability distribution over pure states a mixture, or "a mixed state".
He shows that there exists a density matrix $U$ which yields the expectation values of all observables for that mixture. He is aware that different mixtures can yield the same density matrix.
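A minimal NumPy sketch of that last point, with hypothetical example mixtures: the mixture's density matrix is $U = \sum_k p_k |\psi_k\rangle\langle\psi_k|$, and three quite different qubit mixtures (two of them with two states, one with three) all give the unpolarised $U = I/2$. Since every expectation value is $\operatorname{Tr}(UQ)$, no measurement can tell them apart. The helper `density_from_mixture` is an illustrative name.

```python
import numpy as np

def density_from_mixture(weights, states):
    """U = sum_k p_k |psi_k><psi_k| for a probability mixture of pure states."""
    dim = len(states[0])
    U = np.zeros((dim, dim), dtype=complex)
    for p, psi in zip(weights, states):
        psi = np.asarray(psi, dtype=complex)
        psi = psi / np.linalg.norm(psi)
        U += p * np.outer(psi, psi.conj())
    return U

s = 1 / np.sqrt(2)
h = np.sqrt(3) / 2
# Mixture A: 50/50 spin up / spin down along z.
U_z = density_from_mixture([0.5, 0.5], [[1, 0], [0, 1]])
# Mixture B: 50/50 spin up / spin down along x.
U_x = density_from_mixture([0.5, 0.5], [[s, s], [s, -s]])
# Mixture C: equal weights on three real states 120 degrees apart (a "trine").
U_trine = density_from_mixture([1/3, 1/3, 1/3], [[1, 0], [-0.5, h], [-0.5, -h]])

print(np.allclose(U_z, U_x), np.allclose(U_z, U_trine))  # True True
```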
Landau--Lifschitz take a slightly different point of view. They consider a subsystem, which is not a closed system, of a large, macroscopic system: for example, an unpolarised light beam produced by the sun. The joint system is quite macroscopic, but all our quantum measurements are on the subsystem of the light beam and ignore all the quantum numbers of the sun. L--L like to say that a macroscopic system cannot be in a pure state. They show that all expectation values of quantum measurements on the joint system which ignore the quantum numbers of the sun can be found by tracing out the ignored variables, using von Neumann's formula with the appropriate $U$.
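A toy NumPy sketch of tracing out, under illustrative assumptions that are not from L--L: a 2-level "beam" coupled to a 16-level factor standing in for the ignored degrees of freedom (the sun), a random seed, and a hypothetical helper `partial_trace_B`. It checks that $\operatorname{Tr}(\rho\,(Q\otimes 1)) = \operatorname{Tr}(\rho_A Q)$ for an observable $Q$ on the beam alone, and that the reduced state of a pure joint state is generically mixed.

```python
import numpy as np

def partial_trace_B(rho, dA, dB):
    """Trace out subsystem B from a density matrix on H_A (x) H_B."""
    # Row index is iA*dB + iB (Kronecker ordering), so reshape to [iA, iB, jA, jB].
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

rng = np.random.default_rng(0)
dA, dB = 2, 16  # toy stand-in: 2-level "beam", 16-level "rest of the world"

# A random pure state of the joint system and its density matrix.
psi = rng.normal(size=dA * dB) + 1j * rng.normal(size=dA * dB)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# A random self-adjoint observable acting on subsystem A only.
M = rng.normal(size=(dA, dA)) + 1j * rng.normal(size=(dA, dA))
Q = (M + M.conj().T) / 2

rho_A = partial_trace_B(rho, dA, dB)
joint = np.trace(rho @ np.kron(Q, np.eye(dB))).real  # Tr(rho (Q (x) 1))
reduced = np.trace(rho_A @ Q).real                    # Tr(rho_A Q)
print(np.isclose(joint, reduced))  # True

purity_joint = np.trace(rho @ rho).real    # 1 for a pure joint state
purity_A = np.trace(rho_A @ rho_A).real    # generically < 1: the subsystem is mixed

# Bell state (|00> + |11>)/sqrt(2): the reduced state is exactly I/2 ("unpolarised").
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_A_bell = partial_trace_B(np.outer(bell, bell.conj()), 2, 2)
print(np.allclose(rho_A_bell, np.eye(2) / 2))  # True
```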
L--L also give the same motivation von Neumann gave, using a statistical mixture of a finite number of pure states, but later explicitly warn against thinking that the density matrix represents a probabilistic (that is, statistical) mixture of pure states. They call that motivation "purely formal".
L--L include a profound physical discussion of what the quantum pure states of a macroscopic system would look like: how they would behave, and what their energy levels would look like. You must read both von Neumann and Landau. The former is logically precise, writes clearly, etc., but never has any physical intuition. The latter spews out profound physical insights unpredictably, but writes sloppily and unintelligibly, contradicts himself, etc.
When reading, pay careful attention to the difference between saying "the probability that the result of measuring an observable $Q$ will be $q_i$" or "the probability that the system, upon measuring $Q$, will be found to be in the state $|q_i\rangle$", which are both correct and precise, and (the, IMHO, incorrect) "the probability that the system was in the state $|q_i\rangle$". But it is only the long debate on Quantum Measurement that has taught us this distinction.