While I know this question has already been answered, I felt obligated to get around to writing a formal answer. I will not go into detail about irreversible/dissipative systems, as Arnold Neumaier's answer already addressed that issue. Rather, my answer will focus on the mathematics behind ergodicity and mixing.
Note: Most of what follows is drawn from Penrose [1979].
Background
First, let us define $\boldsymbol{\Gamma}$ to be the whole of phase space, described by the position and momentum coordinates, $\mathbf{q}$ and $\mathbf{p}$, respectively. We then define the phase space density as the function that satisfies:
$$
\int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ \rho\left( \mathbf{q}, \mathbf{p} \right) = 1 \tag{1}
$$
where $n$ is the number of degrees of freedom and $\rho\left( \mathbf{q}, \mathbf{p} \right)$ is the phase space probability density.
Now if we use a generic variable, $G\left( \mathbf{q}, \mathbf{p} \right)$, to describe any dynamical variable (e.g., energy), then the ensemble average of $G$ is denoted by:
$$
\langle G \rangle = \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \rho\left( \mathbf{q}, \mathbf{p} \right) \tag{2}
$$
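As a minimal numerical sketch of Equation 2, one can estimate an ensemble average by sampling phase points from $\rho$. The specific choices here are assumptions made only for illustration and are not part of the discussion above: a one-dimensional harmonic oscillator with unit mass and frequency, and a density $\rho \propto e^{-\beta H}$, for which $\langle H \rangle = 1/\beta$.

```python
import random


def hamiltonian(q, p):
    """Energy of a 1D harmonic oscillator with unit mass and frequency (assumed model)."""
    return 0.5 * (p * p + q * q)


def ensemble_average(G, beta, n_samples=50000, seed=0):
    """Monte Carlo estimate of <G>, the phase-space integral of G * rho.

    rho is assumed proportional to exp(-beta * H); for this H it is a product
    of two Gaussians, each with standard deviation 1 / sqrt(beta).
    """
    rng = random.Random(seed)
    sigma = beta ** -0.5
    total = 0.0
    for _ in range(n_samples):
        q = rng.gauss(0.0, sigma)
        p = rng.gauss(0.0, sigma)
        total += G(q, p)
    return total / n_samples


if __name__ == "__main__":
    beta = 2.0
    # The estimate should land close to the exact value 1 / beta.
    print(ensemble_average(hamiltonian, beta), 1.0 / beta)
```

Sampling from $\rho$ and averaging $G$ is equivalent to the integral in Equation 2 only up to statistical error, which shrinks as the number of samples grows.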
Three Principles
However, there is an issue to be aware of at this point [i.e., page 1940 in Penrose, 1979]:
The fundamental problem of statistical mechanics is what ensemble – that is, what phase-space probability density $\rho$ – corresponds to a given physical situation... It is, however, possible to state three principles that the phase-space density should satisfy; and it turns out, rather surprisingly, that these principles when combined with a study of the dynamics of our mechanical models give enough information to answer the fundamental problem satisfactorily in some important cases.
1st Principle
The first of the three principles is just Liouville's theorem, $d \rho/dt = 0$, i.e., the phase-space density is constant along trajectories. Penrose states it for systems whose Hamiltonian does not explicitly depend upon time, which is how we define the system to be isolated.
2nd Principle
The second principle is stated as [i.e., page 1941 in Penrose, 1979]:
The second of the three principles is more general, since it does not require the system to be isolated... The principle, which I shall call the principle of causality, is simply that the phase-space density at any time is completely determined by what happened to the system before that time, and is unaffected by what will happen to the system in the future.
3rd Principle
Finally, the third principle is stated as [i.e., page 1941 in Penrose, 1979]:
The last of the three principles is that the probabilities in the ensemble really can be described by a phase-space density $\rho$ with $\rho$ a well-behaved (say, piecewise continuous) function, rather than some more general measure.
The last principle is important but often overlooked. If we require it, we cannot include systems like a gas of hard spheres in a cubical box where all spheres bounce between the same two faces for eternity (i.e., the spheres move only along one dimension). That is to say, a time-average of this imaginary system will not be the same as an ensemble average (see explanation below). For brevity, let us call any system like this an exceptional system.
As an aside, the problems with time-averages in classical electricity and magnetism are well known and it is now known that spatial ensemble averages are the correct operations for converting between the micro- and macroscopic forms of Maxwell's equations [e.g., see pages 248-258 in Jackson, 1999 for a detailed discussion].
Ergodicity and Mixing
Ergodicity
If $G$ is a dynamical variable, then we can define its ensemble average at time $t$ as:
$$
\langle G \rangle_{t} = \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \rho_{t}\left( \mathbf{q}, \mathbf{p} \right) \tag{3}
$$
where we can obtain $\rho_{t}$ using the assumption that Liouville's theorem holds (i.e., $d \rho/dt = 0$).
Note that when $\lim_{t \rightarrow \infty} \langle G \rangle_{t}$ exists, we can define it as the equilibrium value of $G$. However, this limit does not necessarily exist, as in the case of any oscillating system without damping (e.g., simple harmonic oscillator). In such cases $\langle G \rangle_{t}$ does not approach a single value; it oscillates indefinitely.
The time-average, however, always exists and one can avoid calculating a nonexistent value by redefining the equilibrium value of $G$ as:
$$
\langle G \rangle_{eq} \equiv \lim_{T \rightarrow \infty} \ \frac{1}{T} \int_{0}^{T} \ dt \ \langle G \rangle_{t} \tag{4}
$$
which is equal to $\lim_{t \rightarrow \infty} \langle G \rangle_{t}$ whenever that limit exists.
If we define the time-average of $\rho$ as $\bar{\rho}$, we can write this as:
$$
\bar{\rho}\left( \mathbf{q}, \mathbf{p} \right) = \lim_{T \rightarrow \infty} \ \frac{1}{T} \int_{0}^{T} \ dt \ \rho_{t}\left( \mathbf{q}, \mathbf{p} \right) \tag{5}
$$
which allows us to redefine $\langle G \rangle_{eq}$ as:
$$
\langle G \rangle_{eq} = \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \bar{\rho}\left( \mathbf{q}, \mathbf{p} \right) \tag{6}
$$
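As a short worked step, Equation 6 follows from substituting Equation 3 into the time-average of Equation 4 and then using the definition of $\bar{\rho}$ in Equation 5. The sketch below assumes that the long-time limit and the phase-space integral may be interchanged, which is not spelled out above:

$$
\begin{align}
\langle G \rangle_{eq} & = \lim_{T \rightarrow \infty} \ \frac{1}{T} \int_{0}^{T} \ dt \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \rho_{t}\left( \mathbf{q}, \mathbf{p} \right) \\
& = \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \left[ \lim_{T \rightarrow \infty} \ \frac{1}{T} \int_{0}^{T} \ dt \ \rho_{t}\left( \mathbf{q}, \mathbf{p} \right) \right] \\
& = \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \bar{\rho}\left( \mathbf{q}, \mathbf{p} \right)
\end{align}
$$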
It is important to note some properties of ergodic theory here [e.g., page 1949 in Penrose, 1979]:
It follows from the ergodic theorem of Birkhoff (1931) that $\bar{\rho}$ is well-defined at almost all phase points... consequently the integral in (1.16) is well-defined... Birkhoff's theorem also shows that $\bar{\rho}$ is constant on the trajectories in phase space...
where the integral (1.16) in the quote refers to the version of $\langle G \rangle_{eq}$ in Equation 6. The last statement, namely that $\bar{\rho}$ is an invariant, is crucial here. Were it not an invariant, it "...would require us to solve the equations of motion for $10^{23}$-odd particles..." [e.g., page 1945 of Penrose, 1979].
Important Side Note: Recall again that $\langle G \rangle_{eq}$ as given in Equation 6 need not equal $\lim_{t \rightarrow \infty} \langle G \rangle_{t}$, as in the trivial case of an undamped simple harmonic oscillator, where $\langle G \rangle_{t}$ oscillates forever and the limit does not exist.
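The harmonic-oscillator case can be checked numerically. The following is a rough sketch under assumptions of my own choosing (unit mass and frequency, $G = q$, and a narrow Gaussian initial density $\rho_{0}$ centred at $q = 1$, $p = 0$): the ensemble average $\langle G \rangle_{t}$ keeps oscillating between roughly $+1$ and $-1$, while its running time-average settles toward zero.

```python
import math
import random


def evolve(q0, p0, t):
    """Exact flow of H = (p^2 + q^2) / 2 (unit mass and frequency, assumed)."""
    c, s = math.cos(t), math.sin(t)
    return q0 * c + p0 * s, p0 * c - q0 * s


def make_ensemble(n=500, seed=1):
    """Sample initial phase points from a narrow Gaussian rho_0 centred at (1, 0)."""
    rng = random.Random(seed)
    return [(rng.gauss(1.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(n)]


def ensemble_average_q(ensemble, t):
    """<G>_t for G = q, estimated by evolving every ensemble member to time t."""
    return sum(evolve(q0, p0, t)[0] for q0, p0 in ensemble) / len(ensemble)


def time_average_q(ensemble, T, dt=0.1):
    """(1/T) * integral from 0 to T of <G>_t dt, using the midpoint rule."""
    steps = int(round(T / dt))
    total = sum(ensemble_average_q(ensemble, (k + 0.5) * dt) for k in range(steps))
    return total * dt / T


if __name__ == "__main__":
    ens = make_ensemble()
    for t in (0.0, math.pi, 2.0 * math.pi):
        print("t =", round(t, 3), "<q>_t =", ensemble_average_q(ens, t))
    print("time-average over T = 100:", time_average_q(ens, 100.0))
```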
Suppose we can write $\bar{\rho}\left( \mathbf{q}, \mathbf{p} \right) = \phi\left( H\left( \mathbf{q}, \mathbf{p} \right) \right)$, where $\phi$ is an arbitrary function of only one variable and $H$ is the Hamiltonian. If this holds for all $\bar{\rho}$ in a system, then the system is said to be ergodic. Another way of stating this is that if the system were ergodic, the trajectories would cover all parts of an energy manifold if given enough time.
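To make the ergodic property concrete, here is a small sketch comparing a time-average along a single trajectory with an average over the energy manifold. It again assumes a harmonic oscillator with unit mass and frequency and picks $G = q^{2}$; both choices are mine, for illustration only. For this system both averages should equal $E$.

```python
import math


def time_average_q2(E, T=200.0, dt=0.01):
    """Time average of G = q^2 along one trajectory of energy E."""
    amplitude = math.sqrt(2.0 * E)
    steps = int(round(T / dt))
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        q = amplitude * math.cos(t)  # trajectory started at q = amplitude, p = 0
        total += q * q
    return total * dt / T


def microcanonical_average_q2(E, n_angles=10000):
    """Average of q^2 over the energy shell q^2 + p^2 = 2E.

    For this H the delta-function measure is uniform in the polar angle,
    because the gradient of H has constant magnitude on the circle.
    """
    radius_sq = 2.0 * E
    total = 0.0
    for k in range(n_angles):
        theta = 2.0 * math.pi * (k + 0.5) / n_angles
        total += radius_sq * math.cos(theta) ** 2
    return total / n_angles


if __name__ == "__main__":
    E = 1.5
    print(time_average_q2(E), microcanonical_average_q2(E))
```

The agreement reflects the fact that a single trajectory of this oscillator sweeps out its entire energy manifold (a circle), which is the sense in which the harmonic oscillator is ergodic.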
Mixing
Let us define the microcanonical average over energy of $G$ as:
$$
\langle G \rangle_{E} = \frac{ \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \delta\left( H\left( \mathbf{q}, \mathbf{p} \right) - E \right) }{ \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ \delta\left( H\left( \mathbf{q}, \mathbf{p} \right) - E \right) } \tag{7}
$$
where $\delta()$ is the Dirac delta function, $H\left( \mathbf{q}, \mathbf{p} \right)$ is the Hamiltonian, and $E$ labels an energy manifold (i.e., the set of phase points at which the system has energy $E$).
Thus, we can redefine $\langle G \rangle_{eq}$ as:
$$
\begin{align}
\langle G \rangle_{eq} & = \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \bar{\rho}\left( \mathbf{q}, \mathbf{p} \right) \tag{8a} \\
& = \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \phi\left( H \right) \tag{8b} \\
& = \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \left[ \int_{-\infty}^{\infty} \ dE \ \phi\left( E \right) \ \delta\left( E - H\left( \mathbf{q}, \mathbf{p} \right) \right) \right] \tag{8c} \\
& = \int_{-\infty}^{\infty} \ dE \ P\left( E \right) \ \langle G \rangle_{E} \tag{8d}
\end{align}
$$
where $P\left( E \right)$ is given by:
$$
P\left( E \right) = \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ \bar{\rho}\left( \mathbf{q}, \mathbf{p} \right) \ \delta\left( E - H\left( \mathbf{q}, \mathbf{p} \right) \right) \tag{9}
$$
Note that $P\left( E \right)$ is just the probability density of $H$ in the time-averaged ensemble.
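The step from Equation 8c to Equation 8d can be filled in as follows. This is a sketch in which $\Omega\left( E \right)$ is shorthand introduced here for the denominator of Equation 7, the system is taken to be ergodic so that $\bar{\rho} = \phi\left( H \right)$, and the order of the $E$ and phase-space integrations is assumed to be interchangeable:

$$
\begin{align}
\Omega\left( E \right) & \equiv \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ \delta\left( H\left( \mathbf{q}, \mathbf{p} \right) - E \right) \\
\int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ G\left( \mathbf{q}, \mathbf{p} \right) \ \delta\left( E - H\left( \mathbf{q}, \mathbf{p} \right) \right) & = \Omega\left( E \right) \ \langle G \rangle_{E} \\
P\left( E \right) & = \int_{\boldsymbol{\Gamma}} \ d^{n}q \ d^{n}p \ \phi\left( H \right) \ \delta\left( E - H\left( \mathbf{q}, \mathbf{p} \right) \right) = \phi\left( E \right) \ \Omega\left( E \right) \\
\langle G \rangle_{eq} & = \int_{-\infty}^{\infty} \ dE \ \phi\left( E \right) \ \Omega\left( E \right) \ \langle G \rangle_{E} = \int_{-\infty}^{\infty} \ dE \ P\left( E \right) \ \langle G \rangle_{E}
\end{align}
$$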
Now to define mixing we consider whether the following holds:
$$
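\lim_{t \rightarrow \infty} \ \langle \rho_{0}\left( \mathbf{q}, \mathbf{p} \right) \ G_{t}\left( \mathbf{q}, \mathbf{p} \right) \rangle_{E} = \langle \rho_{0}\left( \mathbf{q}, \mathbf{p} \right) \rangle_{E} \ \langle G\left( \mathbf{q}, \mathbf{p} \right) \rangle_{E} \tag{10}
$$
where $\rho_{0}$ is just the initial value of $\rho_{t}$ and $G_{t}\left( \mathbf{q}, \mathbf{p} \right)$ denotes $G$ evaluated at the phase point that the trajectory starting from $\left( \mathbf{q}, \mathbf{p} \right)$ reaches after a time $t$.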
If the system, for every $E$ and functions $\rho_{0}$ and $G$, satisfies the above relationship, the system is said to be mixing [i.e., pages 1948-1949 in Penrose, 1979]:
Mixing can easily be shown to imply ergodicity (e.g. Arnold and Avez (1968, p20); the equivalence of our definition of mixing and theirs follows from their theorem 9.8), but is not implied by it; for example, as mentioned earlier, the harmonic oscillator is ergodic but not mixing... The precise definition of mixing is... 'whether an ensemble of isolated systems has any tendency in the course of time toward a state of statistical equilibrium'...
Note that mixing alone is not sufficient to imply that an individual system will approach equilibrium [i.e., page 1949 in Penrose, 1979]:
Mixing tells us that the average $\langle G \rangle_{t}$ of a dynamical variable $G$, taken over the appropriate ensemble, approaches an equilibrium value $\langle G \rangle_{eq}$; it does not tell us anything about the time variation of $G$ in any of the individual systems comprised in that ensemble. To make useful predictions about the behaviour of G in any individual system we must show that the individual values of G are likely to be close to $\langle G \rangle$, i.e. that the fluctuations of $G$ are small, and to do this we have to use the large size of the system as well as its mixing property...
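The difference between a system that is mixing and one that is ergodic but not mixing can be illustrated with a toy calculation. The sketch below is not taken from Penrose and rests on several assumptions: discrete-time maps on the unit torus stand in for Hamiltonian flows, the uniform measure on the torus plays the role of the microcanonical average, and the functions `rho0` and `G` are arbitrary choices. Arnold's cat map is a standard example of a mixing map, while a rotation by rationally independent irrational angles is ergodic but not mixing.

```python
import math
import random

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0  # irrational shift in x
BETA = math.sqrt(2.0) - 1.0           # irrational shift in y


def cat_map(x, y):
    """One step of Arnold's cat map on the unit torus (mixing)."""
    return (2.0 * x + y) % 1.0, (x + y) % 1.0


def rotation(x, y):
    """One step of an irrational rotation of the torus (ergodic, not mixing)."""
    return (x + ALPHA) % 1.0, (y + BETA) % 1.0


def rho0(x, y):
    """Illustrative initial density: non-negative and integrating to 1."""
    return 1.0 + math.cos(2.0 * math.pi * x)


def G(x, y):
    """Illustrative dynamical variable with zero mean over the torus."""
    return math.cos(2.0 * math.pi * x)


def correlation(step, t, n_samples=20000, seed=0):
    """Monte Carlo estimate of <rho0 * G_t> - <rho0> * <G> after t map steps."""
    rng = random.Random(seed)
    sum_rho_g = sum_rho = sum_g = 0.0
    for _ in range(n_samples):
        x0, y0 = rng.random(), rng.random()
        x, y = x0, y0
        for _ in range(t):
            x, y = step(x, y)
        sum_rho_g += rho0(x0, y0) * G(x, y)  # G evaluated at the evolved point
        sum_rho += rho0(x0, y0)
        sum_g += G(x0, y0)
    n = float(n_samples)
    return sum_rho_g / n - (sum_rho / n) * (sum_g / n)


if __name__ == "__main__":
    for t in (0, 1, 4, 8):
        print(t, correlation(cat_map, t), correlation(rotation, t))
```

For the cat map the correlation drops to zero within statistical noise, as Equation 10 requires, whereas for the rotation it keeps oscillating and never decays.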
References
- Evans, D.J. "On the entropy of nonequilibrium states," J. Statistical Phys. 57, pp. 745-758, doi:10.1007/BF01022830, 1989.
- Evans, D.J., and G. Morriss Statistical Mechanics of Nonequilibrium Liquids, 1st edition, Academic Press, London, 1990.
- Evans, D.J., E.G.D. Cohen, and G.P. Morriss "Viscosity of a simple fluid from its maximal Lyapunov exponents," Phys. Rev. A 42, pp. 5990–5997, doi:10.1103/PhysRevA.42.5990, 1990.
- Evans, D.J., and D.J. Searles "Equilibrium microstates which generate second law violating steady states," Phys. Rev. E 50, pp. 1645–1648, doi:10.1103/PhysRevE.50.1645, 1994.
- Gressman, P.T., and R.M. Strain "Global classical solutions of the Boltzmann equation with long-range interactions," Proc. Nat. Acad. Sci. USA 107, pp. 5744–5749, doi:10.1073/pnas.1001185107, 2010.
- Hoover, W. (Ed.) Molecular Dynamics, Lecture Notes in Physics, Vol. 258, Springer-Verlag, Berlin, 1986.
- Jackson, J.D. Classical Electrodynamics, 3rd edition, John Wiley & Sons, Inc., New York, NY, 1999.
- Penrose, O. "Foundations of statistical mechanics," Rep. Prog. Phys. 42, pp. 1937-2006, 1979.