$\newcommand{\bra}[1]{\langle #1 \rvert}\newcommand{\ket}[1]{\lvert #1 \rangle}\newcommand{\H}{\mathcal{H}}\newcommand{\tr}{\mathrm{tr}}$The connection between subsystems and statistical ensembles is simple: Entanglement.
Entanglement is the phenomenon that a quantum state of a larger system need not (but may) correspond to uniquely specified states of subsystems. Formally, given systems $A,B$ with spaces of states $\H_A,\H_B$ and bases $\{\ket{a_i}\},\{\ket{b_j}\}$, who combine to a system with space of states $\H_{AB} := \H_A\otimes \H_B$, the states which do correspond to unique states of the subsystems (the non-entangled or separable states) are those whose are simple tensors, i.e. of the form $\ket{\phi}\otimes\ket{\psi}\in\H_{AB}$ for $\ket{\phi}\in\H_A,\ket{\psi}\in\H_B$.
If we are now given an arbitrary state of the combined system, $\ket{\chi}\in\H_{AB}$, what can we say about the state of $A$? If the state is not separable, we cannot obtain a pure state for $A$. But, nevertheless, we should be able to "forget" the larger system and restrict our considerations to $\H_A$ (e.g. because we know it won't interact with $\H_B$ further, because we can only measure system $A$ but not $B$, or whatever). The way this is done is by the partial trace1 over $B$ applied to the density matrix $\rho_{AB} = \ket{\chi}\bra{\chi}$:
$$ \rho_A = \tr_B(\rho_{AB})$$
which turns an operator on $\H_{AB}$ into an operator on $\H_A$ by "averaging/summing" over all basis vectors of $B$. The crucial point is that where $\rho_{AB}$ was the density matrix of a pure state, $\rho_A$ need not be - and, in fact, is not if $\ket{\chi}$ was not separable. $\rho_A$ is now a mixed state that has certain probabilities to be in the states $\ket{a_i}$ that appear in the normalized state $\ket{\chi} = \sum_{i,j}c_{ij}\ket{a_i}\otimes\ket{b_j}$. Quantitatively, the probability to be in the state $\ket{a_i}$ is by summing over all of $B$:
$$ p(\ket{a_i}) = \sum_j \lvert c_{ij} \rvert^2 \tag{1}$$
Altogether, we have obtained a mixed state (or "statistical ensemble") for the system from it being a subsystem of a larger system.
It also goes the other way around. Starting from a mixed state, i.e. a set of probabilities $p_i$ to be in $\ket{a_i}\in\H_A$, we may purify the state, i.e. construct an artificial larger space ${\H}$ in which it is an entangled pure state: We simply "square" the space, i.e. $\H := \H_A\otimes\H_A$ and define $\ket{\chi'} := \sum_i\sqrt{p_i}\ket{a_i}\otimes\ket{a_i}$. Comparing with $(1)$, we see that $c_{ij} = p_i\delta_{ij}$, and so $p(\ket{a_i}) = p_i$. Therefore, we have constructed a pure entangled state which gives back the original mixed state when going back to the subsystem.
Altogether, we have seen the equivalence of "having a mixed state" and "considering a subsystem of a larger system".
1Why the partial trace gives us the subsystem follows from its definition, if suitably written down, but lies in it reproducing $(1)$, which is simply the statistically correct way of "not caring" about the state of $B$:
Consider writing $\H_B = \bigoplus_j \mathbb{C}(\ket{b_j})$, which is just the statement that the $\ket{b_i}$ are a orthogonal basis, and thus $\H_{AB} = \bigoplus_j (\H_A \otimes \mathbb{C}\ket{b_j}$. Then, every operator $T : \H_{AB}\to\H_{AB}$ decomposes into the direct sum of operators $T_{kl} : \H_A \otimes \mathbb{C}\ket{b_k} \to \H_A \otimes \mathbb{C}\ket{b_l}$. We want those operators that act only on $\H_A$, since we want to forget about $B$. These are exactly those with $k = l$, and we sum them to get the partially traced operator on $\H_A$:
$$ T_A := \sum_{k} T_{kk} =: \tr_B(T)$$
A good check this is the correct way to obtain the density matrix is taking any density matrices $\rho_A : \H_A\to\H_A, \rho_B : \H_B\to\H_B$ where $\rho_B = \sum_j q_j \ket{b_j}\bra{b_j}$, and observing that $\rho := \rho_A\otimes\rho_B$ produces
$$ \tr_B(\rho) = \sum_j q_j \rho_A = \rho_A$$
and for general $\ket{\chi}$, it reproduces $(1)$. Thus, the partial trace is a convenient way to formulate the idea of summing over the basis vectors of $B$, effectively forgetting them, without leaving the density matrix formalism.