I understand this to be the sum of all the particles, each being in a certain energy state $E_i$
This is not correct. The sum is taken over all possible microstates of the system, with $E_i$ being the energy of microstate $i$. For example, if my system has three possible states with energies $E_1,E_2,E_3$, then the partition function would be $Z=e^{-\beta E_1} + e^{-\beta E_2} + e^{-\beta E_3}$. If $E_2=E_3= E'$, then the partition function would be $Z=e^{-\beta E_1} + e^{-\beta E'} + e^{-\beta E'} = e^{-\beta E_1} + 2e^{-\beta E'}$. If two different microstates have the same energy $E'$, then $e^{-\beta E'}$ appears in the partition function sum twice - we're summing over states, not energies.
The idea behind this is as follows. Consider a system in thermal equilibrium with a large reservoir. If the state $i$ has energy $E_i$, then the probability $P_i$ that the system is in state $i$ is $P_i = Ce^{-\beta E_i}$ for some constant of proportionality $C$.
If we sum the probabilities over all possible states, then the result must of course be equal to 1; that means that
$$\sum_{i\in\text{States}} Ce^{-\beta E_i} = C \sum_{i\in \text{States}} e^{-\beta E_i} = 1 \implies C = \frac{1}{\sum_{i\in\text{States}} e^{-\beta E_i}}$$
Defining the partition function
$$Z := \sum_{i\in\text{States}} e^{-\beta E_i}$$
we then have that the probability that the particle is in state $i$ is simply
$$P_i = \frac{e^{-\beta E_i}}{Z}$$
Having gotten that out of the way, I think there are two conceptual problems that you're running into:
- How do we handle this in classical physics, where states aren't discrete (and therefore can't simply be added up)?
- How does the number of particles in the system enter into this conversation?
In classical physics (in particular, the Hamiltonian formulation of mechanics), the state of a particle (moving in $3$ dimensions) corresponds to a point in $6$-dimensional phase space $(\vec q,\vec p)$. The state of a system of two particles moving in $3$ dimensions corresponds to a point in $12$-dimensional phase space $(\vec q_1, \vec q_2, \vec p_1, \vec p_2)$.
In general, if the system has $N$ particles, then the state of the system corresponds to a point in $6N-$dimensional phase space $(\vec q_1,\ldots, \vec q_N,\vec p_1, \ldots, \vec p_N)$.
If our system corresponds to a Hamiltonian
$$\mathcal H = \sum_{i=1}^N \frac{|\vec p_i|^2}{2m} + U(\vec q_1,\ldots, \vec q_N)$$
then the Boltzmann factor becomes
$$e^{-\beta \mathcal H} = e^{-\beta \left(\sum_{i=1}^N \frac{|\vec p_i|^2}{2m}\right)} \cdot e^{-\beta U(\vec q_1,\ldots, \vec q_N)}$$
Summing over all states amounts to integrating over all points in the $6N-$dimensional phase space, which means that
$$Z = \sum_{\text{States}} e^{-\beta H} \rightarrow \frac{1}{h^{3N}}\int d\vec q_1 \ldots \int d\vec q_N \int d\vec p_1 \ldots \int d\vec p_N e^{-\beta \left(\sum_{i=1}^N \frac{|\vec p_i|^2}{2m}\right)} \cdot e^{-\beta U(\vec q_1,\ldots, \vec q_N)}$$
where $h$ is some number with dimensions of momentum $\times$ length, so as to make the integral dimensionless (a common choice is Planck's constant, but this factor drops out from all physical predictions so it doesn't actually matter).
This looks nightmarish. However, exponentials have the beautiful property that $e^{x+y}=e^x e^y$; it follows that
$$Z = \frac{1}{h^{3N}} \left(\int d\vec p_1 e^{-\frac{\beta |\vec p_1|^2}{2m}}\right)\ldots \left(\int d\vec p_N e^{-\frac{\beta |\vec p_N|^2}{2m}}\right) \times \int d\vec q_1 \ldots \int d\vec q_N e^{-\beta U(\vec q_1,\ldots,\vec q_N)}$$
But of course each of those momentum integrals is the same, which means that
$$Z = \frac{1}{h^{3N}}\left(\int d\vec p e^{-\frac{\beta |\vec p|^2}{2m}}\right)^N \times \int d\vec q_1 \ldots \int d\vec q_N e^{-\beta U(\vec q_1,\ldots,\vec q_N)}$$
The first term is just a Gaussian integral, which is easy to evaluate. The second term is still generically nightmarish. However, if it turns out that the interparticle forces can be neglected (e.g. any potential is an external one, like an external gravitational field), then $U(\vec q_1,\ldots,\vec q_N) =\sum_{i=1}^N u(\vec q_i)$, where $u$ is the external potential. This means that the potential term can also be factored, yielding
$$Z = \frac{1}{h^{3N}} \left(\int d\vec q \int d\vec p e^{-\beta\left(\frac{|\vec p|^2}{2m} + u(\vec q)\right)}\right)^N$$
In such cases, it is reasonable to refer to the "single-particle partition function"
$$Z_1 = \frac{1}{h^3} \int d\vec q \int d\vec p e^{-\beta\left(\frac{|\vec p|^2}{2m}+u(\vec q)\right)}$$
and to simply say that $Z = Z_1^N$. Again, though, this only works if the potential energy term can be factorized, which requires either a non-interacting system or one which is subject to considerable simplifying assumptions.