The whole notion of negative-energy particles in the Dirac equation comes from the rather pathological limit that one needs to take to go from the correct quantum field theory of Dirac fermions to the "one-particle" approximation which Dirac was (unknowingly) working in. This answer follows the treatment in Merzbacher's quantum mechanics textbook.
In any truly relativistic treatment of quantum mechanics, one must allow for particles to be created and destroyed. Trying to write down a normalized wave function for a single particle which evolves unitarily doesn't allow this, which is a first hint that the wave function formalism of the Dirac equation isn't a fully correct one. But using the Dirac equation does give accurate answers, so one should be able to figure out what approximation is being done.
The correct Hamiltonian for free Dirac fermions is that of the quantum field theory,
$$
H = \int d^d x \, \Psi^{\dagger}_a(t,x) \left[ - i \hbar c \, \vec{\alpha} \cdot \vec{\nabla} + \beta m c^2 \right]_{ab} \Psi_b(t,x),
$$
where the $\beta_{ab}$ and $\alpha^i_{ab}$ ($i = 1,2,3$) matrices are the same ones appearing in Dirac's original paper, and
$$
\{ \Psi_a(t,x),\Psi^{\dagger}_b(t,x') \} = \delta_{a b} \delta^d(x - x').
$$
This is diagonalized by an expansion in creation and annihilation operators of the form
$$
\Psi_a(t,x) = \int \frac{d^d k}{(2 \pi)^d \sqrt{2 \omega_k}} \sum_s \left[ u_a(k,s) b_s(k) e^{- i \omega_k t + i k \cdot x} + v_a(k,s) c^{\dagger}_s(k) e^{i \omega_k t - i k \cdot x} \right],
$$
with anticommutation relations $\{ b_s(k),b_{s'}^{\dagger}(k') \} = \{ c_s(k),c^{\dagger}_{s'}(k') \} = (2 \pi)^d \delta_{s s'} \delta^d(k - k')$ (the factor of $(2\pi)^d$ goes with the momentum measure $d^d k/(2\pi)^d$ used here), and by choosing the $u_a(k,s)$ and $v_a(k,s)$ judiciously we find
$$
H = E_0 + \int \frac{d^d k}{(2 \pi)^d} \sum_s \omega_k \left[ b^{\dagger}_s(k)b_s(k) + c_s^{\dagger}(k)c_s(k) \right].
$$
This Hamiltonian has its energy bounded below. One can define a conserved U(1) charge for this theory, and the particles created by the $c$ operators have opposite charge to the $b$ particles: they are antiparticles of each other. But they have the same positive energy! (At least when energy is measured relative to the vacuum energy $E_0$, which I'll ignore hereafter.)
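To make the "judicious" choice of spinors a little more concrete, here is a minimal numerical sketch, assuming $d = 3$, units with $\hbar = c = 1$, and the Dirac representation of the $\alpha^i$ and $\beta$ matrices (the names `dirac_h`, `k` and `m` are just illustrative). It checks that the momentum-space one-particle matrix $\vec{\alpha} \cdot \vec{k} + \beta m$ has eigenvalues $\pm \omega_k$ with $\omega_k = \sqrt{k^2 + m^2}$, each doubly degenerate. In the standard construction the $u(k,s)$ span the $+\omega_k$ eigenspace at momentum $k$ and the $v(k,s)$ span the $-\omega_k$ eigenspace at momentum $-k$; it is the reordering of $c c^{\dagger}$ into $c^{\dagger} c$ that turns the latter into positive-energy antiparticles in the field-theory Hamiltonian.

```python
import numpy as np

# Pauli matrices plus 2x2 identity and zero blocks
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)

# Dirac representation (an assumption; any unitarily equivalent choice works):
# alpha_i = [[0, sigma_i], [sigma_i, 0]], beta = diag(1, 1, -1, -1)
alphas = [np.block([[Z2, s], [s, Z2]]) for s in (sx, sy, sz)]
beta = np.block([[I2, Z2], [Z2, -I2]])

def dirac_h(k, m):
    """One-particle Hamiltonian alpha.k + beta*m in momentum space (hbar = c = 1)."""
    return sum(ki * ai for ki, ai in zip(k, alphas)) + m * beta

k = np.array([0.3, -1.2, 0.5])  # arbitrary test momentum
m = 1.0                         # arbitrary test mass
omega = np.sqrt(k @ k + m**2)

# eigvalsh returns the eigenvalues of a Hermitian matrix in ascending order
energies = np.linalg.eigvalsh(dirac_h(k, m))
print("eigenvalues:", energies)
print("omega_k:    ", omega)
```

The two negative eigenvalues are exactly the "negative-energy solutions" of the one-particle equation; nothing in the field-theory Hamiltonian above is negative.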
How do we relate this to Dirac's original formalism? What we want is a one-particle approximation, where we track a single particle without allowing it to be created or destroyed; this is, after all, what Dirac's wave function describes. In developing this, one is tempted to regard the state
$$
\Psi^{\dagger}(t,x)| 0 \rangle = |x \rangle
$$
as analogous to the position-space wave function of a single electron, but this does not work, since it is not normalizable. The problem is that $\Psi$ does not annihilate the vacuum: it also creates an antiparticle.
The solution leading to Dirac's theory is to define an electron vacuum by the relation
$$
\Psi(t,x) | 0 \mathbf{e} \rangle = 0.
$$
Without worrying yet about whether it exists, let's first assume we can construct a state $|0 \mathbf{e}\rangle$ satisfying this. Then constructing
$$
| \Psi_e \rangle = \int d^dx \, \psi_e(t,x) \Psi^{\dagger}(t,x) | 0 \mathbf{e} \rangle,
$$
we have the normalization $\langle \Psi_e| \Psi_e \rangle = \int d^dx \, \psi_e^{\dagger} \psi_e = 1$, provided $\psi_e$ itself is normalized. More importantly, when we consider the properties of the function
$$
\psi_e(t,x) = \langle 0 \mathbf{e} | \Psi(t,x) | \Psi_e \rangle
$$
under time evolution, we find
$$
i \hbar \partial_t \psi_e(t,x) = \left[ - i \hbar c \, \vec{\alpha} \cdot \vec{\nabla} + \beta m c^2 \right] \psi_e(t,x).
$$
This is precisely Dirac's equation for the one-particle wave function!
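For readers who want the intermediate steps, here is a sketch of how these statements follow. It assumes the electron vacuum is normalized, $\langle 0 \mathbf{e} | 0 \mathbf{e} \rangle = 1$, that the fields are Heisenberg-picture operators with $\{ \Psi_a, \Psi_b \} = 0$, and that $| \Psi_e \rangle$ is a fixed Heisenberg-picture state. Since $\Psi$ annihilates $| 0 \mathbf{e} \rangle$, the equal-time anticommutator gives
$$
\langle 0 \mathbf{e} | \Psi_a(t,x) \Psi^{\dagger}_b(t,x') | 0 \mathbf{e} \rangle = \delta_{a b} \delta^d(x - x'),
$$
which is what makes the states $\Psi^{\dagger}(t,x) | 0 \mathbf{e} \rangle$ behave like properly normalized position eigenstates, and shows that the overlap $\langle 0 \mathbf{e} | \Psi(t,x) | \Psi_e \rangle$ returns the same function $\psi_e(t,x)$ that was used to build $| \Psi_e \rangle$. For the time evolution, the identity $[A, BC] = \{A,B\} C - B \{A,C\}$ applied to the Hamiltonian gives
$$
i \hbar \partial_t \Psi_a(t,x) = [ \Psi_a(t,x), H ] = \left[ - i \hbar c \, \vec{\alpha} \cdot \vec{\nabla} + \beta m c^2 \right]_{ab} \Psi_b(t,x),
$$
and taking the matrix element of both sides between $\langle 0 \mathbf{e} |$ and $| \Psi_e \rangle$ yields the equation for $\psi_e$.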
But what is this electron vacuum $| 0 \mathbf{e}\rangle$? By looking at its definition, it needs to be annihilated by not just the $b_s(k)$, but also by $c^{\dagger}_s(k)$ for all $k$ and $s$:
$$
b_s(k) | 0 \mathbf{e}\rangle = 0, \qquad c^{\dagger}_s (k) | 0 \mathbf{e}\rangle= 0.
$$
It is pretty intuitive how to construct such a state, since $(c^{\dagger}_s(k))^2 = 0$: we just fill the entire (real) vacuum with an infinite sea of antiparticles! That is,
$$
| 0 \mathbf{e}\rangle "=" \prod_{k,s} c^{\dagger}_s(k) |0 \rangle.
$$
Once one knows that this somewhat pathological limit is the "vacuum" you are working with in the one-particle theory, all of the "paradoxes" you saw when you studied the Dirac equation disappear. By looking at the original QFT Hamiltonian, this state has an infinitely large positive energy compared with the actual QFT vacuum, as well as infinite charge. The one-particle states you obtain by taking $\Psi^{\dagger}| 0 \mathbf{e}\rangle$ either add a single electron to this "Dirac sea" of antiparticles, increasing the energy, or they remove a single positron from the sea, which gives a negative energy only relative to the infinitely large positive energy you've already added to the system. (When compared to the ground state of the original QFT Hamiltonian, the energy is still (infinitely) positive.)
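The bookkeeping can be checked in a finite toy model. The sketch below is only an illustration under stated assumptions: it truncates the theory to two electron modes and two positron modes with made-up energies `omega`, and represents the fermionic operators with a Jordan-Wigner construction (the helper names `annihilator`, `dag` and `energy` are hypothetical). With finitely many modes the "sea" has finite energy, but the pattern is the same: the filled state is annihilated by every $b$ and every $c^{\dagger}$, adding an electron raises the energy, and removing a positron lowers it relative to the sea while keeping it positive relative to the true vacuum.

```python
import numpy as np
from functools import reduce

N_MODES = 4  # toy truncation: modes 0,1 are electron (b) modes, modes 2,3 positron (c) modes

I = np.eye(2)
Z = np.diag([1.0, -1.0])                # fermion parity of a single mode
A = np.array([[0.0, 1.0], [0.0, 0.0]])  # single-mode annihilator: A|1> = |0>

def annihilator(j):
    """Jordan-Wigner annihilation operator for mode j on the 2**N_MODES Fock space."""
    return reduce(np.kron, [Z] * j + [A] + [I] * (N_MODES - j - 1))

def dag(op):
    """Hermitian conjugate."""
    return op.conj().T

modes = [annihilator(j) for j in range(N_MODES)]
b, c = modes[:2], modes[2:]
omega = [1.0, 1.5]  # hypothetical mode energies, shared by particle and antiparticle

# H = sum_k omega_k (b_k^dagger b_k + c_k^dagger c_k), measured from the true vacuum
H = sum(w * (dag(bk) @ bk + dag(ck) @ ck) for w, bk, ck in zip(omega, b, c))

vac = np.zeros(2 ** N_MODES)
vac[0] = 1.0                       # true vacuum |0>: every mode empty
sea = dag(c[0]) @ dag(c[1]) @ vac  # toy electron vacuum |0e>: every positron mode filled

def energy(state):
    """Expectation value of H in a (not necessarily normalized) state."""
    return (state @ H @ state) / (state @ state)

electron = dag(b[0]) @ sea  # the b^dagger part of Psi^dagger: add an electron
hole = c[0] @ sea           # the c part of Psi^dagger: remove a positron from the sea

print("E(sea) - E(vacuum):", energy(sea))
print("E(electron) - E(sea):", energy(electron) - energy(sea))
print("E(hole) - E(sea):", energy(hole) - energy(sea))
print("E(hole) - E(vacuum):", energy(hole))
```

In the full theory the product over modes is infinite, which is why the corresponding state is only formal and its energy relative to the true vacuum diverges.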
Note that the negative energy states you obtain in this theory should not be interpreted as antiparticles, since they carry the same charge as the $b$ particles. These states involve removing one of the antiparticles in the filled Dirac sea. If you want to deal with an antiparticle wave function, you should define a "positron vacuum" by $\Psi^{\dagger}(t,x)| 0 \mathbf{p}\rangle = 0$ and work with that instead. (As an exercise, couple the U(1) Noether current of the original Dirac QFT to a gauge field, and keep track of the relative sign with which $\psi_e$ and $\psi_p$ couple to it.)
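The charge statement can be made explicit with a short sketch, assuming the sign convention in which the $b$ particles carry charge $+1$ and dropping an additive constant. The conserved charge is
$$
Q = \int d^d x \, \Psi^{\dagger}_a(t,x) \Psi_a(t,x) = \int \frac{d^d k}{(2 \pi)^d} \sum_s \left[ b^{\dagger}_s(k) b_s(k) - c^{\dagger}_s(k) c_s(k) \right] + \text{const},
$$
and the equal-time anticommutation relations give
$$
[ Q, \Psi^{\dagger}_a(t,x) ] = \Psi^{\dagger}_a(t,x).
$$
So acting with $\Psi^{\dagger}$ on any state of definite charge raises the charge by exactly one unit, whether the piece that acts is $b^{\dagger}$ (adding an electron) or $c$ (removing a positron from the sea). Both kinds of one-particle state built on $| 0 \mathbf{e} \rangle$ therefore carry the same charge relative to it.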
In working in this approximation, one doesn't allow any transitions to states with a different number of particles. Also, operators will generically couple the one-electron and one-hole states as time evolves, even at low energies (you can think of annihilating the electron with one of the positrons in the Dirac sea). So the theory is necessarily approximate, and you should go back to the QFT for the full picture.