How do we compute $v_{\rm drift}$?
I will not reproduce the math of Farcher's answer. Instead, I'll try to spell out the interpretation a little more clearly.
What we want is the average velocity of all of the electrons at some moment in time. Call the velocity of one electron at time $t$ $\vec{v}(t)$. We want to compute $\langle \vec{v}(t)\rangle$, where the average runs over electrons. In fact, by definition, $\vec{v}_{\rm drift}=\langle \vec{v}(t)\rangle$. (Note that I've chosen the notation $\vec{v}(t)$ to make it clear that we are evaluating the velocity of all electrons at the same time $t$, but in steady state the average is time-independent.)
Now, for each electron, we can write $\vec{v}(t) = \vec{v}_0 + \frac{e \vec{E}}{m} (t-t_0)$, where $t_0$ is the time of the electron's last collision and $\vec{v}_0$ is its velocity immediately after that collision. Here, $t_0$ and $\vec{v}_0$ are random variables; they are different for every electron. Now, $\langle \vec{v}_0\rangle = 0$ (see the second section of this answer for an argument). So, to compute $\vec{v}_{\rm drift} = \langle \vec{v}(t)\rangle$, all that remains is to find $\langle t - t_0\rangle$, which requires the distribution of $t_0$.
Now, it is admittedly confusing that Farcher's answer is phrased in terms of the time between collisions, rather than the average velocity of the electrons at a single time. However, $t-t_0$ (the amount of time that has passed since each electron had its last collision) follows the same distribution$^\dagger$ as $\tau$, the time between collisions for an electron. Therefore, $\langle t-t_0\rangle = \langle \tau \rangle = \bar{\tau}$, where $\bar{\tau}$ is the mean free time.$^\star$ The calculation of this average is explained very nicely in Farcher's answer. Note that not every electron takes a time $\bar{\tau}$ to collide; some collide sooner and some later, but the average time between collisions is $\bar{\tau}$.
Given this, we can say
\begin{equation}
\vec{v}_{\rm drift} = \langle \vec{v}(t) \rangle = \langle \vec{v}_0 \rangle + \frac{e\vec{E}}{m}\langle t - t_0 \rangle = \frac{e \vec{E} \bar{\tau}}{m}
\end{equation}
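As a quick numerical sanity check of this result, here is a minimal Monte Carlo sketch. It assumes that the time since the last collision is exponentially distributed with mean $\bar{\tau}$ and that the post-collision velocity component along the field is a zero-mean Gaussian. The function name `simulate_drift`, the units, and the parameter values are illustrative choices of mine, not taken from Farcher's answer.

```python
import random

def simulate_drift(n_electrons=200_000, tau_bar=2.0, e_over_m=1.0,
                   field=0.5, v_thermal=1.0, seed=0):
    """Estimate <v(t)> along the field for an ensemble seen at one instant."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_electrons):
        # Assumption: v_0 is random with zero mean (Gaussian, width v_thermal).
        v0 = rng.gauss(0.0, v_thermal)
        # Assumption: t - t_0 is exponential with mean tau_bar.
        dt = rng.expovariate(1.0 / tau_bar)
        # v(t) = v_0 + (e E / m) (t - t_0), component along the field.
        total += v0 + e_over_m * field * dt
    return total / n_electrons

v_drift_mc = simulate_drift()
v_drift_theory = 1.0 * 0.5 * 2.0  # (e/m) * E * tau_bar for the defaults above
print(f"Monte Carlo: {v_drift_mc:.3f}, theory: {v_drift_theory:.3f}")
```

The two numbers should agree to within the statistical noise of the sample, even though each individual electron's velocity is dominated by the random part $\vec{v}_0$.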
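$^\dagger$ More precisely, the exponential distribution: collisions occur as a Poisson process, so the waiting times between them are exponentially distributed.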
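$^\star$ There is a subtlety here, which is described in Ashcroft & Mermin problem 1.1 (thanks to @Puk for pointing to that question in the comments). The average $\langle t-t_0\rangle$ is the average time since the last collision, taken over all electrons at one instant (an ensemble average), while the average $\langle \tau \rangle$ is the average time between successive collisions of one electron (a time average). Often, in physics, time and ensemble averages are related by the ergodic theorem. Interestingly (and counterintuitively, at least to me), it turns out that at any given time $t$, the time between the last and next collision averaged over all electrons is $2\bar\tau$, not $\bar\tau$. The reason is that an electron observed at a random instant is more likely to be caught in the middle of a long collision-free interval than a short one. This is consistent with $\langle t-t_0\rangle = \bar\tau$: the average time since the last collision and the average time until the next one are each $\bar\tau$.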
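The three averages in this footnote can be compared numerically. The sketch below assumes that each electron's collisions form a Poisson process (exponential gaps with mean $\bar\tau$) started at time zero, and that all electrons are observed at a time `t_obs` much later than $\bar\tau$. The function name `collision_averages` and all parameter values are hypothetical choices for illustration.

```python
import random

def collision_averages(n_electrons=50_000, tau_bar=1.0, t_obs=20.0, seed=1):
    """Return (mean time since last collision at t_obs,
               mean length of the gap that contains t_obs,
               mean length of all gaps drawn)."""
    rng = random.Random(seed)
    sum_since = sum_straddle = sum_gap = 0.0
    n_gaps = 0
    for _ in range(n_electrons):
        t_last = 0.0  # assumption: every electron collides at time zero
        while True:
            gap = rng.expovariate(1.0 / tau_bar)  # time to the next collision
            sum_gap += gap
            n_gaps += 1
            if t_last + gap > t_obs:
                break  # this gap contains the observation time
            t_last += gap
        sum_since += t_obs - t_last
        sum_straddle += gap
    return (sum_since / n_electrons,
            sum_straddle / n_electrons,
            sum_gap / n_gaps)

mean_since, mean_straddle, mean_gap = collision_averages()
print(f"<t - t0> = {mean_since:.3f}")
print(f"gap containing t = {mean_straddle:.3f}")
print(f"<tau> = {mean_gap:.3f}")
```

With `tau_bar = 1`, the first and third averages should come out close to $\bar\tau$ and the second close to $2\bar\tau$, up to sampling noise.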
Why is $\langle v_{\rm initial}\rangle = 0$?
This paragraph was my original answer; I misunderstood the question so this is not directly relevant, but I am leaving it here.
Since the collision takes place on a time scale very short compared to the time over which the field appreciably changes the particle's velocity (of order $mv/eE$ for a particle of typical speed $v$), the collision effectively scatters the particle into a random momentum state. Afterwards, the electric field acts on the particle and causes it to gain velocity in the direction of the field. So the final velocity $v_{\rm final}$ (just before the next collision) has a random component $v_{\rm initial}$, which has an average of zero, and a deterministic component $eEt/m$, where $t$ is the time since the collision. The deterministic part represents the effect of the field and leads to velocity building up in the direction of $E$.
At the risk of anthropomorphizing electrons: immediately after the collision the electron loses track of where it was going, but the applied field reminds it to keep moving in the right direction.