The source is on for a time $t$. In that time, the source sees $f_st$ peaks, and the observer sees $f_ot$ peaks. My intuition also says they should see the same number, but using the different frequencies to find how many peaks are observed leads to a disagreement! I feel like I'm mistaken in applying the formula $ft=n$ here, but why?
The issue is in assuming that the source and the observer experience the sound for the same time $t$. They do not: the source emits for a time $T_s$, while the observer hears the sound for a different time $T_o$.
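This is because the source is moving away from the observer while it emits. During the time $T_s$ that the source is on, the front of the wave travels a distance $cT_s$ toward the observer, where $c$ is the speed of sound, while the source recedes a distance $vT_s$ in the opposite direction, where $v$ is the speed of the source. The emitted wave is therefore longer by $vT_s$ than it would be for a stationary source, and its total length is $\ell=(c+v)T_s$.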
Since the wave moves at the speed of sound $c$, the observer will experience the sound for a time
$$T_o=\frac{\ell}{c}=\frac{(c+v)T_s}{c}$$
Now, applying the Doppler shift in frequency for a source moving away from a stationary receiver:
$$f_o=\frac{c}{c+v}f_s$$
we find that
$$f_oT_o=f_sT_s=n$$
which is what you correctly assumed should be true.
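As a quick numerical sanity check of the argument above, here is a minimal sketch. The function name `peak_counts` and the sample values (a 440 Hz source on for 2 s, receding at 30 m/s, with $c = 343$ m/s) are illustrative assumptions, not part of the original question.

```python
def peak_counts(f_s, T_s, v, c=343.0):
    """Return (peaks emitted, peaks received) for a source receding at speed v.

    Assumes a stationary observer and a source moving directly away
    at constant speed v < c. All values are illustrative.
    """
    T_o = (c + v) * T_s / c   # how long the observer hears the sound
    f_o = c / (c + v) * f_s   # Doppler-shifted frequency at the observer
    return f_s * T_s, f_o * T_o


emitted, received = peak_counts(f_s=440.0, T_s=2.0, v=30.0)
print(emitted, received)  # the two counts agree up to floating-point rounding
```

The observer hears a lower frequency for a longer time, and the two effects cancel exactly in the product $f_oT_o$.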
Note: you could also view this as a way (maybe even the way?) to derive the Doppler shift formula, where you start with $f_oT_o=f_sT_s=n$ and determine how long the receiver experiences the emitted wave as was done above.
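Written out, that derivation might look like the following sketch, under the same assumptions as above (stationary observer, source receding at constant speed $v$ along the line of sight). Requiring that both count the same number of peaks gives

$$f_o=\frac{n}{T_o}=f_s\,\frac{T_s}{T_o}=f_s\,\frac{cT_s}{(c+v)T_s}=\frac{c}{c+v}f_s$$

which is the Doppler formula for a receding source.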