A photon is a point-sized particle whose position is described by an extended, wave-packet-shaped probability distribution. This means that the information in the position is limited, and it stands in a trade-off relationship with the information in the momentum: if you have a situation in which you can attribute full information to the momentum, there must be zero information in the position, which is what you are talking about with a plane wave. If, however, you admit partial information about both, you can have a localized wave packet. Interestingly, though, it can never be perfectly localized, i.e. there must always be some "variance" in the position distribution.
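As a rough numerical check of this trade-off, here is a minimal Python sketch. It is my own illustration: the function name `gaussian_widths`, the grid settings, and the choice of units with $\hbar = 1$ are all assumptions. It takes a Gaussian wave packet, computes the spread in position and the spread in wavenumber, and shows that squeezing one widens the other.

```python
import math

def gaussian_widths(sigma, n=4001):
    """Return (sigma_x, sigma_k) for psi(x) = exp(-x^2 / (4 sigma^2)).

    Hypothetical helper for illustration only; units are chosen so hbar = 1,
    which makes momentum and wavenumber the same thing.
    """
    half_box = 12.0 * sigma              # grid wide enough that psi is ~0 at the edges
    dx = 2.0 * half_box / (n - 1)
    xs = [-half_box + i * dx for i in range(n)]
    psi = [math.exp(-x * x / (4.0 * sigma * sigma)) for x in xs]

    norm = sum(p * p for p in psi) * dx
    # <x^2>; the mean position is zero by symmetry
    mean_x2 = sum(x * x * p * p for x, p in zip(xs, psi)) * dx / norm
    # <k^2> = integral of |dpsi/dx|^2 over x, divided by the norm;
    # the mean wavenumber is zero because psi is real
    mean_k2 = sum(
        ((psi[i + 1] - psi[i - 1]) / (2.0 * dx)) ** 2 for i in range(1, n - 1)
    ) * dx / norm
    return math.sqrt(mean_x2), math.sqrt(mean_k2)

for sigma in (0.5, 1.0, 2.0):
    sx, sk = gaussian_widths(sigma)
    print(f"sigma_x = {sx:.3f}, sigma_k = {sk:.3f}, product = {sx * sk:.3f}")
```

The product stays close to 1/2 for each width, which is the smallest value the uncertainty relation allows (a Gaussian saturates the bound). Making `sigma_x` smaller only pushes `sigma_k` up. The plane wave is the limit in which `sigma_k` goes to zero and `sigma_x` grows without bound.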
This is the part that is often missed in these "is it a particle or a wave/field/etc." back-and-forths: position is a relationship of an object to other objects, or else to a fictitious coordinate grid. It is not the same as the shape of that object. In fact, we can apply quantum mechanics to hypothetical hard spheres just as we can apply Newtonian mechanics to them, though there's no evidence that real-life particles are, in fact, hard spheres of a finite size. What happens in quantum mechanics is that the information describing this relationship becomes restricted.
In terms of the EM field, you can think of it as a single point where the E/B field is not definitely zero, but where the spatial location of this point is underdetermined in the same way that the position of a particle in ordinary QM is underdetermined. That is, the EM field for a 1-photon state is a superposition of configurations with a single excited point, weighted according to the photon position-space wave function.
The impossibility of perfect localization can be understood by comparing to the analogous situation with a "phonon" (note the "n") in a finite crystal lattice. A phonon is to sound what a photon is to light, hence the name (as in "telephone", or "phonograph", and the like). There, one can derive that, specifically when the lattice is finite, one can construct pointlike excitations of arbitrarily good positional localization, but they only "count" as single phonons, in the sense that the number operator "recognizes" them as single-phonon states, so long as you don't make them "too localized". That basically means you don't add up too many Fourier modes to make the $\psi_x$ wave packet that you're talking about. How many you can add up depends on the lattice size, with more naturally becoming possible the larger the lattice gets and the better it "approximates infinity". And in this paper:
https://arxiv.org/abs/math-ph/0607044
it is described how, for relativistic quantum fields, of which the EM field is an example, the situation is directly analogous to the finite crystal lattice case. In effect, try to localize a phonon or photon "too well", and it transforms into multiple phonons or photons or, even more accurately, a fuzzy (in the same sense of lacking information as in the position) number of such.
Indeed, this doesn't just apply to photons, but to all relativistic particles including even electrons, where the scale of maximal localization is the Compton wavelength, $\frac{h}{mc}$, about 2.4 pm for an electron (compare against the diameter of a $\mathrm{H}$ atom, about 106 pm). Note again, this is not the "size" of the electron, only the scale of maximal compression of the wave function of position, or if you like, the absolute maximum amount of "precision" that can be put into the position "variable". The shape or avatar of the electron is unchanged.
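To make the arithmetic behind those two figures explicit, here is a short Python sketch. The constants are CODATA 2018 values that I typed in by hand, the function name `compton_wavelength` is just for illustration, and taking the hydrogen "diameter" to be twice the Bohr radius is my assumption about where the 106 pm figure comes from.

```python
# CODATA 2018 values, SI units (entered by hand for this illustration)
H_PLANCK = 6.62607015e-34       # Planck constant, J s
M_ELECTRON = 9.1093837015e-31   # electron mass, kg
C_LIGHT = 299792458.0           # speed of light, m/s
BOHR_RADIUS = 5.29177210903e-11 # Bohr radius, m

def compton_wavelength(mass_kg):
    """Compton wavelength h / (m c) in metres for a particle of the given mass."""
    return H_PLANCK / (mass_kg * C_LIGHT)

lam_pm = compton_wavelength(M_ELECTRON) * 1e12  # metres -> picometres
h_diameter_pm = 2.0 * BOHR_RADIUS * 1e12        # assumed: diameter = 2 x Bohr radius

print(f"electron Compton wavelength: {lam_pm:.2f} pm")
print(f"hydrogen atom diameter:      {h_diameter_pm:.0f} pm")
print(f"ratio:                       {h_diameter_pm / lam_pm:.0f}")
```

So the localization limit sits a factor of roughly 40 below the size of the atom, which is why it plays no visible role in ordinary atomic physics.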