Akhmeteli is absolutely spot on. Since they have thrust to the problem's very heart with their keen dagger, just a few more words to clean up the lacerations around the wound!
It IS all about $z$-axis shift invariance: this symmetry requires the existence of solutions that are eigenfunctions of $\partial_z$, namely functions of the form $f(x,y)\, \exp(i\,k_z\,z)$. Note that this form gainsays your assertion that the spatial part is a function of $x$ and $y$ only: the factor $\exp(i\,k_z\,z)$ is there as well. As you see, the $z$ variation is so simple that it is easy to overlook (I have made this very mistake many times myself).
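Spelled out, the eigenvalue relation that this symmetry delivers is simply:

$$\partial_z\left[f(x,\,y)\,e^{i\,k_z\,z}\right] = i\,k_z\,f(x,\,y)\,e^{i\,k_z\,z}$$

so a shift along $z$ only multiplies the field by a phase and leaves the transverse profile $f$ alone.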
To understand the solution to this problem at a practical level, imagine we are solving it numerically and represent the transverse field on a numerical grid: we can reshape the 2D grid into a 1D column vector $\psi$ with $N = n\times m$ complex components, where the transverse 2D grid resolution is $n$ points by $m$ points. Maxwell’s equations are linear, so the action of a length $z$ of waveguide can be represented by a big square $N\times N$ matrix $U(z)$, i.e. $\psi \mapsto U\,\psi$. Now we think about concatenating lengths of waveguide. Owing to our translational invariance, we have:
$$U(z_1)\,U(z_2) = U(z_2)\,U(z_1) = U(z_1 + z_2)$$
for any lengths $z_1,\,z_2\in\mathbb{R}$, and the only continuous solution to this functional equation is $U(z) = \exp(H\,z)$, for some constant square $N\times N$ matrix $H$. Now, if we set up our column vector right, its squared $\ell^2$-norm $\psi^\dagger\psi$ is proportional to the total power propagating through any transverse cross-section of the waveguide. If the waveguide is lossless, this power, and hence the norm, cannot change along $z$, so $U(z)$ is also unitary (it preserves lengths of and angles between vectors), and this can be shown to be equivalent to the condition $H^\dagger = -H$, i.e. $H$ is skew-Hermitian. The spectral theory of finite normal operators (i.e. ones that commute with their adjoint, and square skew-Hermitian matrices are such operators) is very well studied and easy: such a matrix can always be diagonalised by a unitary matrix of $N$ orthonormal column vectors, so the matrix exponential $U(z) = \exp(H\,z)$ can be expressed in the form:
$$U(z) = \Omega \, \operatorname{diag}[e^{i\,k_1\,z}, \, e^{i\,k_2\,z}, \, e^{i\,k_3\,z}, \,\cdots] \Omega^\dagger$$
where now the $e^{i\,k_j\,z}$ are all scalar exponentials and the $k_j$ are all real (this, together with the unitary decomposition, is equivalent to the skew-Hermitianhood of $H$). So now think about what the eigenvectors (the columns of $\Omega$) mean, given that this is a numerical problem. They stand for certain special field configurations in the transverse plane, i.e. they are discretised versions of the eigenfunction variations $f_j(x,\,y)$. So, through this spectral factorization, you can see that all numerical solutions to the problem are linear superpositions of the kind:
$$\sum_j\alpha_j\,f_j(x,\,y)\, \exp(i\,k_j\,z)$$
(for the numerical problem, you can think of $x$ and $y$ as discrete grid indices passing to continuous transverse co-ordinates in the limit, and the $f_j$ are our "grid" eigenfields) and hopefully you can accept that this numerical representation approaches the real one as the grid spacing gets smaller and smaller, so the above is an explanation for the kinds of solutions you are asking about.

In practice, the numerical solution of this problem seldom works like this: $N = n\times m$ is huge (think of a problem with a $1024\times1024$ transverse discretisation), and there are much more efficient ways of representing the action of propagation through an axial step $\Delta z$ than working out a matrix $U(\Delta z)$ or $H$ (which would hold about $10^{12}$ complex numbers in the $1024\times1024$ grid example!). But you can see the above is an in-principle description: there is a square unitary grid operator and it can always be factorized in the way I describe. So the numerical procedure is always equivalent to the above: one just finds clever algorithms that are equivalent in their action but do not need to build the whole huge $U(\Delta z)$ in memory at once.

Finite difference methods, for example, assume, accurately, that over small steps the field at a point of the updated grid is only influenced by the fields at neighbouring points (for first-order PDE problems) or at neighbouring and next-to-neighbouring points (for second-order PDE problems, such as the wave equation one derives from Maxwell's equations). So we only need to hold in memory the leading diagonal together with a couple of neighbouring stripes on either side of the matrices $H$ and $U(\Delta z)$: this is roughly $5\times 1024\times 1024$ entries in my example, or about $16\times 5 = 80$ megabytes on a 64-bit machine representing complex numbers as two one-word (8-byte) reals, i.e. readily workable with today's technology. The coefficients in the $H$ matrix come straight from discretised versions of Maxwell's equations, and tridiagonal (three-stripe) and pentadiagonal (five-stripe) matrix factorization procedures are very well worked out, tested and numerically sound.
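To make the in-principle picture concrete, here is a toy numpy/scipy sketch. The matrix $H$ below is just a random skew-Hermitian matrix standing in for the true discretised Maxwell generator (so its "modes" mean nothing physically), but it shows the unitarity, the concatenation rule and the spectral factorization all doing their jobs:

```python
import numpy as np
from scipy.linalg import expm

# Toy stand-in for the discretised propagation generator: a random
# skew-Hermitian matrix H (H^dagger = -H), with N tiny so everything fits.
rng = np.random.default_rng(0)
N = 6
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = A - A.conj().T                   # guarantees H^dagger = -H

def U(z):
    """Transfer matrix for a length z of 'waveguide': U(z) = exp(H z)."""
    return expm(H * z)

psi = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# 1) U(z) is unitary, so the norm (power) of any field vector is preserved.
print(np.allclose(np.linalg.norm(U(1.3) @ psi), np.linalg.norm(psi)))   # True

# 2) Concatenating lengths: U(z1) U(z2) = U(z1 + z2).
print(np.allclose(U(0.4) @ U(0.9), U(1.3)))                             # True

# 3) Spectral factorization: -i H is Hermitian, so diagonalising it gives
#    real k_j and a unitary Omega with U(z) = Omega diag(exp(i k_j z)) Omega^dagger.
k, Omega = np.linalg.eigh(-1j * H)
print(np.allclose(U(1.3),
                  Omega @ np.diag(np.exp(1j * k * 1.3)) @ Omega.conj().T))  # True

# 4) Any launch field is a superposition of modes f_j exp(i k_j z).
alpha = Omega.conj().T @ psi                       # modal amplitudes alpha_j
psi_z = Omega @ (alpha * np.exp(1j * k * 1.3))     # sum_j alpha_j f_j exp(i k_j z)
print(np.allclose(psi_z, U(1.3) @ psi))                                  # True
```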
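And to illustrate the "never build the whole $U(\Delta z)$" point, here is a minimal sketch of a banded step. It uses a scalar, paraxial equation in one transverse dimension purely as a stand-in for the full vector Maxwell problem, and all parameter values are made up for illustration; the point is only that the generator is tridiagonal and each step is a cheap banded solve:

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import spsolve

# Scalar paraxial stand-in: d(psi)/dz = (i / (2 k0)) d^2(psi)/dx^2 on a 1-D
# transverse grid.  H is then tridiagonal, and a Crank-Nicolson step is the
# Cayley transform (I - dz H/2)^{-1} (I + dz H/2), which is exactly unitary
# whenever H is skew-Hermitian, so power is conserved step by step.
n, dx, dz, k0 = 512, 0.05, 0.1, 10.0
lap = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / dx**2  # tridiagonal d^2/dx^2
H = 1j / (2 * k0) * lap                                          # skew-Hermitian generator
I = identity(n, format="csc", dtype=complex)
A = (I - 0.5 * dz * H).tocsc()       # implicit (left-hand) banded operator
B = (I + 0.5 * dz * H).tocsc()       # explicit (right-hand) banded operator

x = (np.arange(n) - n / 2) * dx
psi0 = np.exp(-x**2)                 # a Gaussian launch field
psi = psi0.astype(complex)

for _ in range(200):                 # propagate 200 axial steps of dz
    psi = spsolve(A, B @ psi)        # banded solve: O(n) work and memory per step

print(np.allclose(np.linalg.norm(psi), np.linalg.norm(psi0)))   # True: power conserved
```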
Incidentally, the above discussion is wholly analogous to the theory behind the general Schrödinger picture in quantum mechanics: see my answers to the Physics SE questions "A Simple Explanation for the Schrödinger Equation and Model of Atom?", "Reference frame involved in the Schrödinger's equation" and "Why can the Schroedinger equation be used with a time-dependent Hamiltonian?".
In non-numerical words: in a lossless waveguide the $z$-shift operation is indeed unitary (no energy loss), so the "infinitesimal" shift $\partial_z$ is skew-Hermitian, i.e. its eigenvalues $i\,k_z$ are purely imaginary. You need a bit of theory (a vector analogue of Sturm-Liouville theory) to see why such eigenfunctions can be added together to match arbitrary boundary conditions on a transverse cross-section. Actually, given the vector nature of the fields and the further solenoidal conditions $\nabla\cdot\vec{E}=\nabla\cdot\vec{H}=0$, there is a bit more to a proof than standard Sturm-Liouville theory. Standard SL theory will of course work for two out of the three Cartesian components of the $\vec{E}$ or $\vec{H}$ field (or for any Cartesian component of the Lorenz-gauged potential four-vector). For a metal waveguide with a compact cross-section, the spectrum of eigenfunctions is wholly discrete; for an optical fibre there is also a continuous component to the spectrum, the radiation field, alongside the discrete, losslessly guided bound eigenfields.
The full story is to be found in Snyder and Love, "Optical Waveguide Theory", particularly chapters 30 and 31 for the foundations and earlier chapters for the radiation field of optical waveguides (I understand from John Love that the ANU will make this book freely downloadable sometime), and in R. E. Collin, "Field Theory of Guided Waves". However, neither of these books is particularly up front about the mathematical foundations of the "completeness" of the vector eigenfunctions, and I don't know of any reference that discusses this idea in simple, modern terms.