Quantum mechanics and quantum field theory differ in how they treat their wave equations. The shared term “propagator” can be traced back to the “relativistic wave equation” approach, i.e. people really used to think of the Schrödinger and the KG operators as belonging to the same class of “quantum operators”. The modern point of view regards them as being of a different nature, so I suggest you do too at first. (Later, you might want to understand the Schrödinger field in non-relativistic QFT by reading chapter III.5 of Zee, and if you’re feeling brave, the origins of modern QFT, described in Weinberg’s first volume, section 1.2.) Accordingly, I will divide my answer into sections on QM and QFT.
Quantum mechanics. Assume you know the transition amplitude
$\def\xi{x_{\mathrm i}} \def\xf{x_{\mathrm f}} \def\ti{t_{\mathrm i}} \def\tf{t_{\mathrm f}}$
$$K(\xf,\tf;\xi,\ti) \equiv \langle \xf,\tf|\xi,\ti\rangle$$
and the wave function $\psi(x,\ti) = \psi_0(x)$ for all $x$ at a certain time $t =\ti$. Then you also know it at any other time $t=\tf$:
\begin{multline}
\psi(\xf,\tf) \equiv \langle \xf,\tf|\psi(\ti)\rangle = \langle \xf,\tf|\left(\int d^n\xi\,|\xi,\ti\rangle\langle \xi,\ti|\right)|\psi(\ti)\rangle \\\equiv \int d^n\xi\,K(\xf,\tf;\xi,\ti)\psi_0(\xi).\tag{1}
\end{multline}
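Equation (1) can be checked numerically. The sketch below assumes the standard free-particle kernel in one dimension with $\hbar = 1$, $K = \sqrt{m/(2\pi i t)}\,e^{im(\xf-\xi)^2/2t}$, which is not derived in this answer, and a Gaussian $\psi_0$ chosen purely for illustration; all function names are hypothetical.

```python
import cmath
import math

def free_kernel(xf, xi, t, m=1.0):
    """Free-particle kernel K(xf, t; xi, 0) in one dimension, hbar = 1 (assumed form)."""
    return cmath.sqrt(m / (2j * math.pi * t)) * cmath.exp(1j * m * (xf - xi) ** 2 / (2 * t))

def psi0(x):
    """Normalised Gaussian initial wave function (illustrative choice)."""
    return math.pi ** -0.25 * math.exp(-x * x / 2)

def propagate(xf, t, m=1.0, half_width=10.0, n=4000):
    """Approximate eq. (1): sum of K(xf, t; xi, 0) * psi0(xi) * dxi over a uniform grid."""
    dx = 2 * half_width / n
    total = 0j
    for k in range(n):
        xi = -half_width + (k + 0.5) * dx  # midpoint of the k-th cell
        total += free_kernel(xf, xi, t, m) * psi0(xi)
    return total * dx

def exact(xf, t, m=1.0):
    """Closed-form spreading Gaussian that solves the free Schrodinger equation."""
    a = 1 + 1j * t / m
    return math.pi ** -0.25 * a ** -0.5 * cmath.exp(-xf * xf / (2 * a))

# The difference should be tiny if the kernel really propagates psi0.
print(abs(propagate(0.7, 1.0) - exact(0.7, 1.0)))
```

The sum over the grid is the discretized integral over $\xi$; the Gaussian tail makes the truncation to a finite interval harmless here, which would not be true for a non-decaying $\psi_0$.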
The first few sections of Feynman and Hibbs or chapter 6 of Srednicki should convince you that
$$K(\xf,\tf;\xi,\ti) = \int\limits_{x\rlap{(\ti)=\xi}}^{x\rlap{(\tf)=\xf}} \mathcal Dx(t)\,e^{i\int dt\,L(x(t),\dot x(t),t)}.$$
Note well the boundary conditions in the path integral: they will prove to be important in the QFT section.
Let us rearrange the arguments, $K(\xf,\tf;\xi,\ti) = K(\xf,\xi;\tf,\ti)$. Then you will be able to recognize in (1) an integral representation of the evolution operator $U(\tf,\ti)$,
$$ \psi(\xf,\tf)\equiv(U(\tf,\ti)\psi_0)(\xf) = \int d^n\xi\,K(\xf,\xi;\tf,\ti)\psi_0(\xi).$$
If you think of $\xi$ and $\xf$ as indices taking a continuous range of values, this formula looks very much like matrix multiplication, with $K(\cdot,\cdot\,;\tf,\ti)$ playing the role of the matrix. This makes sense, because the linear operator $U(\tf,\ti)$ should be represented by (something like) a matrix! Mathematicians call that something an (integral) kernel, hence the $K$. But it really is a very big matrix, barring pathologies; speaking of which, convince yourself that the Dirac delta “function” $\delta^n(\xf-\xi)$ is the kernel of the identity operator and that $K(\xf,\xi;\ti,\ti) = \delta^n(\xf-\xi)$.
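Both claims at the end of the previous paragraph take one line each; this is only a sketch, using nothing beyond the orthonormality and completeness of the position eigenstates at a fixed time:
$$ \int d^n\xi\,\delta^n(\xf-\xi)\,\psi_0(\xi) = \psi_0(\xf), \qquad K(\xf,\xi;\ti,\ti) = \langle\xf,\ti|\xi,\ti\rangle = \delta^n(\xf-\xi). $$
Inserting a complete set of states at an intermediate time $t$, exactly as was done in (1), gives the continuous analogue of the matrix product $U(\tf,\ti) = U(\tf,t)\,U(t,\ti)$:
$$ K(\xf,\xi;\tf,\ti) = \int d^nx\,K(\xf,x;\tf,t)\,K(x,\xi;t,\ti). $$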
Armed with the knowledge that the Dirac delta is in fact (the kernel of) the identity operator, you now see that the definition of the Green’s function (more properly the fundamental solution) of a linear differential operator $L$, limited to zero space dimensions and no explicit $t$ dependence for simplicity,
$$ LG = \delta(t), $$
is in fact just the definition of an inverse! Given $G$, it’s also obvious how to solve any other inhomogeneous equation:
$$ Lu = f(t)\quad\Leftarrow\quad u(t) = \int ds\,G(t-s)f(s). $$
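As a concrete instance, here is a minimal sketch for the first-order operator $L = d/dt + a$, an example chosen for illustration and not taken from the text above. Its retarded Green’s function is $G(t) = \theta(t)e^{-at}$, and for a unit-step source the convolution should reproduce the causal solution $u(t) = (1-e^{-at})/a$. The function names and grid parameters are hypothetical.

```python
import math

def green(t, a):
    """Retarded Green's function of L = d/dt + a: theta(t) * exp(-a t)."""
    return math.exp(-a * t) if t >= 0 else 0.0

def source(s):
    """Illustrative source term: a unit step switched on at s = 0."""
    return 1.0 if s >= 0 else 0.0

def solve(t, a, s_min=-1.0, s_max=5.0, n=60000):
    """u(t) = integral ds G(t - s) f(s), approximated by the midpoint rule."""
    ds = (s_max - s_min) / n
    total = 0.0
    for k in range(n):
        s = s_min + (k + 0.5) * ds  # midpoint of the k-th cell
        total += green(t - s, a) * source(s)
    return total * ds

a, t = 2.0, 1.5
print(solve(t, a), (1 - math.exp(-a * t)) / a)  # the two numbers should agree closely
```

Note that $G(t)$ for $t>0$ is just the solution of the homogeneous equation with unit initial value, which is the zero-dimensional version of the statement that follows.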
But what does it all have to do with the solution to the initial value problem that is the propagator $K$? Everything, it turns out, according to Duhamel’s principle. The Green’s function $G$ (for the inhomogeneous problem) and the propagator $K$ (for the initial value problem) are in fact the same! A discussion at Math.SE provides some motivation, and Wikipedia has the details on handling equations that are of more than first order in time (e.g. KG, as opposed to Schrödinger). In any case, the end result is that $K$ above is the inverse of the Schrödinger operator,
$$[\partial_t + iH(x,-i\partial_x)]K = \delta(t)\delta^n(x).$$
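To see where the delta functions on the right come from, here is a short check under one assumption that is implicit above: $K$ is taken to vanish for $t<0$, i.e. it carries a step function $\theta(t)$. Write $K(x,t) = \theta(t)K_+(x,t)$, where $K_+$ (notation introduced only for this check) solves the homogeneous Schrödinger equation with $K_+(x,0) = \delta^n(x)$. Then
$$ [\partial_t + iH]\,\theta(t)K_+(x,t) = \delta(t)\,K_+(x,0) + \theta(t)\,[\partial_t + iH]K_+(x,t) = \delta(t)\delta^n(x) + 0. $$
The time derivative of the step function supplies $\delta(t)$, and the initial condition of the propagator supplies $\delta^n(x)$.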
Interlude. You might enjoy reading section 2 of Feynman’s classic paper “Theory of positrons”, Phys. Rev. 76, 749 (1949), and the beginning of section 2 of the follow-up “Space-time approach to quantum electrodynamics”, Phys. Rev. 76, 769 (1949), which provides the link between the QM and QFT approaches by showing how to write a perturbation expansion in $g$ for a Hamiltonian $H = T + gV$ when you can determine the exact evolution under the “kinetic” part $T$ but not the “interaction” part $gV$, $g \ll 1$. The first-order contribution, for example, ends up looking like
$$ K_1(\xf,\tf;\xi,\ti) = -ig\int_{\ti}^{\tf} dt\int d^3x\, K_0(\xf,\tf;x,t)V(x,t)K_0(x,t;\xi,\ti), $$
which can reasonably be described as “propagating to an arbitrary point $x$, scattering off the potential and propagating to the final point from there”. The second paper has the extension to multi-particle systems.
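The structure of the first-order term can be tested in a finite-dimensional toy model, where the integral over the intermediate point $x$ becomes a sum over states. The sketch below is an assumption-laden illustration, not Feynman’s calculation: a two-level system with $T = \mathrm{diag}(E_0, E_1)$ and $V = \sigma_x$, for which the exact evolution is known in closed form. All names and parameter values are hypothetical.

```python
import cmath
import math

E = (0.0, 1.0)   # eigenvalues of the exactly solvable part T (illustrative values)
g = 0.01         # small coupling
# V is taken to be the Pauli matrix sigma_x, so V[1][0] = V[0][1] = 1.

def k0(level, t):
    """Free propagator of T: diagonal, a pure phase for each level."""
    return cmath.exp(-1j * E[level] * t)

def k1_transition(t_f, t_i=0.0, n=2000):
    """First-order amplitude 0 -> 1: -i g * integral dt K0(1; t_f - t) V_10 K0(0; t - t_i)."""
    dt = (t_f - t_i) / n
    total = 0j
    for k in range(n):
        t = t_i + (k + 0.5) * dt  # time at which the single scattering happens
        total += k0(1, t_f - t) * 1.0 * k0(0, t - t_i)
    return -1j * g * total * dt

def exact_transition(t_f, t_i=0.0):
    """Exact 0 -> 1 amplitude of H = T + g * sigma_x, from exponentiating the 2x2 matrix."""
    tau = t_f - t_i
    delta = E[1] - E[0]
    omega = math.sqrt((delta / 2) ** 2 + g ** 2)
    phase = cmath.exp(-1j * (E[0] + E[1]) * tau / 2)
    return -1j * (g / omega) * math.sin(omega * tau) * phase

print(abs(k1_transition(2.0) - exact_transition(2.0)))  # expected to be of order g**3
```

The system propagates freely in level 0 up to time $t$, scatters once off $V$, and propagates freely in level 1 afterwards; the remaining discrepancy comes from terms with three or more scatterings.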
Feynman used this to motivate, for the very first time, his diagrams. The part pertaining to QED itself should be taken with a grain of salt, however, for the reasons stated in the first paragraph. You’d have a lot of fun, for example, explaining why the restriction $\ti\le t\le \tf$ is not enforced in QFT—Feynman called this the reason for antiparticles.
Quantum field theory. Vernacular (as opposed to axiomatic) quantum field theory starts with a classical field equation. That is what your KG or Dirac or wave equation is: a classical equation derived from a classical action for the field. You can split the equation and the action into a “free” and an “interaction” part; the free (or “kinetic”) part is usually defined as the part you’re able to solve exactly: the linear part of the equation, the quadratic part of the action. The free propagator is then the inverse of that part. It is often called $S$ for fermionic and $\Delta$ or $D$ for bosonic fields, although conventions (and coefficients!) vary.
Promote the fields to operators $\hat\phi(x)$, using canonical quantization; after some pain and suffering you’ll find the totally mysterious fact that, in the free theory,
$$ \langle0|\mathcal T\hat\phi(x)\hat\phi(y)|0\rangle = \theta(x^0-y^0)\langle0|\hat\phi(x)\hat\phi(y)|0\rangle +\theta(y^0-x^0)\langle0|\hat\phi(y)\hat\phi(x)|0\rangle =\frac 1i \Delta(x-y), $$
where $|0\rangle$ is the ground state, and the first equality serves to define the $\mathcal T$ symbol, the time ordering. However, in the land of functional integrals, this whole thing is as easy as solving the quadratic equation $ax^2 + bx + c = 0$ by completing the square; you can find the details in Zee chapter I.2, starting with equation (19). The result is
$$ \frac 1i \Delta(x-y) = \langle0|\mathcal T\hat\phi(x)\hat\phi(y)|0\rangle \equiv \int\mathcal D\phi(x)\,\phi(x)\phi(y)\,e^{iS[\phi]} = \left.\frac\delta{i\,\delta J(x)}\frac\delta{i\,\delta J(y)}\int\mathcal D\phi(x)\,e^{i\left(S[\phi] + \int d^4x\,J(x)\phi(x)\right)}\right|_{J = 0}, $$
with the equivalence in the middle being nearly the definition of the integral, and the whole thing should look reasonable and not coincidentally reminiscent of statistical physics. Note how the integration is over the four-dimensional field configurations $\phi(x)$ instead of particle paths $x(t)$: QM is just QFT in one dimension!
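The completing-the-square step can be made explicit in a finite-dimensional toy version. This is only a sketch: the field is replaced by a vector $\phi_a$, the kinetic operator by a symmetric invertible matrix $A_{ab}$ (both hypothetical stand-ins), the sign convention $S = -\frac12\phi^{\mathsf T}A\phi$ is assumed, and convergence factors are ignored. Then
$$ -\tfrac12\phi^{\mathsf T}A\phi + J^{\mathsf T}\phi = -\tfrac12(\phi - A^{-1}J)^{\mathsf T}A\,(\phi - A^{-1}J) + \tfrac12 J^{\mathsf T}A^{-1}J, $$
so after shifting the integration variable
$$ Z(J) \equiv \int d^N\phi\,e^{i(S + J^{\mathsf T}\phi)} = Z(0)\,e^{\frac i2 J^{\mathsf T}A^{-1}J}, \qquad \frac1{Z(0)}\left.\frac{\partial}{i\,\partial J_a}\frac{\partial}{i\,\partial J_b}Z(J)\right|_{J=0} = \frac1i\,(A^{-1})_{ab}. $$
The inverse of the quadratic part of the action shows up as the two-point function, with the same factor of $\frac1i$ as in the field-theory formula; there the role of $A^{-1}$ is played by $\Delta$.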
You have to derive the path integral to understand where the $\mathcal T$ comes from—however, it makes sense that if the path integral defines correlators, they should come with an ordering prescription: under the integral sign, there are no operators, only numbers, and no ordering. The derivation will also convince you that (remember how I told you to mind the boundary conditions?)
$$ \int\mathcal Dx(t) \equiv \int d^n\xf\,d^n\xi \langle0|\xf\rangle \langle\xi|0\rangle \int\limits_{x\rlap{(\ti) =\xi}}^{x\rlap{(\tf) =\xf}}\mathcal Dx(t) $$
for arbitrary $\ti$ and $\tf$ that encompass all the time values you are interested in, in one dimension for simplicity. I recently had to write down the details so you can consult my notes to self if necessary.
The final leap is to introduce interactions; I’ll leave that for the AMS notes or Zee chapter I.7, but the idea is again (functionally) differentiating under the (functional) integral:
$$
\int\mathcal D\phi(x) e^{i\left(S[\phi] + I[\phi] + \int d^4x\,J(x)\phi(x)\right)}
= e^{iI\left[\frac{\delta}{i\,\delta J}\right]} \int\mathcal D\phi(x) e^{i\left(S[\phi] + \int d^4x\,J(x)\phi(x)\right)}
$$
and the result is vertices in Feynman diagrams.
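A zero-dimensional caricature shows the mechanism at work. The sketch below makes several assumptions not present in the text: the “field” is a single real number, the integral is Euclidean (real exponent) so that it converges, the free action is $\phi^2/2$ with propagator 1, and the interaction is $\lambda\phi^4/4!$. To first order in $\lambda$, four $J$-derivatives acting on $e^{J^2/2}$ at $J=0$ give 3, the number of ways to pair up the four legs of the vertex, so the ratio $Z(\lambda)/Z(0)$ should be close to $1 - \lambda/8$.

```python
import math

def z(lam, half_width=10.0, n=20000):
    """Zero-dimensional Euclidean 'path integral': integral dphi exp(-phi^2/2 - lam*phi^4/4!)."""
    dphi = 2 * half_width / n
    total = 0.0
    for k in range(n):
        phi = -half_width + (k + 0.5) * dphi  # midpoint of the k-th cell
        total += math.exp(-phi ** 2 / 2 - lam * phi ** 4 / 24)
    return total * dphi

lam = 0.01
ratio = z(lam) / z(0.0)
first_order = 1 - lam / 8   # -(lam / 4!) times the 3 pairings of the four legs

print(ratio, first_order)   # expected to differ only at order lam**2
```

The factor 3 is the simplest example of the combinatorics that Feynman diagrams keep track of; in the full theory each pairing becomes a propagator line attached to the vertex.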