A functional derivative is a natural extension of the concept of a multivariable derivative to an infinite-dimensional setting. Recall that in multivariable calculus, the gradient operator is defined so that
\begin{align*}
f(\vec x+\vec h)= f(\vec x) +
\vec h\cdot \nabla f(\vec x)+\mathcal O(\|\vec h\|^2).
\end{align*}
By analogy, the functional derivative $\delta F[g]$ is defined so that $F[g+h]\approx F[g] + \langle \delta F[g],h\rangle$, or
\begin{align*}
F[g+h]\approx F[g] + \int\delta F[g](x)\,h(x)\,d\mu(x)
\end{align*}
(where $\mu$ is the underlying measure).
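As a sanity check (an immediate consequence of the definition, with the notation chosen here only for illustration), take $\mu$ to be the counting measure on $\{1,\dots,n\}$, so that a function $g$ is just a vector $\vec g\in\mathbb R^n$. The defining relation then reads
\begin{align*}
F[g+h]-F[g]\approx \int \delta F[g](x)\,h(x)\,d\mu(x)=\sum_{i=1}^{n}\delta F[g](i)\,h_i,
\end{align*}
which matches the gradient expansion above provided $\delta F[g](i)=\partial F/\partial g_i$; the functional derivative is thus the continuum analogue of the partial derivatives collected in $\nabla F$.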
Calculations can be performed directly from this definition. For example, consider the ``evaluation functional'' $F_x[\cdot]$, which evaluates its argument at $x$. Then $F_x[J]=J(x)$, and its functional derivative $\delta F_x[J](y)\equiv K(y)$ satisfies
\begin{align*}
\int K(y) h(y) d\mu(y) &\approx F_x[J+h]-F_x[J]\\
&=J(x) + h(x) - J(x)\\
&=h(x).
\end{align*}
As this holds for every test function $h$, we can recognize that $\delta F_x[J](y)$ behaves exactly like $\delta(x-y)$.
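In the notation more commonly seen in physics texts, where the evaluation functional $F_x[J]$ is written simply as $J(x)$, this same result appears as
\begin{align*}
\frac{\delta J(x)}{\delta J(y)}=\delta(x-y).
\end{align*}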
[In the second example, the expression $J(y_1)\Delta(y_1-z_1)J(z_1)$ is evaluated at two points, so the functional derivative must either be taken with respect to one argument at a time, either $y_1$ or $z_1$, producing something like $\delta(x-y_1)\Delta(y_1-z_1)J(z_1)$, or with respect to both simultaneously, in which case the result has the form $\Delta(y_1-z_1)\delta(x-y_1)\delta(x'-z_1)$. Note that this is only tangentially related to what one obtains by taking the functional derivative with respect to $J(x)$ of $F_\Delta[J]\equiv\int d\mu(y_1)\,d\mu(z_1)\,J(y_1)\Delta(y_1-z_1)J(z_1)$, which yields $2\int d\mu(y_1)\,J(y_1)\Delta(y_1-x)$, or, more accurately, $\int d\mu(y_1)\,J(y_1)\big(\Delta(y_1-x)+\Delta(x-y_1)\big)$, as can be checked by comparing with the linear term in the evaluation of $F_\Delta[J+\eta]$. Taking a second derivative yields $2\Delta(x-x')$, or $\Delta(x-x') + \Delta(x'-x)$, depending on the symmetry of $\Delta$.]
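To spell out the check of the linear term (with $\eta$ a small perturbation, in the same notation as above), expand $F_\Delta[J+\eta]$ directly:
\begin{align*}
F_\Delta[J+\eta]&=\int d\mu(y_1)\,d\mu(z_1)\,\big(J(y_1)+\eta(y_1)\big)\Delta(y_1-z_1)\big(J(z_1)+\eta(z_1)\big)\\
&=F_\Delta[J]+\int d\mu(y_1)\,d\mu(z_1)\,\big[\eta(y_1)\Delta(y_1-z_1)J(z_1)+J(y_1)\Delta(y_1-z_1)\eta(z_1)\big]+\mathcal O(\eta^2).
\end{align*}
Relabeling the integration variables in each term so that $\eta$ is evaluated at $x$, the linear term becomes $\int d\mu(x)\,\eta(x)\int d\mu(y_1)\,J(y_1)\big(\Delta(x-y_1)+\Delta(y_1-x)\big)$, which identifies $\delta F_\Delta[J](x)$ as the symmetric combination; when $\Delta$ is even this collapses to $2\int d\mu(y_1)\,J(y_1)\Delta(y_1-x)$.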
The examples that we encounter in QFT are somewhat more complicated, but can nonetheless be approached using the standard technique of renormalized perturbation theory with Feynman diagrams. This is often formulated in terms of a finite (discretized, finite-volume) ``regularized'' theory, whose calculations should approach a well-defined limit as the lattice spacing approaches zero and the volume approaches infinity (keeping in mind that the lattice spacing is itself typically measured in terms of the values of the fields themselves).
For the vast majority of graduate students, the most important punchline of this finite, regularized limiting definition is that all the standard tricks of calculus can be used at a formal level when introducing and evaluating perturbative expansions. For example, it can otherwise be difficult to approach even ``seemingly innocuous'' expressions such as $\exp(-\frac{\lambda}{4!}\int\phi^4)$ appearing under a path integral, given that the properties of typical instances of $\phi$, which may be somewhat erratic, are not known a priori, and may even call into question the meaning of expressions such as $\phi^4$ (which, as it turns out, is worth worrying about). For an uncomfortably long time, a non-trivial danger to the perturbative QFT enterprise was the possibility that non-trivial field theories simply did not exist. This concern has been allayed to some extent by the agreement between the Standard Model and experimental results at CERN, as well as by the somewhat heroic accomplishment of Arthur Jaffe and others in proving the existence of non-trivial quantum field theories.
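As a schematic illustration of the formal manipulations just described (with $S_0$ denoting a free action, introduced here only for concreteness), the interaction exponential can be expanded term by term under the regularized path integral:
\begin{align*}
\int\mathcal D\phi\, e^{-S_0[\phi]}\exp\Big(-\frac{\lambda}{4!}\int\phi^4\Big)
=\sum_{n=0}^{\infty}\frac{1}{n!}\Big(-\frac{\lambda}{4!}\Big)^{n}\int\mathcal D\phi\, e^{-S_0[\phi]}\Big(\int\phi^4\Big)^{n},
\end{align*}
where each term on the right is a Gaussian moment computable by Wick's theorem. In the finite regularized theory every such term is an ordinary finite-dimensional integral, though the interchange of sum and integral is to be understood only as a statement about formal power series in $\lambda$ (the series is in general merely asymptotic).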