In the Exact Renormalization Group formalism, specifically in Wetterich's formulation, one writes down an evolution equation for the effective average action $\Gamma_k[\varphi]$: $$ \partial_t\Gamma_k = \frac{1}{2} \mathrm{Tr} \left[ \left( \Gamma^{(2)}_k + R_k \right)^{-1} \partial_t R_k \right], $$ where $R_k(q)$ is a cutoff function, the trace sums over all degrees of freedom, $t = \ln k$, and $\Gamma^{(2)}_k$ is the second functional derivative of $\Gamma_k$.
If one does not want to do ordinary perturbation theory in the strength of the non-linear coupling, one needs a different expansion parameter. A common choice is momentum, that is, expanding $\Gamma_k[\varphi]$ in derivatives of the field $\varphi$. For an $\mathrm O (n)$ model, this is done by writing $$ \Gamma_k[\varphi] = \int \mathrm d x \, \left\{ U_k(\varphi^2) + \frac 1 2 Z_k(\varphi^2) \partial_\mu \varphi_i \partial_\mu \varphi_i \right\} + \mathcal O ( \partial^4). $$
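To make explicit what I mean by expanding in momentum, here is a sketch at quadratic order in the field only. It assumes translation invariance, $d$ spatial dimensions, and a two-point kernel that is analytic at $q = 0$; the coefficients $a_k$ and $b_k$ are just illustrative names, not standard notation: $$ \frac 1 2 \int \frac{\mathrm d^d q}{(2\pi)^d} \, \varphi_i(-q) \left[ a_k + b_k q^2 + \mathcal O(q^4) \right] \varphi_i(q) = \int \mathrm d x \, \left\{ \frac 1 2 a_k \varphi_i \varphi_i + \frac 1 2 b_k \partial_\mu \varphi_i \partial_\mu \varphi_i \right\} + \mathcal O(\partial^4). $$ Under these assumptions, each power of $q^2$ in the kernel turns into a pair of derivatives acting on fields at the same point, so the result is a single integral over space.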
Now, my question is: why do we only include local terms, that is, terms that can be written as a single integral over space? Why is it not necessary to include terms like $\int \mathrm d x \, \mathrm d y \, \varphi_i(x) \varphi_i(y)$? As Schwartz [1] reminds us, there is no guarantee that horrible non-local terms, not expressible as a single integral, do not appear in the effective action $\Gamma = \Gamma_{k\rightarrow0}$. I have seen it argued that this is due to some "quasi-locality" property, but I do not understand this argument, and help clarifying it would be greatly appreciated. Is this one of those approximations where we cross our fingers and check the result afterward, or can a systematic justification be given for it?
[1]: M. D. Schwartz, Quantum Field Theory and the Standard Model, p. 736