Background
Here we work in the Euclidean theory throughout. I also preface this with a disclaimer that I have been a bit lax with indices, but hopefully the message remains clear.
The Ward identities in their broadest form possible are actually a statement about any infinitesimal shift in the fields, not necessarily symmetries of the action or otherwise:
$$
\delta\phi=\epsilon\cdot f\qquad[\delta\phi_i=\epsilon^r(x^\mu) f_r^i(\phi, \partial\phi)]
\\Z=\int\mathcal D\phi'\exp\left(-S'+\int J\cdot\phi'\right)
\\=\int\mathcal D\phi\left(1+\epsilon^r\mathrm{Tr}\frac{\delta f_r}{\delta\phi_j}\right)\exp\left(-S-\delta_\epsilon S+\int J\cdot(\phi+\epsilon\cdot f)\right)
$$
$$
\\\Rightarrow\left\langle-\delta_\epsilon S+\int \epsilon J\cdot f\right\rangle=-\left\langle\epsilon^r\mathrm{Tr}\frac{\delta f_r}{\delta\phi_j}\right\rangle\tag{1}
$$
Some people call this the Ward identity, but it's far too general to be of any immediate use. Note that $J$ is not the analogue of the classical Noether current: it's simply a background source. Let me stress again: this holds for all transformations $\delta_\epsilon$.
As you have probably seen in the derivation of Noether's first theorem, any symmetry of the action must have
$$
\delta_\epsilon S=\int\delta_\epsilon\mathcal L=-\int\epsilon\cdot\ \partial_\mu\left(f\cdot\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\right)=-\int\epsilon\cdot\ \partial_\mu j^\mu
$$
with the appropriate boundary conditions on $\epsilon(x)$. Here $j^\mu(x^\mu)$ is the classically conserved Noether current.
Now for a symmetry in the quantum theory, this means that
$$
Z=\int\mathcal D\phi'\exp\left(-S'+\int J\cdot\phi'\right)
\\=\int\mathcal D\phi\left(1+\epsilon^r\mathrm{Tr}\frac{\delta f_r}{\delta\phi_j}\right)\exp\left(-S+\int J\cdot\phi\right)\left(1-\int\epsilon\cdot(\partial_\mu j^\mu-J\cdot f)\right)
$$
$$
\Rightarrow\partial_\mu\langle j^\mu\rangle-\left\langle J\cdot f\right\rangle=\left\langle\mathrm{Tr}\frac{\delta f^i}{\delta\phi_j}\right\rangle\tag{2}
$$
This is what is usually called the Ward identity. The right-hand side of the equation denotes the anomaly from the path integral measure, and is zero in case of a non-anomalous transformation (it is also a very formal expression, and actually requires quite a large toolset to evaluate in most cases). Also recall that the source $J$ is literally there for us to play around with, so we can take derivatives with respect to it before setting it to zero in order to derive the Ward identities for correlators: taking a non-anomalous transformation for simplicity,
$$
0=\int\mathcal D\phi\exp\left(-S+\int J\cdot\phi\right)\left(\partial_\mu j^\mu-J\cdot f\right)
\\ J=0\Rightarrow\partial_\mu\langle j^\mu\rangle=0
\\-i\frac{\delta}{\delta J(x_1)}Z\bigg|_{J=0}=0\Rightarrow\partial_\mu \langle j^\mu(x)\phi(x_1)\rangle=\delta(x-x_1)\langle f(\phi)\rangle
$$
and so on, to derive the relations between the correlators.
Another important observation is that for linear global symmetries, equation $(2)$ is equivalent to
$$
\epsilon^r\partial_\mu\langle j^\mu\rangle=\delta_\epsilon\Gamma[\varphi]
$$
where $\Gamma[\varphi]$ is the 1PI effective action. This also formalises the idea of "classical symmetries carrying over to the quantum theory": the 1PI effective action will normally have the same symmetries as the classical theory - this violation is measured by the anomaly. This, however, does not work for gauge theories, since we need to gauge-fix the action under the path integral. You could of course of course take the BRST route, but a simpler method is to consider a different sort of effective action - one where the gauge fields are integrated out into the background, say $W[A']$, which I will discuss qualitatively. The variation of this object under $\delta_\epsilon$ correctly gives the anomaly. Finally, one writes the conserved current in the presence of the background fields as
$$
\frac{\delta W[A]}{\delta A^a_\mu}=\langle J_a^\mu(x)\rangle
$$
whereupon
$$
D_\mu\langle J_a^\mu(x)\rangle=\left\langle\mathrm{Tr}\frac{\delta f^i}{\delta\phi_j}\right\rangle
$$
is the Slavnov-Taylor identity for non-abelian gauge symmetries with an anomaly.
Redux
- As mentioned, the Ward identities hold for all symmetries of the action. So the Ward-Takahashi identities hold for global symmetries too, although the trick of "promoting" a global symmetry to a local one (as is done for using Noether's theorem in a classical field theory) and dropping spacetime dependence right at the end is a quick and easy way to derive the identity for a given theory - this does not qualify as a gauge transformation because we do not take the gauge field itself to transform. Also note that in the case when the operators in the theory only change via a replacement $\phi\to\phi'$, then the global Ward identities are particularly simple:$\langle\mathcal O_1(\phi(x_1))\dots\mathcal O_n(\phi(x_n))\rangle=\langle\mathcal O_1(\phi'(x_1))\dots\mathcal O_n(\phi'(x_n))\rangle$
- Since the Ward identities are in terms of exact correlators, they hold to all orders in perturbation theory: you can also write out the identities in terms of the effective action $W[J,\{\Phi\}]$ with background gauge fields $\{\Phi\}$, which generates relations between all of the connected Green's functions.
- Yes, this uses Noether's first theorem, as shown. Of course, $\partial_\mu j^\mu=0$ does not imply $\partial_\mu\langle j^\mu\rangle=0$ unless the path integral measure is invariant. However, recall that even local transformations of the fields that are symmetries of the action generate an on-shell conserved current when the corresponding infinitesimal parameters $\epsilon^r(x^\mu)$ are made constant.
- OP has summarised this well themselves: the anomaly measures the extent of the violation of the naïvely expected Ward identity, as also shown above. For example, we would naïvely expect $\partial_\mu \langle j^\mu_A\rangle=0$ for the axial current, but the path integral measure transforms as
$$
\int\mathcal D\psi\mathcal D\bar\psi\rightarrow\int\mathcal D\psi\mathcal D\bar\psi\exp\left(-\frac{ie^2}{16\pi^2}\int\epsilon \ F_{\mu\nu}\tilde F^{\mu\nu}\right)
$$
and so
$$
\partial_\mu \langle j^\mu_A\rangle=\frac{e^2}{16\pi^2} F_{\mu\nu}\tilde F^{\mu\nu}
$$
which is the chiral anomaly. This is an example of a global anomaly, which is entirely harmless, but not entirely useless: it helps in e.g. the predicting $\pi^0\to\gamma\gamma$ decay rate. On the other hand, gauge anomalies ought to be cancelled otherwise, roughly, the lack of gauge invariance will prevent us from removing the negative norm states while trying to restrict to a well-defined Hilbert space for our states. These cancellation conditions lead to very important consistency requirements on the theory - a wild example is determining the gauge groups for the type I and heterotic string theories.