Define the translation operator
$$
T(a)\equiv\exp\left(-iP^{\mu}a_{\mu}\right)\ ,
$$
where $a_{\mu}$ are the translation parameters and $P^\mu$ are the generators of translations (which turn out to be the components of the four-momentum operator). Then, a scalar field $\phi(x)$ transforms under translations as (recall how operators transform in quantum mechanics)
$$
T(a)\phi(x)T^{-1}(a)=\phi(x-a)\ . \quad(\star)
$$
Now, consider an infinitesimal translation by a small parameter $\epsilon_\mu$. Then the LHS of $(\star)$ becomes
\begin{align*}
T(\epsilon)\phi(x)T^{-1}(\epsilon)&=\exp\left(-iP^{\mu}\epsilon_{\mu}\right)\phi(x)\exp\left(iP^{\nu}\epsilon_{\nu}\right)\\
&=\left(1-iP^{\mu}\epsilon_{\mu}\right)\phi(x)\left(1+iP^{\nu}\epsilon_{\nu}\right)+\mathcal{O}\left(\epsilon^2\right)\\
&=\phi(x)-i\epsilon^{\mu}\left[P_{\mu},\phi(x)\right]+\mathcal{O}\left(\epsilon^2\right)\ ,\quad (1)
\end{align*}
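where from the first to the second line we have expanded the exponentials to first order in the infinitesimal $\epsilon$, and from the second to the third line we have multiplied out the brackets, collected the terms linear in $\epsilon$ into a commutator, and used $P^{\mu}\epsilon_{\mu}=\epsilon^{\mu}P_{\mu}$. The RHS of $(\star)$ becomes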
$$
\phi(x-\epsilon)=\phi(x)-\epsilon^{\mu}\partial_{\mu}\phi(x)+\mathcal{O}\left(\epsilon^2\right)\ .\quad (2)
$$
Then, setting (1) = (2) and keeping only terms linear in $\epsilon$ we get
\begin{gather}
\phi(x)-i\epsilon^{\mu}\left[P_{\mu},\phi(x)\right]=\phi(x)-\epsilon^{\mu}\partial_{\mu}\phi(x)\ ,
\end{gather}
which, since $\epsilon^{\mu}$ is arbitrary, yields the desired result after cancelling $\phi(x)$ and multiplying both sides by $i$
$$
\boxed{\left[P_{\mu},\phi(x)\right]=-i\partial_{\mu}\phi(x)}\ .
$$
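As a quick sanity check of the expansion (1), here is a minimal sketch in plain Python that replaces the operators by small matrices. The matrices `P` and `Phi`, the helper names, and the truncated Taylor series used for the exponential are all assumptions made for illustration; they are finite-dimensional stand-ins, not actual field operators. The point is only that the difference between $e^{-i\epsilon P}\,\Phi\,e^{i\epsilon P}$ and $\Phi-i\epsilon[P,\Phi]$ shrinks like $\epsilon^2$.

```python
# Finite-dimensional sanity check of expansion (1):
#   exp(-i eps P) Phi exp(+i eps P) = Phi - i eps [P, Phi] + O(eps^2)
# P and Phi are arbitrary 2x2 stand-ins (assumptions), not field operators.

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B, scale=1):
    """Return A + scale * B, entry by entry."""
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def expm(A, terms=30):
    """Matrix exponential via a truncated Taylor series (fine for small A)."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds A^k / k!
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = add(result, term)
    return result

def residual(eps):
    """Largest entry of |T Phi T^{-1} - (Phi - i eps [P, Phi])|."""
    P = [[1.0, 0.5], [0.5, -2.0]]      # Hermitian stand-in for the generator
    Phi = [[0.3, 1.0], [-0.7, 2.0]]    # arbitrary stand-in for the field
    T = expm([[-1j * eps * p for p in row] for row in P])
    T_inv = expm([[1j * eps * p for p in row] for row in P])
    lhs = matmul(matmul(T, Phi), T_inv)
    commutator = add(matmul(P, Phi), matmul(Phi, P), scale=-1)
    rhs = add(Phi, commutator, scale=-1j * eps)
    diff = add(lhs, rhs, scale=-1)
    return max(abs(x) for row in diff for x in row)

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, residual(eps))
```

Each time $\epsilon$ is reduced by a factor of 10, the printed residual should drop by roughly a factor of 100, which is the $\mathcal{O}(\epsilon^2)$ behaviour claimed in (1).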
Please let me know if you still have doubts! Cheers.
EDIT: A nice reference for understanding how fields transform under Lorentz transformations is Tong's lecture notes on QFT. On this website you can also watch a few of his lectures where he talks about these transformations (I think somewhere in the first three videos).
Extra
In quantum mechanics, consider operators $\hat{\mathcal{O}}(\hat{x})$ and states $|x\rangle$, such that
$$
\hat{\mathcal{O}}(\hat{x})|x\rangle=\mathcal{O}(x)|x\rangle\ .
$$
A trivial example is $\hat{\mathcal{O}}(\hat{x})=\hat{x}$ such that
$$
\hat{\mathcal{O}}(\hat{x})|x\rangle=\hat{x}|x\rangle=x|x\rangle\ .
$$
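Now consider the following procedure, where we use that the translation operator acts on position eigenstates as $T(a)|x\rangle=|x+a\rangle$: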
\begin{align}
\hat{\mathcal{O}}(\hat{x})|x\rangle&=\mathcal{O}(x)|x\rangle\ ,\\
\hat{\mathcal{O}}(\hat{x})T(a)|x\rangle&=\mathcal{O}(x+a)|x+a\rangle\ ,\\
T^{-1}(a)\hat{\mathcal{O}}(\hat{x})T(a)|x\rangle&=\mathcal{O}(x+a)|x\rangle\ ,\\
\Rightarrow T^{-1}(a)\hat{\mathcal{O}}(\hat{x})T(a) &= \hat{\mathcal{O}}(\hat{x}+a)\ ,\\
\Rightarrow \hat{\mathcal{O}}(\hat{x})&=T(a)\hat{\mathcal{O}}(\hat{x}+a)T^{-1}(a)\ ,\\
\Rightarrow \hat{\mathcal{O}}(\hat{x}-a)&=T(a)\hat{\mathcal{O}}(\hat{x})T^{-1}(a)\ ,
\end{align}
although one must keep in mind that the last three lines only make sense if they are acting on states.
Now, in the context of quantum mechanics this should be enough to convince you that operators should transform in this way (this is essentially how they transform in the Heisenberg picture). In QFT everything is always a bit more subtle/tricky, but the argument still holds and you can conclude that $(\star)$ is valid.
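If you want to see the quantum-mechanical identity $T(a)\hat{\mathcal{O}}(\hat{x})T^{-1}(a)=\hat{\mathcal{O}}(\hat{x}-a)$ at work, here is a small toy check in plain Python. Everything specific in it is an assumption chosen for illustration: space is replaced by a periodic lattice of `N` sites, the shift `a` is an integer number of sites, `f` is an arbitrary function, and $x-a$ is understood modulo `N`. On such a lattice $T(a)$ is just a permutation matrix sending $|x\rangle$ to $|x+a\rangle$.

```python
# Toy check of  T(a) O(x_hat) T^{-1}(a) = O(x_hat - a)  on a periodic lattice.
# N, a and f are illustrative assumptions, not part of the original argument.

N = 8   # number of lattice sites, x = 0, ..., N-1 (periodic)
a = 3   # translation distance in lattice units

def f(x):
    """Arbitrary function defining the diagonal operator O(x_hat)."""
    return x * x + 1

def matmul(A, B):
    """Product of two N x N matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

# T(a)|x> = |x+a>: column x has a single 1 in row (x + a) mod N.
T = [[1 if i == (j + a) % N else 0 for j in range(N)] for i in range(N)]

# T is a permutation matrix, so its inverse is its transpose.
T_inv = [[T[j][i] for j in range(N)] for i in range(N)]

# O(x_hat) is diagonal in the position basis with eigenvalues f(x).
O = [[f(i) if i == j else 0 for j in range(N)] for i in range(N)]

# Left-hand side: conjugate O by the translation.
conjugated = matmul(matmul(T, O), T_inv)

# Right-hand side: O(x_hat - a), diagonal with eigenvalues f(x - a) (mod N).
shifted = [[f((i - a) % N) if i == j else 0 for j in range(N)]
           for i in range(N)]

print(conjugated == shifted)
```

The comparison should print `True`. Because all entries are integers, the check is exact rather than approximate.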