Formally, you treat the operator as if it were a variable.
It works because functions of operators are usually defined by their Taylor series. If we plug that into the left side, the problem reduces to calculating $[A^n, B]$ (since the commutator is a linear operation). By repeatedly using $[AB,C]=A[B,C]+[A,C]B$ we find
$$[A^n,B] = \sum_{k=0}^{n-1}A^k[A,B]A^{n-k-1}.$$
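To make the "repeatedly using" step concrete, here are the two lowest cases and the induction step written out. Nothing is assumed beyond the product rule quoted above:

$$[A^2,B] = A[A,B] + [A,B]A,$$
$$[A^3,B] = A[A^2,B] + [A,B]A^2 = A^2[A,B] + A[A,B]A + [A,B]A^2,$$
$$[A^{n+1},B] = A[A^n,B] + [A,B]A^n = \sum_{k=0}^{n-1}A^{k+1}[A,B]A^{n-k-1} + [A,B]A^n = \sum_{k=0}^{n}A^k[A,B]A^{n-k}.$$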
Now if $[A,[A,B]]=0$, as you write, $A$ commutes with $[A,B]$, so we can move all factors of $A$ to one side and get
$$[A^n,B] = \sum_{k=0}^{n-1}[A,B]A^{n-1} =[A,B]\,n A^{n-1}$$
which on the right is exactly the usual rule for differentiation with respect to $A$, as if $A$ were a variable. Summing over the Taylor series $f(A)=\sum_n c_n A^n$ then gives $[f(A),B]=[A,B]\,f'(A)$. So for the purpose of calculating the original commutator, you can just treat $A$ as if it were a variable and differentiate with respect to it. Be careful, though, whenever $f$ is a function of multiple non-commuting operators, since the operator ordering then matters.
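If you want to convince yourself numerically, here is a minimal sketch with NumPy. The helper names (`comm`, `power_commutator_sum`, `heisenberg_pair`) and the particular 3×3 matrices are my own illustrative choices, not anything from the question. The matrices are picked only because they satisfy $[A,[A,B]]=0$ with $[A,B]\neq 0$.

```python
import numpy as np


def comm(X, Y):
    """Commutator [X, Y] = XY - YX."""
    return X @ Y - Y @ X


def power_commutator_sum(A, B, n):
    """General identity: sum_{k=0}^{n-1} A^k [A,B] A^(n-k-1)."""
    C = comm(A, B)
    mp = np.linalg.matrix_power
    return sum(mp(A, k) @ C @ mp(A, n - k - 1) for k in range(n))


def heisenberg_pair(a=2.0):
    """Illustrative (assumed) 3x3 pair with [A,[A,B]] = 0 but [A,B] != 0."""
    E12 = np.zeros((3, 3))
    E12[0, 1] = 1.0
    E23 = np.zeros((3, 3))
    E23[1, 2] = 1.0
    A = a * np.eye(3) + E12  # shifted by a*I so that A is not nilpotent
    B = E23
    return A, B


if __name__ == "__main__":
    mp = np.linalg.matrix_power
    n = 5

    # General identity, checked on random non-commuting matrices.
    rng = np.random.default_rng(0)
    X, Y = rng.standard_normal((2, 4, 4))
    print(np.allclose(comm(mp(X, n), Y), power_commutator_sum(X, Y, n)))

    # Special case [A,[A,B]] = 0: [A^n, B] = n [A,B] A^(n-1).
    A, B = heisenberg_pair()
    print(np.allclose(comm(A, comm(A, B)), 0.0))
    print(np.allclose(comm(mp(A, n), B), n * comm(A, B) @ mp(A, n - 1)))
```

For generic random matrices only the first check should pass. The simplified "derivative" form relies on $[A,[A,B]]=0$.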