Newtonian limit
As a starting point, let us take Newtonian gravitation. There, we know two things:
$$- \Phi_{,i} = \frac{d^2 x^i}{d t}$$
$$ \Delta \Phi = 4 \pi G \rho $$
We must first investigate how does $\Phi$ correspond to the metric $g_{\mu \nu}$. For that, we assume that Newtonian gravity is valid for slow particles $v/c \ll 1$ on an almost Minkowski background
$$g_{\mu \nu} = \eta_{\mu \nu} + h_{\mu \nu}$$
where $h_{\mu \nu} \ll 1$ including its first and second derivatives. We start from the geodesic equation
$$\frac{d^2 x^{\mu}}{d\tau^2} = \Gamma^\mu_{\; \nu \kappa} \frac{d x^{\nu}}{d\tau} \frac{d x^{\kappa}}{d\tau}$$
and considering only terms linear to $h_{\mu \nu}, v/c$ (and no $h_{\mu \nu} v/c$ cross-terms), we come to the conclusion that to lowest order of approximation the geodesic equation yields
$$\frac{d^2 x^i}{d t} \approx \frac{d^2 x^i}{d \tau} \approx \frac{1}{2} h_{00,i}$$
That is, we conclude that $h_{00} \sim -2 \Phi$.
Basic form of field equations
Now we proceed to the derivation of the full covariant field equations. As a starting point, we use the Laplace equation and the observation that it says "gravity (geometry)"="matter(energy)". Furthermore, the "left hand side" should somehow involve the second derivative of the metric because $ \Delta g_{00} \sim \Delta\Phi$, and the natural candidate for the "right hand side" is some function of the stress-energy tensor because $\rho \sim T_{00}$. (Only $T^{\mu \nu}$ and derived tensors include energy density.)
But covariant derivatives of the metric vanish, and coordinate derivatives are not covariant. The foundational covariant tensor based on the geometry from which any other geometric tensor is built (with help of the metric contractions), is the Riemann tensor $R^\mu_{\;\nu \kappa \sigma}$. The elegance of metric gravity shows itself in the fact that this "most basic geometric tensor" also involves second-order derivatives of the metric. Hence, we conclude that "the left hand side" must somehow be a function of the Riemann tensor.
Now let us investigate the tensor rank of the desired equation. If we don't want to do anything ugly with the stress-energy tensor which would produce more indices such as $T_{\mu \nu; \kappa \sigma}$, we are left with three possibilities: a 0-index (scalar), 1-index (vector/form) or 2-index (bilinear, 2-vector, 2-form) equation. Since we are interested to construct a theory based only on metric quantities, we have no tensor with an odd number of indices and we thus cannot form any 1-index equation.
Scalar equation
The 0-index equation leads us to the only possibility
$$R + ... = \lambda T +... \; \; \; \; (\rm S)$$
where the $...$ signify higher order quantities either in the stress-energy or curvature.
The main problem of this theory is that the degrees of freedom of the geometry are too unconstrained. The metric has ten components, and the equation involves its first and second derivatives, the whole being governed by a single equation. In this form, the equations cannot be solved from any boundary-value data nor from an initial value problem on a space-like hypersurface. The only solution is to constrain the Riemann tensor by some other condition.
One of the possible conditions is to set the traceless part of the Ricci tensor to zero (i.e. specify the components of the Ricci tensor independent of $R$): $$S_{\mu \nu} \equiv R_{\mu \nu} - \frac{1}{4}R g_{\mu \nu} =0$$
But when you substitute for $R$ in the above equation from equation $(\rm S)$, you get a unified 2-index equation (with a total of 10 independent components)
$$R_{\mu \nu} = \frac{\lambda}{4}T g_{\mu \nu}$$
Another possibility of constraining the Riemann tensor (20 independent components) is to fix the part which is independent of the Ricci tensor (10 independent components), the so-called Weyl tensor (10 independent components):
$$ C^\mu_{\;\nu \kappa \lambda} = 0 $$
This theory has some historical importance because it is equivalent to Nordström's (second) theory of gravity, the only scalar gravity fulfilling the weak equivalence principle. Manifolds in which the Weyl tensor vanishes are conformally flat which, amongst other thing, means that light-rays do not bend. (But we know they do, so Nordström gravity is incorrect.)
There are of course other ways to constrain the Riemann tensor, but if we do not want to combine about 9 random curvature invariants or introduce new natural constants (to set some of the tensors equal to a constant times the metric, or so), this is about what one can do with a scalar equation.
Nevertheless, the main problem with all the mentioned possibilities is the fact that they do not conserve the stress-energy tensor and thus violate the Einstein equivalence principle. If we were to put $T^{\mu \nu}_{\; \; \; ; \nu}=0$ along with equation $(\rm S)$ with $\lambda=-8\pi G$, we would get Einstein equations (see below).
Why do we need $T^{\mu \nu}_{\; \;\; ; \nu}=0$
The point of Einstein gravity is that locally, a free-falling observer has no idea about whether they are falling or not. This shows itself e.g. in the fact that around any point, a set of normal coordinates can be found, in which the metric looks like
$$g_{\mu \nu} = \eta_{\mu \nu} - \frac{1}{2} R_{\alpha \mu \beta \nu} x^\alpha x^\beta $$
This is how free-falling observers at $x^\mu=0$ actually sees the metric and geometry.
One other property of normal coordinates and the point of view of a free-falling observer is that Christoffel symbols vanish and thus the covariant derivative is just a gradient $A_{\mu ; \nu} = A_{\mu,\nu}$ (in most courses/constructions of relativity, this is the defining property of a covariant derivative). This means that a free-falling observer understands $T^{\mu \nu}_{\; \; \;;\nu}$ as $T^{\mu \nu}_{\; \; \;,\nu}$.
Now comes Einstein's postulate:
The outcome of any local non-gravitational experiment in a freely falling laboratory is independent of the velocity of the laboratory and its location in spacetime.
But if $T^{\mu \nu}_{\; \; \;,\nu} = f(R,R_{\mu \nu},...)$, then it is possible to detect the space-time geometry by local experiment, e.g. by observing the non-conservation of energy, momentum and angular momentum.
There is also a deeper, conceptual issue with Einstein strong equivalence: the mere existence of the term "inertial frame". If gravity is unshieldable and even a freely-falling laboratory is affected by gravity, there is no physical realisation of an "isolated" or "free" particle. Amongst other things, this would mean that even special relativity is built on physically non-existent terms and must thus be necessarily invalid!
Simply said, Einstein's equivalence postulate is not quite "an extra empirical requirement", Einstein equivalence is a consistency check of the whole body of relativity. $T^{\mu \nu}_{\; \; \;;\nu}$ is thus a necessity.
Finale
We are thus looking for a set of equations which is a 2-index equation and involves second derivatives of the metric. There is a single arbitrary or "theoretical prejudice" criterion in the presented construction and it is coming right now: we will also require that the equations involve only up to second derivatives of the metric. We can understand most of the contemporary modified gravities as theories taking advantage of this caveat by using higher-derivative field equations.
Nevertheless, we now arrive at this (second-derivative) form of the field equations
$$R_{\mu \nu} + \alpha R g_{\mu \nu} + \beta g_{\mu \nu} = \gamma T_{\mu \nu} + \delta T g_{\mu \nu}$$
By taking a trace of these equation we come to
$$(1+ 4\alpha) R + 4\beta = (\gamma+4 \delta) T$$
I.e. we can solve for $T$ and get rid of the $T$ term by substitution into the original equation. Hence, we can restrict ourselves to the form
$$R_{\mu \nu} + \alpha R g_{\mu \nu} + \beta g_{\mu \nu} = \gamma T_{\mu \nu}$$
The term $\beta g_{\mu \nu}$ is automatically divergenceless, because $g_{\mu \nu;\kappa}=0$. A slightly technical derivation based on the Bianchi identities yields
$$R_{\mu \nu ;}^{\;\;\; \, \nu} = \frac{1}{2} R_{,\mu}$$
I.e. we can render the left-hand side of the equations divergenceless by setting $\alpha=-1/2$. This enforces the divergence-lessness of $T^{\mu \nu}$.
The constant $\beta$, commonly written as $\Lambda$ is free and undetermined by the Newtonian limit, apart from the fact that $\Lambda \ll \gamma \rho$. But since it should be smaller than even the relatively small densities we encounter in our Solar system, it could as well be zero, right? It is a really ugly term which would be very natural to set to zero, but Nature thought different.
The constant $\gamma$ can be obtained by solving the linearized equations around Minkowski
$$g_{\mu \nu} = \eta_{\mu \nu} + h_{\mu \nu}$$
The procedure is not so simple; we have derived that $h_{00} \sim -2 \Phi$ but it is important to also derive in linearized theory that e.g. in spherical symmetry we have $h_{ii} \sim -2 \Phi$. This leads to $R_{00} \sim \Delta \Phi$ and $R \sim 2 \Delta \Phi$. As a result the $00$ component of Einstein equations reads
$$2 \Delta \phi \sim \gamma T_{00} \sim \gamma \rho$$
From this we are able to see that $\gamma$ has to be $8 \pi G$.
Finally, we obtain the Einstein equations with a single free parameter $\Lambda$
$$R_{\mu \nu} - \frac{1}{2} R g_{\mu \nu} + \Lambda g_{\mu \nu} = 8 \pi G T_{\mu \nu}$$
Sidenotes:
I have omitted higher curvature terms $\sim R^2,R^3,...$ on the left-hand side. It can be easily shown that these terms cannot generate a divergenceless tensor (it has mainly to do with the fact that $R_{\mu \nu}$ is the "only two-index tensor in shop" apart from the divergenceless $g_{\mu \nu}$).
I have not given an account of the derivation of Einstein equations from an action principle. The construction via an action (with the usual coupling to matter!) gives divergenceless left-hand sides automatically. Furthermore, David Lovelock showed that in dimension 4, the Einstein equations are the unique second-order field equations generated by an action. I.e., you could derive Einstein equations by requiring that they are generated by an action and that they are of second order in the derivatives of the metric.
I find the above presented approach more physical and direct. Furthermore, for the justifiability of the requirement of an action principle one would have to prove the following implication: if $T^{\mu \nu}_{\; \; \;; \nu}=0$, then the field equations can be generated by an action principle. (I am not sure if that implication is or is not proven.) Otherwise, there could be a lurking non-action-generated set of field equations satisfying all our physical criteria.