Let's think about what you can do with zero derivatives and one scalar field $\phi$. You can pretty much have any (analytic) function of $\phi$, which amounts to a potential $V(\phi)$. (There are also some stability constraints -- the potential should be bounded from below.)
How about one derivative, $\partial_\mu \phi$? Well, if we want our theory to be Lorentz invariant, then $\mathcal{L}$ needs to be a scalar. There are no other tensors for $\partial_\mu \phi$ to contract with, so we can't do much here.
With two derivatives, we could write $Z(\phi) (\partial \phi)^2$. As pointed out in another answer, we can set $Z(\phi)$ to $\pm 1$ by a redefinition of $\phi$. What restricts the sign of the kinetic term is then stability. If you have a field with the "wrong sign" of kinetic energy -- so the kinetic energy is negative -- then your theory has developed a so-called ghost instability. This is a very bad instability, because the timescale over which the theory becomes unstable is arbitrarily small. Quantum mechanically, the vacuum is unstable to decaying into "ghost" particles and "ordinary" particles, with an infinite decay rate (provided your scalar field couples to other fields -- and at least we would expect it to couple to gravity).
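To make the redefinition explicit (a sketch, assuming $Z(\phi)$ has no zeros on the field range of interest; the symbols $\tilde\phi$ and $\varphi$ are introduced here for illustration), define
\begin{equation}
\tilde\phi(\phi) = \int^{\phi} \sqrt{|Z(\varphi)|}\, d\varphi \quad\Rightarrow\quad \partial_\mu\tilde\phi = \sqrt{|Z(\phi)|}\,\partial_\mu\phi ,
\end{equation}
so that
\begin{equation}
Z(\phi)\,(\partial\phi)^2 = \mathrm{sign}(Z)\,(\partial\tilde\phi)^2 .
\end{equation}
The overall sign cannot be removed by a real field redefinition, which is why only the sign is left to be fixed by stability.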
You can also write $P(X)$, where $X=(\partial \phi)^2$ is the standard kinetic term. For example, you can have $P(X)=X+X^2$, or $P(X)=\sqrt{1-X}$ (the latter, up to sign and normalization conventions, is the DBI action). This class of theories is used in cosmology, and can describe different kinds of fluids. DBI describes the motion of a brane in a higher dimensional space.
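As a quick check that $P(X)$ theories stay within second-order equations of motion (a sketch for a Lagrangian $\mathcal{L}=P(X)$ with no potential, with primes denoting derivatives with respect to $X$), varying with respect to $\phi$ gives
\begin{equation}
\partial_\mu\left(P'(X)\,\partial^\mu\phi\right) = P'(X)\,\square\phi + 2P''(X)\,\partial^\mu\phi\,\partial^\nu\phi\,\partial_\mu\partial_\nu\phi = 0 .
\end{equation}
No more than two derivatives act on any single $\phi$, and for $P(X)=X$ this reduces to $\square\phi=0$.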
What about higher derivatives than two? Due to a theorem by Ostrogradsky, if the equations of motion are higher than second order in time derivatives, then the theory generically develops a ghost instability. There are a few ways out of this...
- If you are willing to treat these higher derivative terms perturbatively, in the sense of effective field theory, then you are safe provided you work at small energies where the Ostrogradsky instability is not excited.
- Certain interactions known as Galileons have second order equations of motion and are Lorentz invariant, even though they have two derivatives acting on $\phi$ in the Lagrangian. The simplest non-trivial example of these terms is
\begin{equation}
\mathcal{L} = \frac{1}{\Lambda^3} (\partial \phi)^2 \square \phi
\end{equation}
where $\Lambda$ is a constant with dimensions of mass needed for dimensional reasons. However, Galileons also have some theoretical issues, for example superluminal propagation on some backgrounds.
- If you have multiple fields, you can get away with having equations of motion that are higher than second order, if the higher-order equation is secretly the derivative of another equation.
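Two of the statements above can be made concrete. First, a sketch of the Ostrogradsky argument for a point-particle Lagrangian $L(q,\dot q,\ddot q)$ that depends non-degenerately on $\ddot q$ (the canonical variables $Q_i$, $P_i$ and the function $A$ are notation introduced here): with $Q_1=q$, $Q_2=\dot q$, $P_1=\partial L/\partial\dot q-\frac{d}{dt}\,\partial L/\partial\ddot q$ and $P_2=\partial L/\partial\ddot q$, the Hamiltonian is
\begin{equation}
H = P_1 Q_2 + P_2\,A(Q_1,Q_2,P_2) - L\left(Q_1,Q_2,A(Q_1,Q_2,P_2)\right),
\end{equation}
where $\ddot q=A(Q_1,Q_2,P_2)$ solves the definition of $P_2$. $H$ is linear in $P_1$, so it is unbounded from below.
Second, for the Galileon term, dropping the constant prefactor and integrating by parts, the variation of $(\partial\phi)^2\square\phi$ is
\begin{equation}
\delta\left[(\partial\phi)^2\,\square\phi\right] \simeq 2\,\delta\phi\left[(\partial_\mu\partial_\nu\phi)(\partial^\mu\partial^\nu\phi) - (\square\phi)^2\right],
\end{equation}
where $\simeq$ means equality up to total derivatives. The third-derivative terms $\pm 2\,\partial^\mu\phi\,\partial_\mu\square\phi$ cancel, so the equation of motion is second order.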
Putting this together, if you want to work with theories that are well-defined classically (without a cutoff) and stick to "mainstream" Lagrangians that avoid potential superluminality problems, the above considerations of Lorentz invariance, stability, and the constraints of working with one scalar field essentially limit you to the form you wrote down.
There are generalizations ($P(X)$ and Galileons) that are stable, but they are somewhat more complicated to analyze. Since they are non-renormalizable, understanding them from a quantum point of view requires some knowledge of effective field theory. Their relevance to physics is also speculative and confined to a narrow domain (compared to the good old ubiquitous "standard scalar field"), so these cases are usually not taught in textbooks.