To ask about an action is to ask about a Lagrangian. And to ask about a Lagrangian, at least in particle physics, is to reach down to the basic premises of the subject. There is nowhere deeper to go to explain why this or that Lagrangian is the right one, except by appeal to simplicity and symmetry. In the present example one wants a Lagrangian which is as simple as possible while still leading to some sort of interesting behaviour, which is invariant with respect to translation in space and time (if one is considering an isolated particle), and which, if one is adopting proper time in the action integral, is a Lorentz scalar. So one considers the amazingly simple ${\cal L} = m c^2$. One discovers that to get the right momentum one needs ${\cal L} = -m c^2$. And hey-presto! there it is: conjured out of nothing but simplicity, symmetry, and covariance. The "proof" that it is right is that it leads to dynamics that are consistent with experiment. To discuss dynamics more fully one needs other terms in the Lagrangian, such as interaction terms, but even with this simple Lagrangian one can treat energy and momentum conservation and thus get insight into particle collisions.
Added remark
After a helpful comment exchange with my2cts, I realised the above is perhaps a little too brief to be really helpful. The fuller statement of the Lagrangian in a manifestly covariant approach is
$$
{\cal L} = - mc(-u^\mu u_\mu)^{1/2}
$$
which previously I abbreviated to $-mc^2$ because that is indeed its value along the worldline which the particle actually follows. However, when using this in the Euler-Lagrange method you need to know its dependence on the 4-velocity for other paths, and this is why the full statement (just given) is needed. The action is then $S = \int {\cal L} d\tau$ and the
Euler-Lagrange equation is
$$
\frac{d}{d\tau} \left( \frac{\partial {\cal L}}{\partial u^a} \right) =
\frac{\partial {\cal L}}{\partial x^a},
$$
where $u^a = dx^a/d\tau$.
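As a quick check of what this equation implies for a free particle, here is a sketch of the derivative. It assumes the metric signature $(-,+,+,+)$ (which the sign inside the square root suggests), so that $u^\mu u_\mu = -c^2$ on the worldline actually followed; the constraint is imposed only after differentiating:
$$
\frac{\partial {\cal L}}{\partial u^a} = \frac{m c \, u_a}{(-u^\mu u_\mu)^{1/2}}
\;\longrightarrow\; m u_a
\quad \text{when } u^\mu u_\mu = -c^2 .
$$
Since this ${\cal L}$ has no dependence on $x^a$, the right-hand side of the Euler-Lagrange equation vanishes and one is left with
$$
\frac{d}{d\tau} \left( m u_a \right) = 0 ,
$$
i.e. the 4-momentum $m u_a$ is constant along the worldline of a free particle.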
However, for anyone learning the subject for the first time I think there are good arguments for introducing the treatment in terms of coordinate time in the first instance. Such a treatment adopts
$$
\tilde{\cal L} = - m c^2 / \gamma
$$
and $S = \int \tilde{\cal L} dt$, leading to the Euler-Lagrange equation
$$
\frac{d}{d t} \left( \frac{\partial \tilde{\cal L}}{\partial \dot{x}^a} \right) =
\frac{\partial \tilde{\cal L}}{\partial x^a},
$$
where $t$ is coordinate time in some given inertial frame,
and the dot denotes $d/dt$.
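To see that this reproduces the familiar results, here is a short worked calculation. It assumes $\gamma = (1 - v^2/c^2)^{-1/2}$ with $v^2 = \dot{x}^a \dot{x}^a$ summed over the spatial components $a = 1,2,3$ (notation not spelled out above):
$$
\frac{\partial \tilde{\cal L}}{\partial \dot{x}^a}
= \frac{\partial}{\partial \dot{x}^a} \left( - m c^2 \sqrt{1 - v^2/c^2} \right)
= \gamma m \dot{x}^a ,
$$
which is the relativistic momentum. Because $\tilde{\cal L}$ does not depend on $x^a$, the Euler-Lagrange equation then says that this momentum is conserved. The associated energy is
$$
E = \dot{x}^a \frac{\partial \tilde{\cal L}}{\partial \dot{x}^a} - \tilde{\cal L}
= \gamma m v^2 + \frac{m c^2}{\gamma}
= \gamma m c^2 ,
$$
and for $v \ll c$ one finds $\tilde{\cal L} \approx -mc^2 + \tfrac{1}{2} m v^2$, which is the Newtonian free-particle Lagrangian up to an additive constant. This is one way to see why the overall minus sign is needed.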