Here is, at long last, my own attempt to thwack at this problem.
To understand it, we first need, as with many things, to take a bit of a step back. At first we should be interested not so much in the form of the Lagrangian as in its goal, which is this:
The Lagrangian lets us describe motion as an optimization process.
Now why might we want to do that? The answer is simple: Many things in the world involve some form of optimization process, across many, many domains. A familiar one from physics, which you have quite likely encountered directly in your life, is that of a soap film. It takes the form it does because it seeks the minimum energy, or most even-handed distribution of forces. If the forces were off balance, they would pull it into a different shape until that balance was reached. The shapes of planets are another example: this is why they are all (near-)spheres: that shape, again, optimizes the potential energy. In human civilization, we also try to seek the optimum in many things: e.g. we tend to want to find the least expensive solution to any problem - for better or for worse. This, likely, is also ultimately rooted in the biological optimization of our psychology, or better, our psycho-cultural blend, through the process of evolution, optimizing for reproductive success.
These examples should convince you that looking at things in terms of seeking optima is a very reasonable thing to want to do, and thus we might ask ourselves as to whether or not motion could or should be treatable as such in some sense: is there some sense in which we can say that objects move on the paths they do because they are, in this sense, "optimal"? It may be that there isn't, or that it doesn't tell us anything illuminating, but on the other hand, it may be that there is.
And the Lagrangian is the answer to that, although it's not a 100% satisfactory one, because the path that is taken need not, strictly speaking, be the truly least-action path. We integrate the Lagrangian along a path, which gives us a kind of "cost", so to speak; that cost is then (partially) optimized, and this gives us the "right" path of motion that an object "really" takes. Of course, that still leaves open the question of why it happens to take the rather strange form
$$L(\mathbf{q}, \dot{\mathbf{q}}, t) := K(\dot{\mathbf{q}}) - U(\mathbf{q}, t)$$
so that the action is
$$S[\gamma] := \int_{t_i}^{t_f} L(\gamma(t), \dot{\gamma}(t), t)\ dt$$
over a candidate path of motion $\gamma$, parametrized by time, beyond just "well, it reproduces the motions we see". It would not be a surprise - and I'll get back to this at the end - if, given that this is a general framework, there were no straightforward interpretation for rather complex and obscure phenomena. Still, one's first exposure to this concept is in the context of basic Newtonian mechanics, and thus there should be at least some way to make it intuitive, to give a base on which to build those more complex concepts and applications.
And this is the best answer I could come up with. Since we are looking at motion as an optimization of the path taken in terms of a "cost", we will ideally seek its minimization, given that by intuition we tend to think in terms of savings, not spending, when it comes to "improving" how something is done. To that end, let us consider something else that, hopefully, many people should be familiar with on at least some level: namely the money cost required to transport a parcel of goods from one point to another on the Earth's surface. In particular, we can imagine hauling the goods on a land vehicle like a truck or train, and when we do so, we generally find that at least four factors influence the cost:
- The mass of the freight: heavier loads are more expensive to transport, and transport is typically billed on a per-mass basis,
- The mass of the freight: heavier loads are more expensive to transport and typically transport is often billed on a per-mass basis,
- The distance over which it is hauled: the farther we need to move it, the more expensive (e.g. more fuel, more risk, more pay to the driver, etc.),
- The time to haul: if we require it be delivered fast, expect to pay a premium (better vehicles, need to choose right shipping routes, driver has to stay up longer maybe...),
- The conditions of hauling: if the haul requires negotiating inclement weather, challenging terrain, or other such factors, expect, again, added cost at least somewhere along the line.
And interestingly it turns out that, at least in simple cases, we can interpret the "weird" Lagrangian in almost the very same way:
The action is the "price" that Nature will pay to haul its goods.
To do that, we first have to narrow our attention a bit and then rewrite it in a form that a physicist, at least trained on the standard "model", might find peculiar, but which is mathematically 100% kosher. The narrowing of our attention is basically to the case of a single particle, and we use the usual coordinates for the motion. In this setup, the action principle takes the form, which you can verify, from the (opaque) "kinetic minus potential" business,
$$S[\gamma] = \int_{t_i}^{t_f} \left[\frac{1}{2} m[\dot{\gamma}(t)]^2 - U(\gamma(t))\right]\ dt$$
which we can then rewrite as
$$S[\gamma] = \int_{t_i}^{t_f} \left(\frac{1}{2} m[\dot{\gamma}(t)]^2\right)\ dt + \int_{t_i}^{t_f} [-U(\gamma(t))]\ dt$$
Now, note the following trick: $\dot{\gamma} = \frac{d\gamma}{dt}$ is just the velocity, $\mathbf{v}(t)$, a vector, since by setup it's the time derivative of the path in ordinary coordinates. Its square is thus the square of the speed: $[v(t)]^2$, where the speed $v$ equals $\frac{ds}{dt}$, the rate at which arc length ($s$) is covered as time elapses. Using that allows us to transform the first integral to:
$$\int_{t_i}^{t_f} \left(\frac{1}{2} m[\dot{\gamma}(t)]^2\right)\ dt = \frac{1}{2} m \left(\int_{t_i}^{t_f} \frac{ds}{dt} \left[\frac{ds}{dt}\ dt\right]\right)$$
which by "unashamed mashing of differentials" (i.e. the chain rule and change of variable) becomes
$$\frac{1}{2} m \left(\int_{0}^{d_\mathrm{tot}} v(s)\ ds\right)$$
where $d_\mathrm{tot} = d_\mathrm{trav}$ is the total distance covered over the complete motion and we have switched to measuring the progress of the motion in terms of the distance covered so far. (Note that if $s$ is a function of $t$, $ds$ becomes $s'\ dt$, and $v(s(t))$ is just $v(t)$, which is just what we had before.)
Even more suggestively, noting the usual definition of averages from calculus, we can thus rewrite the above kinetic term, and hence the whole action, via the average speed (taken over distance covered, since the integral runs over $s$),
$$S[\gamma] = \frac{1}{2} mv_\mathrm{avg}d_\mathrm{trav} + \int_{t_i}^{t_f} [-U(\gamma(t))]\ dt$$
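To spell out which average this is (a sketch of the definition implied by the integral over $s$; it is not stated in this form elsewhere in the post), the speed is weighted by distance covered rather than by time elapsed:

$$v_\mathrm{avg} := \frac{1}{d_\mathrm{trav}} \int_{0}^{d_\mathrm{trav}} v(s)\ ds = \frac{\int_{t_i}^{t_f} [v(t)]^2\ dt}{\int_{t_i}^{t_f} v(t)\ dt}$$

This coincides with the everyday "distance over time" average, $d_\mathrm{trav} / (t_f - t_i)$, only when the speed is constant; otherwise it is the larger of the two.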
Moreover, we can do the same for the potential term on the right using the average potential encountered and the journeying time $t_\mathrm{journ} := t_f - t_i$:
$$S[\gamma] = \frac{1}{2} [mv_\mathrm{avg}d_\mathrm{trav}] + [-U_\mathrm{avg} t_\mathrm{journ}]$$
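The rewriting above can be checked numerically. What follows is a minimal sketch under assumed conditions that are not part of the original argument: a 1 kg ball thrown straight up in uniform gravity with $U = mgy$, returning to its start height after 2 s. All variable names and numbers are illustrative choices. It computes the action once directly as the time integral of $K - U$ and once in the "cost" form.

```python
# Assumed illustrative setup: 1 kg ball, uniform gravity, 2 s flight.
m, g, T = 1.0, 9.81, 2.0
n = 20000                  # number of time steps (even, so none straddles the apex)
dt = T / n

def y(t):
    """Height along the Newtonian path with y(0) = y(T) = 0."""
    return 0.5 * g * t * (T - t)

def v(t):
    """Velocity dy/dt along that path."""
    return 0.5 * g * (T - 2 * t)

kinetic_dt = 0.0   # integral of (1/2) m v^2 over time
v_ds = 0.0         # integral of speed over arc length s
d_trav = 0.0       # total distance travelled (up and back down)
U_dt = 0.0         # integral of U = m g y over time
for k in range(n):
    t = (k + 0.5) * dt                              # midpoint of step k
    speed = abs(v(t))
    ds = abs(y(t + 0.5 * dt) - y(t - 0.5 * dt))     # distance covered in step k
    kinetic_dt += 0.5 * m * speed ** 2 * dt
    v_ds += speed * ds
    d_trav += ds
    U_dt += m * g * y(t) * dt

v_avg = v_ds / d_trav      # distance-weighted average speed
U_avg = U_dt / T           # time-weighted average potential
S_direct = kinetic_dt - U_dt
S_cost = 0.5 * m * v_avg * d_trav - U_avg * T

print("S (integral of K - U):", S_direct)
print("S (cost form):        ", S_cost)
print("distance-weighted v_avg:", v_avg, " vs distance/time:", d_trav / T)
```

The two values of $S$ should agree to within discretization error, while the last line shows that the distance-weighted average speed is not the same number as total distance divided by total time.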
And we see that this expression coheres very well with the intuitions we just discussed about transport cost: the first term is basically the cost of movement, that is, the cost inherent to moving a certain distance at a certain speed. The cost increases in proportion to the mass transported, the speed of transport, and the distance: exactly as we might think (though in our human world the relation is seldom so simple as an exact proportionality like this - but such is the elegance of basic principles of the Universe). Moreover, the meaning of the potential term, and that all-vexing minus sign, comes into play: this term addresses the fourth factor (which is why I listed it, having worked this out ahead of writing this post), namely the environment, or perhaps, "terrain cost". Remember that, unless our particle is in deep, intergalactic space, free from virtually all other influences, it is going to be subject to forces competing to influence its motion. The potential term basically says "stay for as little time as possible at as shallow a depth as possible in any attractive wells", or in terms of cost, that you will be "billed" more for staying longer and deeper. The reason for the negative sign is just that: as a potential well goes deeper, its potential decreases. To make greater depth cost more, we must flip the sign on the potential, so that $-U$ grows as the well deepens. Perhaps not quite how we'd set up the cost, but it should be understandable and sensible in its own way.
As for why (intuitively) a stationary point and not always a minimum: well, we can't always get the cheapest option we might want in every situation, but we can get something that's kinda cheap - at the very least, if something goes a bit wrong and we have to take a little detour, it shouldn't hurt the cost too much.
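The "little detour" remark can be illustrated with a minimal numerical sketch. The setup is an assumption made for convenience and is not taken from the argument above: a 1 kg ball thrown straight up in uniform gravity ($U = mgy$) that lands after 2 s, with a sine-shaped detour of size `eps` added to the Newtonian path so that both endpoints stay fixed. The function names `action` and `make_path` are hypothetical.

```python
import math

def action(path, velocity, potential, mass, t_i, t_f, n=20000):
    """Approximate S = integral of ((1/2) m v^2 - U) dt with the midpoint rule."""
    dt = (t_f - t_i) / n
    total = 0.0
    for k in range(n):
        t = t_i + (k + 0.5) * dt
        kinetic = 0.5 * mass * velocity(t) ** 2
        total += (kinetic - potential(path(t), t)) * dt
    return total

# Assumed illustrative numbers: 1 kg ball, uniform gravity, 2 s flight.
m, g, T = 1.0, 9.81, 2.0

def U(y, t):
    """Uniform-gravity potential energy (no explicit time dependence)."""
    return m * g * y

def make_path(eps):
    """Newtonian path plus a detour eps*sin(pi t / T) that vanishes at both ends."""
    def y(t):
        return 0.5 * g * t * (T - t) + eps * math.sin(math.pi * t / T)
    def v(t):
        return 0.5 * g * (T - 2 * t) + eps * (math.pi / T) * math.cos(math.pi * t / T)
    return y, v

actions = {eps: action(*make_path(eps), U, m, 0.0, T)
           for eps in (-0.5, -0.1, 0.0, 0.1, 0.5)}

for eps, S in actions.items():
    print(f"eps = {eps:+.1f}   S = {S:.4f}   surcharge = {S - actions[0.0]:.4f}")
```

For this particular setup the cost should come out lowest at `eps = 0`, and the surcharge should grow roughly as `eps` squared: a detour five times larger costs about twenty-five times more, so small detours are nearly free. Other systems can have stationary paths that are not minima, which this sketch does not show.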
Of course, as I said, this is all for a relatively simple case. And yes, as you get to more complicated cases and more difficult physical phenomena, it becomes less clear how the action relates to how we would account for cost, but that's not a surprise: we're no longer dealing with simple point-to-point movement. The purpose of this exercise is to first give you the base intuition that action is a cost of movement - and this is something that might be familiar to anyone who has played certain role-playing games, where this notion of "action cost" very often finds its way in. Naturally, to describe other, more complicated phenomena, we have to define the action differently, describing them in terms of costs other than these. This is no different from working in terms of forces, where the intuition is presumably somewhat clearer. For example, in electromagnetism (though I might not have gotten this part quite right), we can similarly describe the action as blending and accounting, in an appropriate way, for the costs of building up and/or tearing down an electromagnetic field, the rate of such construction and/or demolition, and the maintenance cost of holding a nonzero field.
And that, finally, answers your question of, essentially, how do you "think" in Lagrangian mechanics so as to get the right result and not be led astray: yes, it is a different framework, and unfortunately, as in many cases of dealing with novel frameworks, it seems the dominant approach is to essentially use it "by conversion" (a similar problem to teaching Americans the metric system, or teaching a foreign language by how to translate or relate words/phrases to your native language, or any one of a number of other instances of this common failing that are just not coming to the top of my head right now) instead of "on its own terms". To really use it proficiently, you essentially have to give the notion of "forces" the heave-ho and think afresh, from the get-go, in terms of actions instead: what actions happen here, what may impede or facilitate them (e.g. dipping into a potential well), and what are the right formulae by which to describe the costs those actions have. Thus, had we gone this route from the beginning, e.g. perhaps had Newton had the thinking of an economist or service provider seeking to improve productivity, he might have started by postulating that steady linear motion, i.e. with no potential wells, would have a linear cost formula. (And he would then have had to develop the calculus of variations first, whereas we treat the instantaneous calculus as primary instead ... I wonder how that would have influenced the development of mathematics ... so many roads not taken, all throughout history, where could they go? What possibilities lurk therein? I wonder ...)
In conclusion, we see that, as Maupertuis said (who formulated a somewhat weaker action principle before Hamilton's Principle), indeed,
Nature is thrifty in all its actions.
but with our more sophisticated understanding, we should perhaps gently persuade him of:
Nature prefereth thrift in action, except when need shouldst compel that it spend more, in which case, next best shall do.
(One more thing: You might ask why, from this point of view, there's a factor of $\frac{1}{2}$ in the kinetic term [motional action cost]. One way to say it might be that Nature likes to make terrain twice as important as movement, but you should note that with a suitable choice of units, the factor can be made to disappear: we could measure in half mass-units, or in double energy-units, though this may seem to break the coherence of how these units are usually defined. However, we could then perhaps say that that is "on us", in that we derive our energy units from force considerations as primary: remember that a "joule" is "one newton of force for one metre of distance". Indeed, were we to take the Lagrangian as primary, which may make more sense given its more fundamental role, we might want the energy unit to be "the energy that 'gets stuff done' at a rate of one action unit per unit of time", and define mass however we need as a separate quantity. Indeed, with this choice made, we finally arrive at the "so hard" formulation, for Newtonian mechanics in the case of conservative forces:
$$
\begin{gathered}
\mbox{"Action Cost"}\\
\mbox{is}\\
\mbox{Mass to Move}\ \ \mbox{times}\ \ \mbox{Distance of Transport}\ \ \mbox{times}\ \ \mbox{Speed Required}\\
\mbox{plus}\\
\mbox{Obstacle Negotiation Cost}\ \ \mbox{times}\ \ \mbox{Obstacle Negotiating Time}
\end{gathered}
$$
Almost as easy as $\mathbf{F} = m\mathbf{a}$!)
As you said, it is just required to be stationary. It is there in Wikipedia. However, after checking a few examples in nature, you realize that it tends to select the minimum effort, or the easy way. Ask yourself: “If anyone proposed to me two legal and moral ways to get rich, differing only in one being harder than the other, would I choose the hard way?” Plants could grow obliquely, maximizing the stress on their base. Instead, they are prone to grow vertically, minimizing it.
– J. Manuel Jul 06 '17 at 09:24