Let us try to build some intuition to back up the construction. To do so, let us start with classical mechanics and consider a one-dimensional problem, where we know the kinetic term has the form
$$L_{kin} = \frac{1}{2}mv^2 = \frac{1}{2}m \dot{x}^2$$
with the dot representing a derivative with respect to time. This extends directly to all space coordinates for 3-dimensional problems:
$$L_{kin} = \frac{1}{2}mv^2 = \frac{1}{2}m\left( \dot{x}^2 + \dot{y}^2 + \dot{z}^2\right)$$
Now let us move on to space-time, where the way we "measure" vectors has to change to account for the relation with time. Consider, instead of a particle's trajectory in space, its trajectory in space-time. So let us use a four-vector $u(\tau) =\frac{d}{d\tau}(t,x,y,z)$, where $t,x,y,z$ are functions of a parameter $\tau$ (which may or may not be the proper time) along some world-line $\vec{r}=(t(\tau),x(\tau),y(\tau),z(\tau))$. This $u$ is the relativistic analogue of $v$, and to compute its norm (in units where $c=1$) one does:
$$u^2 = \left(\frac{dt}{d\tau}\right)^2 - \left(\frac{dx}{d\tau}\right)^2 - \left(\frac{dy}{d\tau}\right)^2 - \left(\frac{dz}{d\tau}\right)^2.$$
Once you agree that this is the correct way to compute norms of vectors in space-time (see any introduction to special relativity), we can add the condition that we want to describe photons. Photons have no rest mass, which (again special relativity coming in) means $u^2 = 0$ for them. So we have the relation
$$0 = \left(\frac{dt}{d\tau}\right)^2 - \left(\frac{dx}{d\tau}\right)^2 - \left(\frac{dy}{d\tau}\right)^2 - \left(\frac{dz}{d\tau}\right)^2$$
which holds for any massless particle, independently of the parameter used. We would then need to rethink how to define kinetic energy for trajectories of massless particles, since something proportional to $u^2$ would not work immediately. However, bear with me: I am just trying to motivate an interpretation for the form of the term $F_{\mu\nu} F^{\mu\nu}$.
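To make the norm concrete, here is a minimal sketch in Python. The function name `minkowski_norm_sq` and the sample four-velocities are illustrative choices of mine, and I assume units where $c=1$ together with the $(+,-,-,-)$ sign convention used above.

```python
def minkowski_norm_sq(u):
    """Return u^2 = (u^0)^2 - (u^1)^2 - (u^2)^2 - (u^3)^2 for u = d(t, x, y, z)/dtau."""
    t, x, y, z = u
    return t * t - x * x - y * y - z * z

# A light ray moving along x: dx/dtau equals dt/dtau (c = 1).
photon = (1.0, 1.0, 0.0, 0.0)

# A massive particle at rest, parameterized by its proper time.
at_rest = (1.0, 0.0, 0.0, 0.0)

# Rescaling the parameter tau by a constant rescales every component of u.
photon_rescaled = tuple(3.0 * component for component in photon)

print(minkowski_norm_sq(photon))           # 0.0
print(minkowski_norm_sq(at_rest))          # 1.0
print(minkowski_norm_sq(photon_rescaled))  # 0.0
```

The last line illustrates the remark about the parameter: rescaling $\tau$ changes the components of $u$ but not the statement $u^2 = 0$.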
The next conceptual jump is to forget about this picture of point particles and trajectories and replace them with fields. The simplest version is a scalar field, so let us replace trajectories $r(\tau) \rightarrow \phi(t,\vec{x})$. Now the field has a "generalized velocity", $V_\mu$, along each of its arguments $t,x,y,z$, which we can assemble into a four-vector (it can be shown that it transforms as one):
$$[V_\mu] = \left( \frac{\partial\phi}{\partial t} , \frac{\partial\phi}{\partial x}, \frac{\partial\phi}{\partial y}, \frac{\partial\phi}{\partial z}\right)$$
and if we take the norm of this we get
$$V^2 = V^\mu V_\mu = \left(\frac{\partial\phi}{\partial t}\right)^2 - \left(\frac{\partial\phi}{\partial x}\right)^2 - \left(\frac{\partial\phi}{\partial y}\right)^2 - \left(\frac{\partial\phi}{\partial z}\right)^2$$
or, written compactly using the summation convention,
$$V^2 = \partial_\mu \phi \partial^\mu \phi$$
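As an illustration of my own (not needed for the argument), take a plane wave travelling along $x$, $\phi(t,x) = \cos(\omega t - kx)$, in units where $c=1$. Then
$$\partial_t \phi = -\omega \sin(\omega t - kx), \qquad \partial_x \phi = k \sin(\omega t - kx),$$
so that
$$\partial_\mu \phi\, \partial^\mu \phi = \left(\omega^2 - k^2\right)\sin^2(\omega t - kx).$$
For a wave obeying the massless relation $\omega = |k|$ this vanishes everywhere, which mirrors the condition $u^2 = 0$ we found for massless particles.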
So far this is already looking promising. However, a real scalar field with no mass has just 1 degree of freedom (per space-time point), while light, as we know, has two polarizations while still being a boson (interesting, huh?). So the way we physicists have put the pieces together is to use an object that respects special relativity (Lorentz symmetry) and carries at least as many degrees of freedom as we need to describe light; we will worry later about cancelling the extra degrees of freedom. We now make a final replacement $\phi(t,\vec{x}) \rightarrow A_\mu(t,\vec{x})$, so we now have a vector field,
$$[A_\mu(t,\vec{x})] = (A_0(t,\vec{x}) , A_1(t,\vec{x}), A_2(t,\vec{x}), A_3(t,\vec{x}))$$
where each component is itself a function of space-time. We now have a tensor of velocities, since each of the four components has four arguments along which it can vary. This would correspond to some object
$$ V_{\mu\nu} = \partial_\mu A_\nu$$
and we could define our kinetic energy by contracting the indices while trying to respect the symmetries, but there is more than one way to do it, and we don't even know whether $V_{\mu\nu}$ transforms like a tensor (respects Lorentz symmetry). This is, broadly, the story of writing a field theory for electromagnetism, which you are probably learning. But to get to the point: to ensure we have a Lorentz tensor that represents these generalized displacements and has the same degrees of freedom (d.o.f.) as light (or at least one from which you can recover Maxwell's equations), one has to patch up this $V_{\mu\nu}$ into the expression you know for $F_{\mu\nu}$:
$$F_{\mu\nu} = V_{\mu\nu}-V_{\nu\mu}= \partial_\mu A_\nu - \partial_\nu A_\mu$$
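A quick way to see what the antisymmetrization does is to try it on numbers. In the sketch below a random $4\times 4$ matrix `V` stands in for the values of $\partial_\mu A_\nu$ at a single space-time point; the names `field_strength` and `V` are mine and purely illustrative.

```python
import random

def field_strength(V):
    """Return F[mu][nu] = V[mu][nu] - V[nu][mu], where V[mu][nu] stands in for d_mu A_nu."""
    return [[V[m][n] - V[n][m] for n in range(4)] for m in range(4)]

random.seed(0)
# 16 arbitrary numbers playing the role of the derivatives d_mu A_nu at one point.
V = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]
F = field_strength(V)

# F is antisymmetric, so its diagonal vanishes ...
antisymmetric = all(F[m][n] == -F[n][m] for m in range(4) for n in range(4))
diagonal_zero = all(F[m][m] == 0.0 for m in range(4))

# ... and only the entries above the diagonal are independent.
independent = [(m, n) for m in range(4) for n in range(m + 1, 4)]

print(antisymmetric, diagonal_zero, len(independent))  # True True 6
```

So of the 16 entries of $V_{\mu\nu}$ only 6 independent combinations survive in $F_{\mu\nu}$.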
Now we see there is just one way to contract the indices and build a Lorentz-invariant quantity:
$$\begin{align}
F_{\mu\nu}F^{\mu\nu} &=(\partial_\mu A_\nu - \partial_\nu A_\mu)(\partial^\mu A^\nu - \partial^\nu A^\mu)\\
&= 2(\partial_\mu A_\nu \partial^\mu A^\nu - \partial_\mu A_\nu \partial^\nu A^\mu )
\end{align}$$
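Index placement and the overall sign are easy to get wrong in this kind of expansion, so a numerical check can help. The sketch below assumes the $(+,-,-,-)$ metric used earlier and again treats a random $4\times 4$ matrix `V` as a stand-in for $\partial_\mu A_\nu$ at one point (the names `ETA`, `contract` and `V` are illustrative). It compares $F_{\mu\nu}F^{\mu\nu}$ with $2(\partial_\mu A_\nu \partial^\mu A^\nu - \partial_\mu A_\nu \partial^\nu A^\mu)$, with indices raised by the metric.

```python
import random

# Diagonal of the Minkowski metric with signature (+, -, -, -).
ETA = [1.0, -1.0, -1.0, -1.0]

def contract(X, Y):
    """Return X_{mu nu} Y^{mu nu}, raising both indices of Y with the diagonal metric."""
    return sum(X[m][n] * ETA[m] * ETA[n] * Y[m][n]
               for m in range(4) for n in range(4))

random.seed(1)
# V[mu][nu] stands in for d_mu A_nu at a single space-time point.
V = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]
V_swapped = [[V[n][m] for n in range(4)] for m in range(4)]  # d_nu A_mu
F = [[V[m][n] - V_swapped[m][n] for n in range(4)] for m in range(4)]

lhs = contract(F, F)                                     # F_{mu nu} F^{mu nu}
rhs = 2.0 * (contract(V, V) - contract(V, V_swapped))    # 2 (dA.dA - cross term)

print(abs(lhs - rhs) < 1e-12)  # True
```

Because the metric enters through `ETA[m] * ETA[n]`, the terms with one time index and one space index come with the opposite sign to the purely spatial ones, which is what a sum with all indices written downstairs would miss.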
So you can see that the first term already looks a lot like the kinetic terms we started from. On its own, however, it would describe four decoupled scalar fields (4 d.o.f.); the extra term couples them so as to eliminate one d.o.f. and makes $A_\mu$ a proper vector field with 3 d.o.f. (which is the case for a massive vector field, see the Proca action). So to really describe light an additional requirement has to be imposed, but that is off-topic for the original question. I hope this road from classical mechanics to QED provides some of the intuition you are looking for.