As already stated by @jacob1729, a good definition of energy is something that is conserved in time if the system (Lagrangian density$\mathcal{L}$) has no explicit time dependence ($\partial_t \mathcal{L}=0$ if $t$ is time).
The way to arrive at this quantity is to start with the Lagrangian for the electromagnetic field $\mathcal{L}=\mathcal{L}\left(t, r^i, \nabla_j \phi, \nabla_j A^i,\dot{A}^i\right)$, where $r^i$ is the position coordinates, $\phi$ is scalar potential and $A$ is vector potential (I will stick to Lorentz gauge).
we can then construct
$\mathcal{H}=\partial_{\dot{A}^i}\mathcal{L}\,\dot{A}^i -\mathcal{L}$
One can then proove that
$- \frac{d\mathcal{H}}{dt}= \partial_t \mathcal{L} + \nabla_j\left(\dot{A}^i \partial_{\nabla_j A^i}\mathcal{L} + \dot{\phi}\partial_{\nabla_j\phi}\mathcal{L}\right)$
Now integrate over some closed surface and apply Gauss' theorem:
$\int_V d^3 r \left(- \frac{d\mathcal{H}}{dt}\right) = -\frac{d}{dt}\int_V d^3 r \mathcal{H} = \int_V d^3 r \,\partial_t \mathcal{L} + \oint_{\partial V} \mathbf{\hat{n}}^j.\left(\dot{A}^i \partial_{\nabla_j A^i}\mathcal{L} + \dot{\phi}\partial_{\nabla_j\phi}\mathcal{L}\right)$
If your field vanishes outside of the volume, $V$, the second term on the right is not important. Then, provided the Lagrangian density has no explicit time dependence you see that:
$\frac{d}{dt}\int_V d^3 r \mathcal{H} = 0$
i.e. the quantity, $\int_V d^3 r \mathcal{H}$ is conserved. Finally, you should be able to prove that $\mathcal{H}$ is the energy density as you know it.
If you don't like Lagrangian formulations go from simple Maxwell's equations
Let there be some volume $V$. Using Maxwell's equations in vacuum, with no charges and currents, evaluate
$\frac{d}{dt}\int_V d^3 r \left( \frac{\epsilon_0 E^2}{2} + \frac{1 B^2}{2\mu_0}\right)=\int_V d^3 r \left( \epsilon_0\dot{\mathbf{E}}.\mathbf{E} + \mu_0^{-1} \dot{\mathbf{B}}.\mathbf{B}\right) = \mu_0^{-1}\int_V d^3 r \left( \mathbf{\nabla}\times\mathbf{B}.\mathbf{E} - \mathbf{\nabla}\times\mathbf{E}.\mathbf{B}\right)=-\mu_0^{-1}\int_V d^3 r \mathbf{\nabla}.\left(\mathbf{E}\times\mathbf{B}\right)$
Now apply Gauss theorem
$\frac{d}{dt}\int_V d^3 r \left( \frac{\epsilon_0 E^2}{2} + \frac{1 B^2}{2\mu_0}\right)=-\mu_0^{-1}\oint_{\partial V} d^2 r \:\mathbf{\hat{n}}.\left(\mathbf{E}\times\mathbf{B}\right)$
And you have a quantity that is conserved in time provided the stuff on the right-hand side vanishes. What Lagrangian formulation allows you to do is to justify why this quantity is energy