You seem to be comfortable with the idea of "work", so think of the elementary statement "energy is a system's capacity for work", which is a statement that I must say I never thought about deeply until I came across your question - energy and work have always been interchangeable for me. But let's put this "energy is a system's capacity for work" into a more everyday analogy: "energy" is to "work" as "money" is to "goods and services". The point is that for several hundred years we have known that a system has a kind of "budget" to allocate to the "work" it does. It doesn't matter whether you use this capacity up all at once or in little bits at different times, nor does it matter how you do this work - the overall tally of the amount of work you can do with a system is always the same and is given by some "budget" that is mysterious to you. Furthermore, you can't spend more than this budget until you top your system up in some way. Sometimes the accounting has to be done very carefully (e.g. to account for non-ideal effects) but this "budgethood" property is overwhelmingly backed up by experiment.
At this point it would be good if someone else could put some dates on these ideas: clearly James Joule was hugely influential, but my history is not that strong, weylaway and alas!
Now a modern insight into conserved quantities (dating from Emmy Noether's 1918 paper) is that they arise from symmetries of the laws of nature, and conservation of energy arises because the physical laws describing a system are invariant with respect to translation in time. In more everyday words: it doesn't matter whether I measure time beginning at 4 o'clock or whether I wait till after my coffee break and put my time "origin" at half four. Or indeed at any other time. Nature does not care about my whims as to where an origin should go - the laws of nature have to have exactly the same form whatever the origin! These ideas are formalized in Noether's Theorem, which says that if the description of a physical system can be cast into a certain framework (Lagrangian dynamical form, and I think little if any physics can't be written in this framework) and if the description is invariant under a continuous symmetry (e.g. sliding one's time origin through time as I've just spoken about), then there is a generalized idea of a "current" (like flowing water) and it fulfills a continuity equation that ensures there is a conserved quantity. Sliding our spatial $(x,y,z)$ origin to different points is another continuous symmetry, this time with three independent directions, and there are accordingly three separate conserved quantities according to Noether's theorem because no physics cares where our spatial origin is either. These three conserved quantities are the three components of the momentum vector, and putting them all together we can say that the invariance of physical laws with respect to our spacetime origin sliding around begets, through Noether's theorem, the conservation law for the energy-momentum four-vector.
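To make the time-translation case concrete, here is a minimal sketch for a single generalized co-ordinate $q$ with Lagrangian $L(q,\dot q,t)$; the one-dimensional setting and the symbols $q$, $V$ are my own illustrative choices, not the general field-theoretic statement. Define the candidate conserved quantity

$$E = \dot q\,\frac{\partial L}{\partial \dot q} - L.$$

Differentiating along a trajectory and using the Euler-Lagrange equation $\frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}$ gives

$$\frac{dE}{dt} = \dot q\left(\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q}\right) - \frac{\partial L}{\partial t} = -\frac{\partial L}{\partial t}.$$

So if $L$ has no explicit dependence on $t$ (i.e. the description does not care where the time origin sits), $E$ is constant. For the familiar $L = \tfrac{1}{2} m \dot q^2 - V(q)$ this $E$ works out to $\tfrac{1}{2} m \dot q^2 + V(q)$, the usual kinetic plus potential energy.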
Of course there is nothing that mathematically proves we have to equate any particular Noether-theorem-begotten conserved quantity to any experimentally observed one, so "we" (physicists in general) act on our hunches. For given, specialized descriptions of particular systems one can show that the conserved quantity is indeed the total system energy, but it is an experimental fact that the conservation applies generally. This mysterious "budgethood" property first probed by James Joule seems an excellent hunch candidate for the conserved quantity arising from the invariance of our physics description to shifts in our time origin. So we assume this hunch and see what shakes! - i.e. we do the experiments and see how the outcomes compare with what we would foretell from our hunch. It turns out that our hunch is right, to extremely high experimental precision.
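As a toy stand-in for "doing the experiment", one can integrate a mass on a spring numerically and watch the energy tally. The sketch below is only an illustration under my own assumptions (the function names, the unit mass, the step size and the particular time-varying spring constant are all made up for the example): with a constant spring constant the laws do not change under a shift of the time origin and the energy stays put, whereas a spring constant that varies in time breaks that symmetry and the energy drifts.

```python
import math


def simulate(k_of_t, m=1.0, x0=1.0, v0=0.0, dt=1e-3, steps=20000):
    """Integrate m*x'' = -k(t)*x with velocity Verlet; return the energy at each step."""
    x, v, t = x0, v0, 0.0
    a = -k_of_t(t) * x / m
    energies = []
    for _ in range(steps):
        # Kinetic plus potential energy with the spring constant at the current time.
        energies.append(0.5 * m * v * v + 0.5 * k_of_t(t) * x * x)
        x += v * dt + 0.5 * a * dt * dt
        t += dt
        a_new = -k_of_t(t) * x / m
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return energies


def relative_spread(energies):
    """Largest swing in energy over the run, as a fraction of the initial energy."""
    return (max(energies) - min(energies)) / energies[0]


# Time-translation invariant: the spring constant never changes.
constant = simulate(lambda t: 1.0)

# Symmetry broken by hand: the spring constant depends explicitly on time.
driven = simulate(lambda t: 1.0 + 0.5 * math.sin(2.0 * t))

print("constant spring, relative energy spread:", relative_spread(constant))
print("driven spring,   relative energy spread:", relative_spread(driven))
```

The tiny residual spread in the constant case comes from the finite step size of the integrator, not from the physics.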
So a system's energy is its capacity for doing work; there is a really beautiful theoretical grounding through Noether's theorem for this mysterious capacity that you contemplate, and this grounding has been fantastically well confirmed experimentally.
Pretty neat, eh?
One should wrap things up by saying that there is, in general, no well-defined concept of global energy conservation (i.e. on cosmological scales) owing to the "curved" spacetime of general relativity. Descriptions of physics clearly have to be invariant when our co-ordinate and time origins shift in the flat and homogeneous spacetime of special relativity and of physics before that theory. But now spacetime's geometry has a "topography": spacetime hills and dales begotten of varying mass and energy (or should I say "stuff" - see below) distributions in the universe, and these hills and dales mean that system descriptions DO change when our co-ordinate origins shift: the origin shifts relative to the hill and dale "landmarks" in curved spacetime. The continuous symmetry formerly implying energy conservation through Noether's theorem is broken. But on local scales, where the spacetime manifold looks like its flat tangent space, homogeneity still applies, so conservation of energy is still valid over non-cosmological spacetime spans.
As for the $E = m c^2$ thing: if you confine some "stuff" of the universe with energy content $E$ in a box, it will have an inertia $m = E/c^2$ (i.e. the box will accelerate at rate $F c^2 / E$ when subject to a force $F$) and will interact gravitationally with other "stuff" as though its gravitational mass were $m = E/c^2$. I use the word "stuff" because everything in modern physics (all particles - whether bosons or fermions) is on the same footing as far as relativity is concerned and the amount of "stuff" is measured by its energy content $E$. Gravitationally and inertially, this "stuff" has a mass property given by $m$.
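To get a feel for the numbers, here is a small sketch of the two formulas just quoted, $m = E/c^2$ and acceleration $F c^2/E$. The function names and the one-kilowatt-hour example are my own illustrative choices; the value of $c$ is the exact SI definition.

```python
C = 299_792_458.0  # speed of light in m/s (exact by SI definition)


def mass_equivalent(energy_joules):
    """Inertial (and gravitational) mass in kg of a box holding the given energy."""
    return energy_joules / C**2


def box_acceleration(force_newtons, energy_joules):
    """Acceleration in m/s^2 of the box under a force: a = F * c^2 / E."""
    return force_newtons * C**2 / energy_joules


# Illustrative input: one kilowatt-hour of energy confined in the box.
one_kwh = 3.6e6  # joules
print("mass of 1 kWh of 'stuff' in kg:", mass_equivalent(one_kwh))
print("acceleration under 1 N in m/s^2:", box_acceleration(1.0, one_kwh))
```

One kilowatt-hour comes out at roughly $4 \times 10^{-11}$ kg, which is why we never notice the mass of everyday amounts of energy.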