(1) The domain of validity of the equation $E = m c^2$ depends on exactly how you define $m$. The generally accepted modern pedagogical consensus is that $m$ should be thought of as a constant scalar, the same in every frame (what older texts call the "rest mass"). Under this definition, $E = m c^2$ only holds for a particle at rest: the energy of a particle in motion is $E = \sqrt{(m c^2)^2 + (p c)^2}$, where $p$ is its momentum.
In the case of a massive particle, the momentum is given by $p = \gamma m v$, where $\gamma := 1/\sqrt{1-v^2/c^2}$. It's not the mass itself that blows up as you approach the speed of light; it's the momentum, because $\gamma$ gets huge. In the massive case, $E$ simplifies to $m c^2 \sqrt{1 + \left( \frac{\gamma v}{c} \right)^2}$. With some simple algebra, you can see that the square root just simplifies down to $\gamma$: squaring the definition of $\gamma$ gives $\gamma^2 (1 - v^2/c^2) = 1$, which rearranges to $1 + \left( \frac{\gamma v}{c} \right)^2 = \gamma^2$. So for a massive particle in motion we just get $E = \gamma m c^2$. Again, as you approach the speed of light, the $\gamma$ factor blows up (not the $m$), so the energy gets enormous, not the mass.
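If you'd like to convince yourself numerically that the two forms agree, here is a minimal sketch that evaluates both $\sqrt{(m c^2)^2 + (p c)^2}$ with $p = \gamma m v$ and $\gamma m c^2$ for a few speeds. The function names are just illustrative, and the electron mass is an approximate value chosen only as an example; note that `m` is never changed, only $\gamma$ grows.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact by SI definition)


def gamma(v):
    """Lorentz factor for a speed v in m/s; requires |v| < C."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def energy_from_momentum(m, v):
    """E = sqrt((m c^2)^2 + (p c)^2), with the relativistic momentum p = gamma m v."""
    p = gamma(v) * m * v
    return math.sqrt((m * C**2) ** 2 + (p * C) ** 2)


def energy_from_gamma(m, v):
    """E = gamma m c^2, the simplified form for a massive particle."""
    return gamma(v) * m * C**2


M_ELECTRON = 9.109e-31  # approximate electron mass in kg, used only as an example

for fraction in (0.0, 0.5, 0.9, 0.99):
    v = fraction * C
    # Both energies should agree; the mass stays fixed while gamma grows.
    print(f"v = {fraction:.2f} c: gamma = {gamma(v):.3f}, "
          f"E_sqrt = {energy_from_momentum(M_ELECTRON, v):.6e} J, "
          f"E_gamma = {energy_from_gamma(M_ELECTRON, v):.6e} J")
```

As $v \to c$ both energies grow without bound, but only because $\gamma$ does.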
In the case of a massless particle (like a photon of light), things are much simpler: the energy equation just reduces to $E = p c$. But to interpret this, you need to decide how to assign momentum to a massless particle: the "massive" expression $p = \gamma m v$ isn't much help, because at $v = c$ you would have $\gamma = \infty$ and $m = 0$, so the product is undefined. The notion of a massless particle doesn't really make much sense in classical mechanics anyway (what do you do with $F = ma$ if $m = 0$?), so you really need to consider quantum mechanics, where the expression $p = h / \lambda$ (where $h$ is Planck's constant and $\lambda$ the photon's wavelength) tells you that a photon's momentum is inversely proportional to its wavelength, or, since $\lambda = c/f$, directly proportional to its frequency $f$.
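To put numbers on this, here is a small sketch that computes a photon's momentum from $p = h/\lambda$ and its energy from $E = p c$, then checks that this matches the familiar $E = h f$. The 500 nm wavelength is an arbitrary illustrative choice, and the helper names are hypothetical.

```python
import math

H = 6.62607015e-34  # Planck constant in J*s (exact by SI definition)
C = 299_792_458.0  # speed of light in m/s (exact by SI definition)


def photon_momentum(wavelength):
    """p = h / lambda for a photon of the given wavelength in metres."""
    return H / wavelength


def photon_energy(wavelength):
    """E = p c for a massless particle."""
    return photon_momentum(wavelength) * C


wavelength = 500e-9  # 500 nm (green light), an illustrative choice
frequency = C / wavelength
p = photon_momentum(wavelength)
E = photon_energy(wavelength)

print(f"p = {p:.4e} kg m/s")
print(f"E = {E:.4e} J")
# E = p c = h c / lambda = h f, so this ratio should be 1 up to rounding.
print(f"E / (h f) = {E / (H * frequency):.12f}")
# Halving the wavelength doubles the momentum.
print(f"p(250 nm) / p(500 nm) = {photon_momentum(250e-9) / p:.12f}")
```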
(2) You can definitely understand most undergraduate-level explanations of special relativity (SR) without tensors, but if you enjoy physics, I'd recommend trying to learn the basics as soon as possible. Without tensors, SR is an ugly mess of hard-to-remember formulas. Tensors reveal it to be a beautiful and actually extremely simple geometric theory. Really it (almost) all boils down to the single statement "$\Delta x$ and $\Delta t$ are different in different frames, but the combined quantity $(\Delta x)^2 - (c \Delta t)^2$ is the same in every frame." Getting the hang of basic tensor manipulation really isn't too hard, so give it a try whenever you feel ready! I find myself hopelessly confused whenever I try to do an SR problem by just applying formulas, but working with frame-independent quantities (Lorentz scalars) as much as possible makes things way easier. That said, I'd recommend finding a reference that introduces tensors in the context of special relativity, not general relativity or (God forbid) pure differential geometry; you'll get confused if you try to start at the most advanced, general level. Griffiths's books on E&M and particle physics have some decent discussion.
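As a small illustration of that single statement, here is a sketch that applies a standard one-dimensional Lorentz boost to the separation between two events and checks that $\Delta t$ and $\Delta x$ change while $(\Delta x)^2 - (c \Delta t)^2$ does not. It uses plain numbers rather than tensors, and the function names and event separation (1 microsecond, 120 m) are my own illustrative choices.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def boost(dt, dx, v):
    """Transform a separation (dt, dx) into a frame moving at velocity v along x."""
    g = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return g * (dt - v * dx / C**2), g * (dx - v * dt)


def interval(dt, dx):
    """The invariant (Delta x)^2 - (c Delta t)^2."""
    return dx**2 - (C * dt) ** 2


# Two events separated by 1 microsecond and 120 m (illustrative numbers).
dt, dx = 1.0e-6, 120.0

for beta in (0.0, 0.3, -0.8):
    dt_p, dx_p = boost(dt, dx, beta * C)
    print(f"v = {beta:+.1f} c: dt' = {dt_p:.4e} s, dx' = {dx_p:8.2f} m, "
          f"interval = {interval(dt_p, dx_p):.6e} m^2")
```

The individual $\Delta t'$ and $\Delta x'$ columns change from frame to frame, but the last column stays fixed (up to floating-point rounding), which is exactly the frame-independent quantity the tensor formalism is built around.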