A short addendum to Emilio Pisanty's last paragraph.
The information storable in one continuous variable (i.e. one real number, which in principle encodes $\aleph_0$ bits) is precisely quantified by Shannon's noisy channel coding theorem.
Let's suppose we have a normalized real variable $x \in [0,1]$: the interval represents the fact that we have a finite voltage range, or light intensity range, or whatever we use to record our information. We think of "writing down" our information from a source with Shannon entropy $H$ bits per symbol as a value $x\in[0,1]$. When we come to read this value, it has in general been corrupted by noise, so its value will be some other $y\in[0,1]$, and we can think of the write / read cycle of the same variable as a transmission through a noisy channel. Intuitively it makes sense to use only discrete values in the interval to stand for recorded information: the more tightly packed they are, the likelier they are to be confused with one another in the write / read cycle, so we can see that there is going to be some limit here. So we have two discrete probability distributions: $p_X(x_j)$, the distribution of which symbol is written into the real variable, and $p_Y(y_k)$, the distribution of which symbol is read back.
The noisy channel coding theorem states that the maximum storage capacity $C$ of this real variable, in bits, is the supremum over all possible input symbol distributions $p_X(x_j)$ of the mutual information between the written symbol $X$ and the read symbol $Y$, i.e.
$$C = \sup\limits_{p_X} \left(\sum\limits_{x_j}\sum\limits_{y_k} p_{X,Y}(x_j,y_k) \log_2\frac{p_{X,Y}(x_j,y_k)}{p_X(x_j)\,p_Y(y_k)}\right)$$
where $p_{X,Y}(x_j,y_k)$ is the joint distribution of the input $x$ and output $y$ and models the noise corruption of the written variable.
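The double sum above can be evaluated numerically for a concrete choice of levels. The following is a minimal sketch, not a capacity calculation: it assumes equally spaced levels in $[0,1]$, additive Gaussian noise of standard deviation `sigma`, a nearest-level readout, and a uniform $p_X$, so it only gives a lower bound on the supremum. The function names (`transition_matrix`, `mutual_information`) are illustrative and do not come from any library.

```python
import math

def gaussian_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def transition_matrix(n_levels, sigma):
    """P(y_k | x_j) for n_levels >= 2 equally spaced levels in [0, 1].

    Assumption: the reader rounds the noisy value to the nearest level,
    so the outermost decision bins extend to -inf and +inf.
    """
    levels = [j / (n_levels - 1) for j in range(n_levels)]
    mids = [(levels[k] + levels[k + 1]) / 2 for k in range(n_levels - 1)]
    edges = [-math.inf] + mids + [math.inf]
    return [
        [gaussian_cdf((edges[k + 1] - x) / sigma) - gaussian_cdf((edges[k] - x) / sigma)
         for k in range(n_levels)]
        for x in levels
    ]

def mutual_information(p_x, matrix):
    """I(X;Y) in bits: the double sum over the joint p(x_j, y_k) = p(x_j) P(y_k | x_j)."""
    n_out = len(matrix[0])
    p_y = [sum(p_x[j] * matrix[j][k] for j in range(len(p_x))) for k in range(n_out)]
    total = 0.0
    for j, px in enumerate(p_x):
        for k in range(n_out):
            joint = px * matrix[j][k]
            if joint > 0.0:  # terms with zero joint probability contribute nothing
                total += joint * math.log2(joint / (px * p_y[k]))
    return total

# Packing more levels into [0, 1] at a fixed noise level (sigma = 0.05 is an arbitrary choice).
for n in (2, 4, 8, 16, 32):
    p_uniform = [1.0 / n] * n
    bits = mutual_information(p_uniform, transition_matrix(n, sigma=0.05))
    print(f"{n:3d} levels: I(X;Y) = {bits:.3f} bits (noiseless would be {math.log2(n):.3f})")
```

With few levels the mutual information sits close to $\log_2 n$; as the levels crowd together it falls increasingly short of $\log_2 n$, which is the limit described in the text. Finding the true supremum would additionally require optimizing over $p_X$ and over the placement of the levels.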
If the written variable is corrupted by Gaussian noise of variance $\sigma^2$, then we intuitively expect that the number of levels in $[0,1]$ that we can tell apart will be of the order of $\sigma^{-1}$ so we expect roughly $-\log_2 \sigma$ bits will be storable in the continuous interval. Indeed, if we apply the noisy channel coding theorem above to this situation, we find the Shannon-Hartley theorem, which is the noisy coding theorem for an additive Gaussian noise channel:
$$C = \frac{1}{2}\log_2(1 + \mathrm{SNR}) = \frac{1}{2}\log_2\left(1 + \frac{1}{\sigma^2}\right)$$
bits per symbol, which approaches our intuitive expression $-\log_2 \sigma$ as $\mathrm{SNR} = \sigma^{-2}\to\infty$. Here $\mathrm{SNR}$ is the signal to noise ratio, with the signal power normalized to unity.
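To see how quickly the two expressions agree, note that $\frac{1}{2}\log_2(1+\sigma^{-2}) = -\log_2\sigma + \frac{1}{2}\log_2(1+\sigma^2)$, so the gap is $\frac{1}{2}\log_2(1+\sigma^2)$, which vanishes as $\sigma\to 0$. A small numerical sketch, assuming unit signal power as in the formula above (the function names are illustrative only):

```python
import math

def gaussian_capacity_bits(sigma, signal_power=1.0):
    """Capacity per symbol of the additive Gaussian noise channel, in bits."""
    snr = signal_power / sigma ** 2
    return 0.5 * math.log2(1.0 + snr)

def intuitive_bits(sigma):
    """Rough estimate: about 1/sigma distinguishable levels in the unit interval."""
    return -math.log2(sigma)

for sigma in (0.5, 0.1, 0.01, 0.001):
    c = gaussian_capacity_bits(sigma)
    rough = intuitive_bits(sigma)
    print(f"sigma = {sigma:<6} C = {c:.6f} bits, -log2(sigma) = {rough:.6f} bits, gap = {c - rough:.2e}")
```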
It is important to take heed of the remarkable fact that $C$ represents a situation arbitrarily near to perfect, noiseless information storage and is not a "rough measure of storable bits". That is, the noisy channel coding theorem takes exact account of the possibility of error correcting coding spread over many such information storage variables. It assumes we have a large number of these unit intervals and that we spread our coded information over this large number and deliberately introduce correlations between them through codeword structure so as to detect and correct errors. If we are allowed to do this over an arbitrarily large number of these unit intervals, then the theorem shows us that we can noiselessly encode $C$ bits per continuous variable, with the probability of any errors (after error correction) approaching nought as the number of coded variables gathered into each codeword increases without bound.
This is why the theorem is so ingenious: without constructing the code, it can show that there exists one that will come arbitrarily near to achieving perfect storage, as long as we demand any rate strictly below $C$ bits per symbol. It also shows that if we try to store $C+\epsilon$ bits per symbol, for any $\epsilon>0$, then the probability of errors approaches unity as the number of variables gathered into each codeword approaches infinity, whichever coding scheme we may use. $C$ truly does represent the exact capacity of a noisy continuous variable.