I think there may be a better forum to ask this question in, and it will likely be closed, but information theory is important to many branches of physics, so here's a quick answer.
The bandwidth of a channel is simply the number of symbols you can send through it per unit time. By symbol, I mean here a single, real number, and this meaning arises through the Shannon sampling theorem. See the Wikipedia page for this theorem, and go through the proof so you will understand exactly what I mean.
Now, just one lone noiseless real number can in theory encode as much information as you like. There are $\aleph_0$ digits in a real number! Write out the whole of Wikipedia as 0s and 1s and call it a binary fraction between 0 and unity and the whole of Wikipedia is still a finite precision, rational binary number! So you can see in theory that you can send heaps of data over channels that can send only a low number of symbols each second.
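To make the "binary fraction" idea concrete, here is a minimal Python sketch. The function names `encode_as_fraction` and `decode_from_fraction` are my own, made up for illustration, and I use exact rational arithmetic to stand in for a noiseless real number.

```python
from fractions import Fraction


def encode_as_fraction(message):
    """Read the bytes of `message` as the digits of a binary fraction in [0, 1)."""
    n_bits = 8 * len(message)
    numerator = int.from_bytes(message, "big")
    # The fraction is exact: no information is lost, however long the message is.
    return Fraction(numerator, 2 ** n_bits), n_bits


def decode_from_fraction(x, n_bits):
    """Recover the original bytes from the fraction and the known bit count."""
    numerator = int(x * 2 ** n_bits)
    return numerator.to_bytes(n_bits // 8, "big")


x, n_bits = encode_as_fraction(b"Wikipedia")
print(float(x))                         # a single number between 0 and 1
print(decode_from_fraction(x, n_bits))  # b'Wikipedia'
```

The message length has to be known to the receiver here (an assumption of this sketch), because trailing zero bits cannot be told apart from the end of the message.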
This theoretical ideal is, of course, limited by noise. Noise effectively "coarse grains" the real numbers. If I have noise with an amplitude of 0.1 units, and can send symbols with an amplitude of up to 1 unit, then I have roughly 10 amplitude levels I can encode data on. Otherwise put, I can tell apart ten levels. So I can encode $\log_2 10$ bits per symbol in this example. If my noise amplitude is 0.01 units, I can tell apart roughly 100 different levels per symbol. So I can encode $\log_2 100$ bits per symbol in this example.
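The arithmetic in this example can be sketched in a few lines of Python. The helper name `bits_per_symbol` is hypothetical, and the level count is the same rough estimate as above (maximum amplitude divided by noise amplitude), not a rigorous bound.

```python
import math


def bits_per_symbol(max_amplitude, noise_amplitude):
    """Rough bits per symbol when noise coarse-grains the amplitude range."""
    # Number of amplitude levels that can roughly be told apart.
    levels = max_amplitude / noise_amplitude
    return math.log2(levels)


print(bits_per_symbol(1.0, 0.1))   # about 3.32 bits (ten levels)
print(bits_per_symbol(1.0, 0.01))  # about 6.64 bits (a hundred levels)
```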
I think you should now be able to see what's going on: the number of bits you can send per unit time is roughly
$$B \log_2 \left(\frac{S}{N}\right)$$
where $B$ is the bandwidth and $S/N$ is the signal to noise ratio. The actual Shannon-Hartley theorem is a little more complicated, but that's the idea.
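For comparison, the standard statement of the Shannon-Hartley theorem is $C = B \log_2(1 + S/N)$, with $S/N$ taken as a ratio of powers. Here is a small Python sketch that puts it next to the rough estimate; the function names and the example numbers (a 3 kHz channel at a power ratio of 1000, i.e. 30 dB) are my own assumptions, chosen only for illustration.

```python
import math


def rough_rate(bandwidth_hz, snr):
    """The rough estimate B * log2(S/N), in bits per second."""
    return bandwidth_hz * math.log2(snr)


def shannon_capacity(bandwidth_hz, snr_power_ratio):
    """Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_power_ratio)


bandwidth_hz = 3000.0  # assumed example bandwidth
snr = 1000.0           # assumed example power ratio (30 dB)
print(rough_rate(bandwidth_hz, snr))        # a little under 29.9 kbit/s
print(shannon_capacity(bandwidth_hz, snr))  # about 29.9 kbit/s
```

At high signal to noise ratios the two numbers nearly coincide, since the extra 1 inside the logarithm hardly matters.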
Edit: For interest: 64-QAM modulation is commonly used for digital communications. Here the "symbols" are each one of 64 points on a regularly spaced grid in the Argand plane, representing the amplitude and phase of the signal. So this scheme carries six bits ($\log_2 64$) per symbol, i.e. it has a spectral efficiency of roughly six bits per second per hertz (one hertz of bandwidth carrying about one symbol per second). The ultimate spectral efficiency of a typical optical fibre link is of the order of 20 to 25 bits per second per hertz: see my answer here.
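As an illustration of the grid picture, here is a minimal Python sketch that builds a square QAM constellation as complex numbers. The function name `qam_constellation` and the unnormalised odd-integer spacing (-7, -5, ..., 7) are assumptions of mine; real systems scale the grid and map bits to points in specific ways that I don't attempt here.

```python
import math
from itertools import product


def qam_constellation(order):
    """Return the points of a square QAM grid in the Argand plane."""
    side = math.isqrt(order)
    if side * side != order:
        raise ValueError("a square QAM grid needs a perfect-square order")
    # Regularly spaced, symmetric levels, e.g. -7, -5, ..., 5, 7 for 64-QAM.
    levels = [2 * k - (side - 1) for k in range(side)]
    return [complex(i, q) for i, q in product(levels, levels)]


points = qam_constellation(64)
print(len(points))             # 64 distinct symbols
print(math.log2(len(points)))  # 6.0 bits per symbol
```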