I'd like to add some details to Mark Mitchison's spot-on answer. There really, truly is in principle no lower bound to the cost in energy, or work you have to do, or "mass" you have to supply, to send and receive information between one physical system and another. Or, more precisely, there is no cost that arises from the nature of information itself. Costs in transmission arise from the fact that, as Mark concisely puts it:
In our universe information has to be written in some kind of "real ink" and that ink is the states of physical systems
Information in practice cannot be some kind of abstract, disembodied knowledge, or a sequence of symbols, even though we often (highly helpfully) treat it as such in probability and information theory. In the example in your comments:
Physically implemented packet-switched bits are high and low voltage specifications through channels of flow on an electric circuit within an electric field; electrons have mass.
the sent information is encoded in the physical states of massive particles, electrons. But it can equally well be encoded in the states of massless things like photons.
So the reasons why information sending and receiving "costs" something are slightly indirect, and they arise from, in rough order of descending fundamental importance:
1. Landauer's principle, and, equivalently, the second law of thermodynamics;
2. Signal to noise limitations in the particular physical system whose states you choose to "write" your information in;
3. Dissipation in the particular physical system whose states you choose to "write" your information in.
Points 2. and 3. depend on the system you work with, so they are not as fundamental as the first. Notice that I haven't mentioned things like the Heisenberg uncertainty principle or other quantum mechanical limitations. These show up in point 2. above; in principle, you can make them as small as you like by encoding your information in "bigger and bigger" reversible (lossless) classical systems so that, for example, the HUP accounts for less and less of the total noise limiting your signal transmission. Of course, in practice, none of our technology so far is lossless or reversible, so up until now this has meant building bigger and more dissipative systems, and the limits in 2. and 3. are very real. Quantum information transmission technology, which makes use of reversible state machines, may make these limits less important in the future.
Landauer's Principle:
The fundamental cost in shifting information from one system to another is that we have to make room for it at the receiver end! In other words, we have to encode it in the states of, say, electron spins at the receiver end. This sending and encoding we can do without energy loss: the loss arises when we ask what becomes of the information contained in the former electron spins before we wrote over them with this newly gotten information.
The microscopic laws of physics are reversible, so this means in principle that a closed physical system's total state at any time is related to its state at any other time, before or after, by a one-to-one mapping. So the former electron spins have to show up encoded somehow in the state of the physical system around our receiver. This means the thermodynamic entropy of these surroundings continually rises as the information arrives and writes over our receiver electron spins. Eventually, one must do work, as dictated by the second law of thermodynamics, to throw this entropy out of the system, as otherwise the system will simply stop functioning. A side note here is that chemists sometimes talk about the Gibbs (or other, e.g. Helmholtz) free energy as the enthalpy less the work needed to expel the excess entropy the reaction products have relative to the reactants.
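To spell out that side note (this is just standard thermodynamics, nothing specific to our receiver): the least work you must do to expel entropy $\Delta S$ into surroundings at temperature $T$ is
$$W_{\min} = T\,\Delta S,$$
which is exactly the term subtracted from the enthalpy in the Gibbs free energy,
$$\Delta G = \Delta H - T\,\Delta S.$$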
How do we show all this is true (at least by a physicist's, as opposed to a mathematician's, proof)? There is a way to account for the Maxwell Daemon and the Szilard Engine that makes these two thought experiments comply with the second law of thermodynamics in the long term, and that is through Landauer's Principle: the idea that the merging of two computational paths, or the erasing of one bit of information, always costs useful work, an amount given by $k_B\,T\,\log 2$, where $k_B$ is Boltzmann's constant and $T$ is the temperature of the system doing the computation.
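To get a feel for the size of this bound, here is a quick back-of-the-envelope sketch (with $\log$ read as the natural logarithm; the 300 K temperature and the one-gigabyte memory are just illustrative assumptions of mine, not anything from Bennett's paper):

```python
import math

k_B = 1.380649e-23  # Boltzmann's constant, J/K
T = 300.0           # assumed room temperature, K

# Landauer limit: minimum work to erase (or irreversibly overwrite) one bit
E_bit = k_B * T * math.log(2)
print(f"Landauer limit per bit at {T} K: {E_bit:.3e} J")   # ~2.87e-21 J

# Illustrative example: erasing one gigabyte (8e9 bits) of memory
bits = 8e9
print(f"Minimum work to erase 1 GB: {bits * E_bit:.3e} J")  # ~2.3e-11 J
```

The point of the numbers is only that the bound is tiny compared with what real hardware dissipates; it is a floor set by thermodynamics, not by engineering.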
This argument was finalised by Charles Bennett; his excellent review paper here is the reference: Charles Bennett, "The Thermodynamics of Computation: A Review", Int. J. Theor. Phys., 21, No. 12, 1982.
Bennett invented perfectly reversible mechanical gates ("billiard ball computers") whose state can be polled without the expenditure of energy, and then used such mechanical gates to study the Szilard Engine in thought experiments and to show that Landauer's Limit arises not from the cost of finding out a system's state (as Szilard had originally assumed) but from the need to continually "forget" former states of the engine.
Probing this idea more carefully, as is also done in Bennett's paper: one can indeed conceive of simple, non-biological finite state machines that realise the Maxwell Daemon (this has actually been done in the laboratory), and as the Daemon converts heat to work, it must record a sequence of bits describing which side of the Daemon's door (or of the engine's piston, in the equivalent discussion of the Szilard engine) the molecules were on. So we get the same situation as with our electron spin receiver above: for a finite memory machine, one eventually needs to erase the memory so that the machine can keep working, and this is ultimately what forces us to throw thermodynamic entropy out of the Maxwell Daemon.
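A one-line energy balance shows why the erasure step is the crux (this is the standard textbook accounting, sketched here rather than quoted from Bennett): in the Szilard engine, the work extracted per cycle by letting the one-molecule gas expand isothermally to double its volume is
$$W_{\text{extracted}} = k_B\,T\,\log 2,$$
which is exactly the Landauer cost of erasing the one bit the Daemon had to record, so over a complete cycle no net work is won from heat alone and the second law survives.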
Signal to Noise Limitations
You might like to see my answer here to Maximum theoretical bandwidth of fibre-optics. Here the energy requirements to send information are indeed lower bounded by quantum mechanics and the statistics of counting photons: you must send a minimum number of photons to represent a symbol with acceptably low uncertainty. If you could send the knowledge of a system's state noiselessly, then you could theoretically transmit anything you like whilst spending arbitrarily little energy, by signalling at rates approaching zero bits per second. Given a description of an information sending channel that includes the realisable signal to noise ratio, the Shannon-Hartley form of the noisy channel coding theorem (see also here) indirectly puts lower bounds on the energy needed to send information in the face of noise. These remarkable theorems show that there is always a way to send information in an arbitrarily noisy environment, but our bandwidth is limited by the noise for a given available energy supply to send information. Thus, although the Voyager 1 probe is still in contact with Earth, its transmission rate is exceedingly low.
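As a rough numerical illustration of that tradeoff (the bandwidth, noise temperature and received powers below are made-up values chosen for illustration, not Voyager's actual link budget):

```python
import math

k_B = 1.380649e-23      # Boltzmann's constant, J/K
B = 1.0e6               # assumed channel bandwidth, Hz
T_noise = 30.0          # assumed receiver noise temperature, K
N = k_B * T_noise * B   # thermal noise power in the band, W

def capacity(signal_power_watts: float) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second."""
    return B * math.log2(1.0 + signal_power_watts / N)

# The capacity keeps shrinking as the received power drops, but it never hits zero:
for S in (1e-12, 1e-15, 1e-18, 1e-21):
    print(f"received power {S:.0e} W -> capacity {capacity(S):.3e} bit/s")
```

The output falls from megabits per second down to a few bits per second as the received power shrinks by nine orders of magnitude, which is the sense in which noise limits bandwidth for a given energy budget while never cutting the link entirely.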