Suppose there is a die manufacturer. This facility has a dice machine that produces new dice by casting their faces in molds made of some special material, so in effect the machine holds a list determining the probability of a die landing on each of its faces after a throw.
Now, if the machine is broken, that is equivalent to its list not being $p(X)=\{\frac{1}{6}, \frac{1}{6}, \dots, \frac{1}{6}\}$ but rather some other probability distribution, such as $p(X)=\{\frac{1}{2}, \frac{1}{2}, 0, \dots, 0\}$, where the order of the probabilities matches the numbers on the faces of the die. If a technician is called in to repair it, then whatever they do amounts to handing the machine the right list; call it $p'(X)$.
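For concreteness, here is a minimal sketch of the two lists from the example together with their Shannon entropies, which is the quantity I work with below (Python and the `shannon_entropy` helper are my own choices, not part of the setup):

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                                  # skip zero-probability faces
    return -np.sum(p[nz] * np.log2(p[nz]))

p       = np.array([1/2, 1/2, 0, 0, 0, 0])     # broken machine's list p(X)
p_prime = np.full(6, 1/6)                      # corrected list p'(X) from the technician

print(shannon_entropy(p))                      # 1.0 bit
print(shannon_entropy(p_prime))                # log2(6) ≈ 2.585 bits
```

The broken list carries $1$ bit per throw while the fair list carries $\log_2 6 \approx 2.585$ bits, and it is this kind of change that I want to quantify.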
How can we measure the information added or subtracted by the new probability distribution the technician will hand to the machine?
We would like to prove that there is some relationship between this information and the difference in entropy of the dice produced before and after the machine is fixed. I went with my gut and wanted to prove that the information in this change would be $$ \left| H(p(X)) - H(p'(X)) \right| = \left| \sum_{k=1}^{n} \left[ p'(x_{k})\log_{2}(p'(x_{k})) - p(x_{k})\log_{2}(p(x_{k})) \right] \right| =\\ \left| \sum_{k=1}^{n} \log_{2}\!\left( \frac{ p'(x_{k}) ^ { p'(x_{k}) }}{ p(x_{k}) ^ { p(x_{k}) }}\right) \right| $$ so that this measure tracks the entropy we add to or remove from the dice the machine produces: less entropy means more information, and the same entropy means no information (the case where $p'(X)=p(X)$). After doing some more work I ended up with this: $$H(p(X)) - H(p'(X))=\sum_{k=1}^{n} \log_{2}\!\left( \frac{ p'(x_{k}) ^ { p'(x_{k}) }}{ p(x_{k}) ^ { p(x_{k}) }}\right)=\\ \sum_{k=1}^{n} \left[ \log_{2}\!\left( \frac{ p'(x_{k}) ^ { p'(x_{k}) }}{ p(x_{k}) ^ { p'(x_{k}) }}\right) - \log_{2}\!\left(p(x_{k})^{p(x_{k})}\right) + p'(x_{k})\log_{2}(p(x_{k})) \right] $$ The first term is the Kullback–Leibler (KL) divergence between the two distributions, and the second term is the entropy of our first list before the technician replaces it, but the third mixes the probabilities of the first list with those of the second into something that looks like a mismatched entropy of sorts. This sounds like it could be what we're looking for, but I'm left with two questions (a quick numerical check of the expansion is included after them):
- What does $p'(x_{k})\log_{2}(p(x_{k}))$ represent?
- Can we use this expansion to prove that this is indeed the additional entropy or information transmitted by the new list given by the technician? It does seem to account for the fact that the new information transmitted increases as $|p(x_{k})-p'(x_{k})|$ grows larger for each entry, so I hope that's the case.
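As a sanity check, here is a small numerical sketch (again my own, in Python) confirming that the three-term expansion above reproduces $H(p(X)) - H(p'(X))$. The loaded list is made up just so that every face has nonzero probability; with zeros in $p(X)$, as in the $\{\frac{1}{2}, \frac{1}{2}, 0, \dots, 0\}$ example, the first and third terms both become infinite.

```python
import numpy as np

def shannon_entropy(p):
    """H(p) = -sum p log2 p, in bits, with the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def kl_divergence(q, p):
    """D_KL(q || p) = sum q log2(q / p), in bits; assumes p > 0 wherever q > 0."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    nz = q > 0
    return np.sum(q[nz] * np.log2(q[nz] / p[nz]))

# A loaded but full-support list before the fix, and the fair list after it.
p       = np.array([0.4, 0.3, 0.1, 0.1, 0.05, 0.05])  # p(X), before the technician
p_prime = np.full(6, 1/6)                              # p'(X), after the technician

# Left-hand side: difference of the two entropies.
lhs = shannon_entropy(p) - shannon_entropy(p_prime)

# Right-hand side: KL term + entropy of the first list + the third, "mismatched" term.
rhs = (kl_divergence(p_prime, p)
       + shannon_entropy(p)
       + np.sum(p_prime * np.log2(p)))

print(lhs, rhs)                  # both ≈ -0.4385 bits
assert np.isclose(lhs, rhs)
```

The check passes for full-support lists, but of course it doesn't by itself settle how the third term should be interpreted.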