Suppose a friend has given you a jigsaw puzzle, and suppose further that you do not like jigsaw puzzles. To make your friend happy and show him that you have succeeded in solving the puzzle, you develop a special strategy, which consists of simply shaking the box and hoping that all the pieces will fall into place by themselves. After shaking the box for a few seconds, you stop, open it, look at the contents and realize that the pieces have merely gotten mixed up. You close the box and repeat the process 1, 2, 3 times. Unfortunately, nothing changes: no matter how often you repeat the process, the pieces do not come together to form the picture. Now imagine opening the box after shaking it 13 times (a lucky number) and realizing that all the pieces have been put together to form a beautiful picture. The surprise is enormous. The probability of this happening is extremely low, for the simple reason that there is only one combination that forms the picture, while the number of combinations in which some or none of the pieces fit together is vastly larger and therefore far more likely. I do not recommend solving a puzzle this way, not least because the time required can easily exceed the length of a human life, or the lifetime of the universe, depending on the number of pieces. Still, we can note that there is an element of surprise associated with the probability of an event occurring. The lower the probability that the puzzle can be assembled by shaking the box, the greater the surprise when you open the box and say: wow, unbelievable, the puzzle has come together. In other words, there is a relationship between the probability of an event and the surprise or, if you like, the information content that the news of that event brings us. Let us take a second example. Your friend, for some strange reason that I honestly do not understand, decides to take you to a fortune teller to have your future predicted. You go to the fortune teller, and the fortune teller's first prediction is that the sun will rise tomorrow. We are now in the exact opposite situation. This will not surprise you, because the sun rises every day, so the information content is zero, or practically zero. If, on the other hand, the fortune teller predicts that the sun will not rise, and this event comes true, the information content is very high, because the event is utterly improbable. This probabilistic interpretation of the information content associated with a message is the basis of information theory and was developed by Shannon around 1948. The point is that this is only one of several possible interpretations, and the relationship between information content and complexity remains controversial.
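To make the link between probability and surprise a little more concrete, here is a minimal sketch in Python of the self-information, or surprisal, of an event, -log2 of its probability. The probability values attached to the fortune teller's predictions are, of course, invented purely for the illustration.

```python
import math

def surprisal_bits(p: float) -> float:
    """Self-information of an event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

# Illustrative probabilities, invented for the example:
p_sun_rises = 0.999999       # practically certain -> almost no information
p_sun_does_not_rise = 1e-6   # practically impossible -> an enormous surprise

print(surprisal_bits(p_sun_rises))          # ~0.0000014 bits, essentially zero
print(surprisal_bits(p_sun_does_not_rise))  # ~19.9 bits, very high
```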
Shannon proposed entropy as a measure of disorder. Entropy is nothing more than the average information content of a message. The fundamental problem with this approach is that if an event is certain, the entropy is zero and consequently the information content is also zero. This becomes problematic if we use entropy as a measure of complexity: if an object exists, how can the information content of its description be zero? To the extent that an object exists, we are able to describe it, so there is a description, longer or shorter, that represents the object itself. From this observation, a quite different concept of complexity can be derived, although it is related to Shannon's entropy. Andrei Kolmogorov proposed this new idea in 1963. Kolmogorov measures the complexity of an object by the length of the shortest binary sequence that describes it. Let us look at a simple example. At home, you probably have a washing machine and an instruction manual written in different languages. If we compare the information content of the descriptions in the different languages, we can reasonably conclude that it is identical: the person who wrote the manual wants to convey the same information in English as in Italian, Spanish, Polish, German, and so on, since the user must be able to operate his washing machine in any of those languages. Yet even though the information content is the same, we may notice, for instance, that the English text is much shorter than the Spanish one, despite conveying exactly the same message. So there is not a single description of an object, but several descriptions, some shorter than others. Kolmogorov's idea is based on this observation: he proposed to describe an object with the shortest possible binary sequence. The approach differs fundamentally from the one introduced by Shannon. The entropy proposed by Shannon drops to zero when the uncertainty is zero, whereas according to Kolmogorov's approach, if an object exists, there must be a measure of its complexity, and that measure is necessarily non-zero. Leaving aside the technical details and the mathematical formulas (I hope not to rub the physicists the wrong way), it is possible to link the entropy defined by Shannon with the concept of complexity introduced by Kolmogorov. The main problem with Kolmogorov's approach is that Kolmogorov complexity is, technically, not computable.
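To give a rough feel for the difference between the two viewpoints, the sketch below computes the Shannon entropy of a discrete distribution (which is indeed zero when one outcome is certain) and uses the length of a compressed string as a crude, computable stand-in for Kolmogorov complexity, which itself cannot be computed exactly. The strings and probabilities are invented purely for illustration.

```python
import math
import os
import zlib

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)); zero-probability terms are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A certain event carries no uncertainty; a fair coin carries exactly one bit.
print(shannon_entropy([1.0]))       # 0.0
print(shannon_entropy([0.5, 0.5]))  # 1.0

# Kolmogorov complexity (the length of the shortest description) is uncomputable,
# but the length of a compressed string gives a crude, computable upper bound.
regular = b"ab" * 500           # highly regular: admits a very short description
random_like = os.urandom(1000)  # no obvious structure: resists short description

print(len(zlib.compress(regular)))      # very small
print(len(zlib.compress(random_like)))  # close to the original 1000 bytes
```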
Before Shannon developed a mathematical theory of communication based on information, the concept of entropy had been introduced by Boltzmann in statistical physics. Boltzmann's work focused on the theory of heat, which he approached with the tools of probability theory and statistical mechanics. According to Boltzmann's definition, entropy is a function of the variables that characterize the equilibrium state of a system.
The entropy of a state measures its probability, and entropy increases as a system evolves from less probable to more probable states. Entropy therefore represents the degree of uncertainty about the state of a system, and it is proportional to the logarithm of the number of microstates compatible with that state.
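In formulas, this is Boltzmann's relation S = k_B ln W: the entropy S grows with the logarithm of the number W of microstates compatible with the macroscopic state. A minimal sketch, with invented values of W:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, in joules per kelvin

def boltzmann_entropy(n_microstates: float) -> float:
    """Boltzmann entropy S = k_B * ln(W) for W equally probable microstates."""
    return K_B * math.log(n_microstates)

# Invented numbers of microstates, purely for illustration:
print(boltzmann_entropy(1))     # a single possible state -> zero entropy
print(boltzmann_entropy(1e23))  # many accessible states -> larger entropy
```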
From the point of view of information theory, if we consider an isolated ideal gas in a state of complete disorder, its particles can be found in any of the accessible states, with the same probability for each state. All states contribute equally to the information stored in the ideal gas; in other words, the information content needed to describe the configuration of the gas is as high as it can possibly be. The question that now arises is the following: can entropy be used to measure the complexity of the ideal gas?
The answer is probably no. It seems quite reasonable to measure complexity as the distance from the equilibrium state, that is, from the probability distribution in which all the states accessible to the system are equally probable. An ideal gas sits exactly at this equilibrium: the measure of its imbalance is zero, and therefore its complexity is also zero. This stands in contrast to complexity measures based on Shannon entropy, which would assign the ideal gas a maximal value.
For this reason, López-Ruiz, Sañudo, Romera and Calbet have introduced an alternative definition; the details can be found in their work.
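Without claiming to reproduce the authors' exact definition, the sketch below follows the general spirit of such measures: a normalized entropy multiplied by the disequilibrium, that is, the distance from the uniform distribution. The example distributions are invented; the point is only that the measure vanishes both for perfect order and for the ideal-gas case, and is non-zero only in between.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy divided by its maximum log2(N), so the result lies in [0, 1]."""
    n = len(probs)
    h = sum(-p * math.log2(p) for p in probs if p > 0)
    return h / math.log2(n)

def disequilibrium(probs):
    """Squared distance from the uniform distribution over the same N states."""
    n = len(probs)
    return sum((p - 1.0 / n) ** 2 for p in probs)

def statistical_complexity(probs):
    """Entropy times disequilibrium: low for perfect order and for the uniform case."""
    return normalized_entropy(probs) * disequilibrium(probs)

uniform = [0.25] * 4            # ideal-gas-like: maximal entropy, zero disequilibrium
certain = [1.0, 0.0, 0.0, 0.0]  # perfect order: zero entropy
mixed   = [0.7, 0.1, 0.1, 0.1]  # somewhere in between

for p in (uniform, certain, mixed):
    print(statistical_complexity(p))
# Only the intermediate distribution gives a non-zero value; the uniform
# ("ideal gas") and the perfectly ordered cases both give zero.
```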
The conclusion is twofold. First, there are different types of entropy, coming from completely different domains. Second, complexity cannot be measured in a unique way, for the simple reason that it depends on the type of problem we are looking at and on the scale of observation.