Physics is the art of compressing our knowledge of the universe.
As it happens, whenever we stick two massive bodies near each other (or notice them near each other), they seem to move towards each other. Now, we could simply record, for each pair of massive bodies, that they are moving towards each other. For N bodies that is on the order of N² pairwise facts: a large amount of information.
If we come up with a compression strategy, say a law of gravity, what we get out of it is a description of the situation that uses far less information, yet still describes what is going on. We no longer need to describe in painstaking detail the position and attraction of every observed massive body and their tendency to accelerate towards each other: instead, we estimate their mass, and say "the law of gravitation applies to everything with mass".
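As a concrete instance (the formula is standard textbook physics, not something spelled out above), Newton's law of universal gravitation replaces all of that pairwise bookkeeping with a single expression:

$$F = G \frac{m_1 m_2}{r^2}$$

Estimate each body's mass and the distance $r$ between any pair, and every attraction follows from the same one-line rule.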
This is a fantastic amount of compression, both in our description of the universe and in our predictions of what it will do next.
Repeat this process many times and you get modern physics, where our observations are distilled into equations and algorithms. We no longer have to list all of our experiences and predictions; we can "punt" and say "use these tricks", and the universe at least seems far simpler.
In complex situations, the algorithms and equations often don't work well in practice, because evaluating them "fully" is intractable at that scale. But with certain assumptions we can build rules that work pretty well at larger scales, like the ideal gas law and the adjustments to it.
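For reference, that law compresses the bulk behavior of some 10²³ molecules into one relation among a handful of macroscopic quantities:

$$PV = nRT$$

where $P$ is pressure, $V$ is volume, $n$ is the amount of gas, $R$ is the gas constant, and $T$ is temperature; the van der Waals equation is one of the standard adjustments for real gases.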
We can use this to validate our small-scale techniques by seeing if we can derive the large-scale rules from the small-scale ones. If so, the large-scale ones are not extra rules, just consequences of the small-scale ones.
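The ideal gas law is a textbook example of exactly this validation. Treating a gas as a huge number of small particles obeying Newtonian mechanics, kinetic theory gives

$$PV = \tfrac{1}{3} N m \langle v^2 \rangle$$

and identifying average kinetic energy with temperature, $\tfrac{1}{2} m \langle v^2 \rangle = \tfrac{3}{2} k_B T$, turns this into $PV = N k_B T$: the large-scale rule falls out of the small-scale ones, carrying no extra information.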
On the other hand, if it turns out that the large-scale rules are not consequences of the small-scale ones, that implies that the small-scale rules are in error in ways we don't yet understand. Their compression is less than perfect, and there may be new rules from which the actual large-scale rules can be derived.
An inconsistent theory will sometimes predict things that we do not experience. That makes it a worse compression algorithm, because now you have to describe both the algorithm and where it does not apply. Describing where it does not apply costs extra bits of information, and may require a pattern of its own: if you have to individually describe each instance where the theory fails, the compression is barely better than a plain list of observations and predictions with no underlying theory.
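To make the bit-accounting concrete, here is a minimal sketch in Python, under invented assumptions: the costs BITS_PER_RECORD and BITS_PER_RULE are arbitrary stand-ins, and the "observations" are synthetic. It is the minimum description length idea in miniature: a rule with no exceptions compresses the record enormously, while a rule with many exceptions drifts back towards the cost of the raw list.

```python
from dataclasses import dataclass

# Toy model of the argument above. The bit costs are invented
# assumptions for illustration, not real measurements.
BITS_PER_RECORD = 64   # assumed cost to write down one raw observation
BITS_PER_RULE = 256    # assumed cost to state a rule once

@dataclass
class Observation:
    inputs: float    # what we measured going in
    outcome: float   # what we actually saw

def raw_cost(observations):
    """Description length with no theory: list every observation."""
    return len(observations) * BITS_PER_RECORD

def theory_cost(observations, rule):
    """Description length with a theory: state the rule once, then
    individually record every case where its prediction fails."""
    exceptions = [o for o in observations if rule(o.inputs) != o.outcome]
    return BITS_PER_RULE + len(exceptions) * BITS_PER_RECORD

data = [Observation(x, 2.0 * x) for x in range(1000)]

def good_rule(x):
    return 2.0 * x  # consistent with every observation

def bad_rule(x):
    return 2.0 * x if x % 2 == 0 else -1.0  # fails on every odd input

print(raw_cost(data))                # 64000 bits: just list everything
print(theory_cost(data, good_rule))  #   256 bits: the rule alone suffices
print(theory_cost(data, bad_rule))   # 32256 bits: rule plus 500 exceptions
```

The bad rule still "works", but once its exceptions are tallied it saves barely half the bits of having no theory at all, which is the trade-off the paragraph above describes.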
So, a consistent theory lets you describe the universe (present and future) more succinctly than an inconsistent one.