The theorem isn't called a no-manufacturing theorem. You can indeed make thousands, or millions, of copies of a known state.
The theorem is about making a device that takes an arbitrary and unknown state and then produces two of it.
So the point is that you have to copy something without knowing what it is that you are copying. And since you don't know what it is you are copying, you can't set up your device in different ways based on what you copy. So whatever the default state of your device is, you can call it $|0\rangle$ and move on with your argument.
And you don't need inner products. Straight-up linearity gives a short proof. The device has to act a certain way on one state. And it has to act a certain way on a second, linearly independent state. So by linearity you know how it must work on a linear combination of those states as an input. And shucks, that's not the correct functionality for that state (because you get a superposition of two copies of the first state and two copies of the second state instead of two copies of the actual state you wanted).
And another key is you can indeed make a device that correctly copies two distinct states (or even copies any of a set of basis states) out of the many possible states, but it will fail to correctly copy their linear combinations.
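To make that concrete, here is a minimal sketch in plain Python. The CNOT matrix as the device and $|0\rangle$ as the blank state are my illustrative assumptions, and the helper names `kron`, `apply`, and `close` are made up for this example.

```python
from math import sqrt, isclose

def kron(a, b):
    """Tensor product of two vectors given as lists of components."""
    return [x * y for x in a for y in b]

def apply(matrix, vec):
    """Apply a linear operator (list of rows) to a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def close(a, b):
    """Componentwise comparison with a small tolerance."""
    return all(isclose(x, y, abs_tol=1e-12) for x, y in zip(a, b))

# Assumed device: CNOT in the basis |00>, |01>, |10>, |11>.
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

zero = [1.0, 0.0]
one = [0.0, 1.0]
plus = [1 / sqrt(2), 1 / sqrt(2)]  # equal-weight combination of zero and one

# The blank (raw material) state is |0> in every case.
copies_zero = close(apply(CNOT, kron(zero, zero)), kron(zero, zero))
copies_one = close(apply(CNOT, kron(one, zero)), kron(one, one))
copies_plus = close(apply(CNOT, kron(plus, zero)), kron(plus, plus))

print(copies_zero, copies_one, copies_plus)
```

The two basis states come out copied, and the combination does not: the device returns $\frac{1}{\sqrt 2}|0\rangle\otimes|0\rangle+\frac{1}{\sqrt 2}|1\rangle\otimes|1\rangle$ instead of two copies of the input.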
I am not assuming the states are known. I am suggesting a protocol that always changes the "blank-state" to some non-orthogonal state whenever a new state gets copied, even though the states are unknown; none of the traditional proofs hold then.
The proof goes through without a word being changed. And you continue to bring up an inner product when the proof holds in a linear space without an inner product, so you clearly fail to understand a single thing about the proof.
So let's assume you have a linear space and a linear operator $\hat L$. And $\hat L$ needs to take an unknown vector $|u\rangle$ and make a copy. If it makes the copy by taking some raw material, and it doesn't know what $|u\rangle$ is, then it has to take as its raw material something that can be turned into $|u\rangle$ but is otherwise independent of what $|u\rangle$ is. So it's a vector, but it is chosen independently of $|u\rangle.$ We can give it a name to talk about it, so let's call it $|r\rangle$ for raw material. Maybe $|u\rangle=|r\rangle,$ maybe $|u\rangle\neq|r\rangle.$ Who knows, and frankly who cares? Since we have to pick $|r\rangle$ without knowing $|u\rangle,$ and there are lots of possibilities for $|u\rangle,$ it isn't possible for $|r\rangle$ to equal $|u\rangle$ for every possible $|u\rangle.$ Hey, we could even hold an election or a lottery or use a random number generator to pick $|r\rangle,$ so you can think of $|r\rangle$ as being random. Or maybe every device you buy in the store uses a different $|r\rangle.$ Again, we don't care.
So now let's analyze what happens. Our linear operator is supposed to copy every state. So in particular it must copy $|r\rangle$ and so $$\hat L\left(|r\rangle\otimes|r\rangle\right)=|r\rangle\otimes|r\rangle.$$
But there are other states that aren't $|r\rangle$ (and aren't multiples of it). Let's call one of them $|o\rangle$ for other state. Should our device copy that correctly? If it doesn't, then our device fails right there. If it does copy it, then we know that $$\hat L\left(|o\rangle\otimes|r\rangle\right)=|o\rangle\otimes|o\rangle.$$
OK, so either our device failed on one of those two states or else it copies them fine. If it copies them fine, we can ask how it copies the state $\frac{1}{\sqrt 2}|r\rangle+\frac{1}{\sqrt 2}|o\rangle.$ Since the device *by assumption* doesn't have a setting to be told what state it has, it has to prepare its raw material state the same way as it does for the other two states. In particular, if you generate a raw material, it had better be one that works for $|o\rangle$ and that works for $|r\rangle,$ and since we want to talk about it we give it a name. Again I'll just call it $|r\rangle,$ since all we know about $|r\rangle$ is that $\hat L\left(|r\rangle\otimes|r\rangle\right)=|r\rangle\otimes|r\rangle$ and $\hat L\left(|o\rangle\otimes|r\rangle\right)=|o\rangle\otimes|o\rangle.$
So now we get from pure linearity of $\hat L$ that $$\hat L\left(\left(\frac{1}{\sqrt 2}|r\rangle+\frac{1}{\sqrt 2}|o\rangle\right)\otimes|r\rangle\right)=\frac{1}{\sqrt 2}|r\rangle\otimes|r\rangle+\frac{1}{\sqrt 2}|o\rangle\otimes|o\rangle.$$
Which is distinct from $\left(\frac{1}{\sqrt 2}|r\rangle+\frac{1}{\sqrt 2}|o\rangle\right)\otimes\left(\frac{1}{\sqrt 2}|r\rangle+\frac{1}{\sqrt 2}|o\rangle\right)$ because $|r\rangle$ and $|o\rangle$ are linearly independent.
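A worked expansion makes the mismatch explicit. Multiplying out the product we actually wanted gives
$$\left(\frac{1}{\sqrt 2}|r\rangle+\frac{1}{\sqrt 2}|o\rangle\right)\otimes\left(\frac{1}{\sqrt 2}|r\rangle+\frac{1}{\sqrt 2}|o\rangle\right)=\frac{1}{2}|r\rangle\otimes|r\rangle+\frac{1}{2}|r\rangle\otimes|o\rangle+\frac{1}{2}|o\rangle\otimes|r\rangle+\frac{1}{2}|o\rangle\otimes|o\rangle.$$
Since $|r\rangle$ and $|o\rangle$ are linearly independent, the four products $|r\rangle\otimes|r\rangle,$ $|r\rangle\otimes|o\rangle,$ $|o\rangle\otimes|r\rangle,$ and $|o\rangle\otimes|o\rangle$ are linearly independent too, so two combinations of them agree only if every coefficient agrees. Linearity forces the coefficients $\frac{1}{\sqrt 2},\,0,\,0,\,\frac{1}{\sqrt 2},$ while a correct copy needs $\frac{1}{2},\,\frac{1}{2},\,\frac{1}{2},\,\frac{1}{2}.$ They differ in every slot.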
And we never had to use the word unitary, we never used the word inner product or orthogonal. So if you even think about bringing up orthogonality or inner products or anything like that then you have personally failed to read what I just wrote.
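If you want to see that with numbers, here is a small sketch using a pair of vectors that is deliberately not orthogonal. The specific components of `r` and `o` and the helper names `combo`, `forced_output`, `wanted_output`, and `mismatch` are my own illustrative choices. Nothing in the code computes an inner product; it only uses what linearity forces once the two copy rules for $|r\rangle$ and $|o\rangle$ hold.

```python
def kron(a, b):
    """Tensor product of two vectors given as lists of components."""
    return [x * y for x in a for y in b]

def combo(a, v, b, w):
    """Linear combination a*v + b*w of two vectors of equal length."""
    return [a * x + b * y for x, y in zip(v, w)]

# Illustrative choice: linearly independent but NOT orthogonal.
r = [1.0, 0.0]
o = [1.0, 1.0]

def forced_output(a, b):
    """What linearity forces on input (a*r + b*o) tensor r, given that
    r tensor r goes to r tensor r and o tensor r goes to o tensor o."""
    return combo(a, kron(r, r), b, kron(o, o))

def wanted_output(a, b):
    """Two copies of the input state u = a*r + b*o."""
    u = combo(a, r, b, o)
    return kron(u, u)

def mismatch(a, b):
    """Largest componentwise gap between forced and wanted outputs."""
    return max(abs(x - y) for x, y in zip(forced_output(a, b), wanted_output(a, b)))

s = 2 ** -0.5
print(mismatch(1.0, 0.0), mismatch(0.0, 1.0), mismatch(s, s))
```

The gap is zero for $|r\rangle$ alone and for $|o\rangle$ alone, and nonzero for the equal-weight combination, even though these two vectors are nowhere near orthogonal.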
The key is this: suppose you correctly copy two linearly independent states using a raw material state that was selected without knowing which of them it has. If you then claim you can use that same setting and that same raw material state to copy a distinct linear combination of those two states as well as those two states, then your device can't be linear, because the linear one that does all that does it wrong.
It must fail to be linear, or it must fail to copy at least one of those three states, or else it must be adjusted based on which state it is copying. If the same setting (same $|r\rangle$) copies two of those correctly and is linear, it will fail to copy the third correctly.
It isn't honest to pretend the theorem says something different. If you change the $|r\rangle$ based on what you try to copy, then you aren't copying an arbitrary state. If you aren't linear, then you aren't linear. And if you function correctly on two, you fail on the third. It is as simple as that. And we simply call it cloning when you have a device that can take any unknown input and make there be two of it using a raw material that didn't depend on the unknown thing you were cloning.
Here is a general thing about theorems. They assume the things they use in their proof and don't really mean anything other than that, no matter who misreads them, misreports them, or misunderstands them, or why. So from the proof itself you see what the assumptions are.
And your talk about orthogonality and inner products is not an honest characterization of the phenomenon, because neither appears in the theorem or the proof.
Just imagine that you set your raw material to the state $|r\rangle$ not knowing whether you were going to be given $|0\rangle,$ or $|1\rangle$ or $\frac{1}{\sqrt 2}|0\rangle+\frac{1}{\sqrt 2}|1\rangle$ to copy. Then you either would have failed to turn $|0\rangle\otimes|r\rangle$ to $|0\rangle\otimes|0\rangle$ or else you would have failed to turn $|1\rangle\otimes|r\rangle$ to $|1\rangle\otimes|1\rangle$ or else you would have failed to turn $\left(\frac{1}{\sqrt 2}|0\rangle+\frac{1}{\sqrt 2}|1\rangle\right)\otimes|r\rangle$ into $\left(\frac{1}{\sqrt 2}|0\rangle+\frac{1}{\sqrt 2}|1\rangle\right)\otimes\left(\frac{1}{\sqrt 2}|0\rangle+\frac{1}{\sqrt 2}|1\rangle\right)$. Linearity guarantees it: if the first two succeed, the third must fail.
So you failed when you selected the linear operator $\hat L$ or you failed when you selected the raw input state $|r\rangle.$ But you had to make both those choices without knowing if you were given $|0\rangle,$ or $|1\rangle$ or $\frac{1}{\sqrt 2}|0\rangle+\frac{1}{\sqrt 2}|1\rangle$ to copy. That's the issue. Your objections based on inner products are so specious they are actually suspect, since inner products aren't involved.