It took me a lot of struggle to get a feel for what "Hilbert space" and tensor products really mean in quantum mechanics.
Personally, I think explanations that rely on "tensor product formalism" are the absolute worst for understanding. They give zero intuition and turn the learner into a plug-and-chug monkey.
The concept that made things click for me is a careful statement of what the wavefunction represents:
The wavefunction assigns a (complex) amplitude whose squared magnitude gives the probability of a POSSIBLE OUTPUT STATE.
It sounds obvious but this is the key idea: potential output states are assigned probabilities.
Now it's very natural to see what happens in many-particle states.
Suppose I have a "quantum coin" described by the state $\Psi = \sqrt{P_H}|H\rangle+\sqrt{P_T}|T\rangle$.
If I flip two of these coins, what's the output state?
In ordinary probability, the possible outcomes of flipping two coins are:
HH, HT, TH, and TT (4 output states)
And this doubles for every coin we add! Adding one more:
HHH, HHT, HTH, HTT, TTT, THT, THH, TTH (8 output states)
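The doubling is easy to check by brute force. Here is a minimal sketch in plain Python (the function name `outcomes` is just something I made up for illustration) that lists every outcome string for a given number of coins:

```python
from itertools import product

def outcomes(n_coins):
    """Return every possible outcome string for n_coins two-sided coins."""
    return ["".join(flips) for flips in product("HT", repeat=n_coins)]

for n in (1, 2, 3):
    states = outcomes(n)
    # The count is 2**n: each extra coin doubles the list.
    print(n, len(states), states)
```

The order of the strings differs from the list above, but the set of outcomes is the same.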
Now in the quantum case each of these possibilities needs to be assigned its own probability amplitude (and has the potential to cause interference!).
If we flip two quantum coins completely independently, there shouldn't be any correlation between the coins, so the probabilities should look exactly the same as in the classical case:
$$P(HH) = P_H P_H\\
P(HT) = P_H P_T\\
P(TH) = P_T P_H\\
P(TT) = P_T P_T$$
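Now, is there an operation (linear in each input) that takes two states $\Psi_1 = \sqrt{P_H}|H\rangle_1+\sqrt{P_T}|T\rangle_1$ and $\Psi_2 = \sqrt{P_H}|H\rangle_2+\sqrt{P_T}|T\rangle_2$ and combines them into the correct joint possibility space with independent probabilities? That's the tensor product!! Multiplying it out, $\Psi_1 \otimes \Psi_2 = P_H|H\rangle_1|H\rangle_2 + \sqrt{P_H P_T}\,|H\rangle_1|T\rangle_2 + \sqrt{P_T P_H}\,|T\rangle_1|H\rangle_2 + P_T|T\rangle_1|T\rangle_2$, and squaring each amplitude gives back exactly the four probabilities above.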
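To see this numerically, here is a small sketch, assuming we store a state as a dictionary from outcome label to amplitude. The helper `tensor` and the values $P_H = 0.3$, $P_T = 0.7$ are my own choices for illustration, not anything standard:

```python
import math

def tensor(state_a, state_b):
    """Tensor product of two states given as {label: amplitude} dicts.

    Every label of the first state is paired with every label of the
    second, and the corresponding amplitudes are multiplied.
    """
    return {
        label_a + label_b: amp_a * amp_b
        for label_a, amp_a in state_a.items()
        for label_b, amp_b in state_b.items()
    }

# Assumed example values; any P_H + P_T = 1 would do.
P_H, P_T = 0.3, 0.7
coin = {"H": math.sqrt(P_H), "T": math.sqrt(P_T)}

two_coins = tensor(coin, coin)

# Probability of each joint outcome is the squared magnitude of its amplitude.
probabilities = {label: abs(amp) ** 2 for label, amp in two_coins.items()}

for label, p in probabilities.items():
    print(label, round(p, 4))
```

Up to floating-point rounding, the printed values are $P_H P_H$, $P_H P_T$, $P_T P_H$ and $P_T P_T$, i.e. the independent classical probabilities.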
A tensor product is used to describe states that are independent. And this is exactly why a state is entangled if and only if it CANNOT be written as such a tensor product.
As an example, suppose you have some general combination of output possibilities like:
$c_1|H\rangle_1|H\rangle_2 + c_2|H\rangle_1|T\rangle_2 + c_3|T\rangle_1|H\rangle_2+ c_4|T\rangle_1|T\rangle_2$
If you can't factor it into the form $(a|H\rangle_1 + b|T\rangle_1) \otimes (c|H\rangle_2 + d|T\rangle_2)$, then you know your state isn't "independent" (and is by definition entangled).
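For two coins there is a quick test for whether such a factoring exists. Multiplying out $(a|H\rangle_1 + b|T\rangle_1) \otimes (c|H\rangle_2 + d|T\rangle_2)$ gives amplitudes $ac, ad, bc, bd$, so any product state satisfies $c_1 c_4 = c_2 c_3$, and for two two-level systems the converse holds as well. A sketch of that check (the function name and the tolerance are my own choices):

```python
import math

def is_product_state(c1, c2, c3, c4, tol=1e-12):
    """True if c1|HH> + c2|HT> + c3|TH> + c4|TT> factors into one-coin states.

    For two two-level systems this holds exactly when c1*c4 == c2*c3;
    tol only absorbs floating-point rounding.
    """
    return abs(c1 * c4 - c2 * c3) < tol

s = 1 / math.sqrt(2)

# (|H> + |T>)/sqrt(2) on each coin: all four amplitudes equal 1/2.
print(is_product_state(0.5, 0.5, 0.5, 0.5))  # True: independent coins

# (|HH> + |TT>)/sqrt(2): the two coins always agree.
print(is_product_state(s, 0.0, 0.0, s))      # False: entangled
```

This shortcut is specific to two two-level systems; with more coins or more outcomes per coin a single condition like this is not enough.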
Entanglement is often taught as something separate from what these tensor products represent, and I think that this is a mistake. I wasn't able to make any sense of this stuff until I eventually found this line of thinking.
One final note: people often say something like "for a state 1 (existing in $\mathcal{H}_1$) and a state 2 (existing in $\mathcal{H}_2$), entanglement exists in the space $\mathcal{H}_1 \otimes \mathcal{H}_2$".
This language is very confusing, but is unfortunately common and rarely explained. The "Hilbert space" $\mathcal{H}_1 \otimes \mathcal{H}_2$ simply represents the set of probability amplitudes that could be assigned to the combined output states. In our example, with 2 quantum coins, the kets combine schematically as $\mathcal{H}_1\otimes \mathcal{H}_2 \rightarrow (|H\rangle_1 + |T\rangle_1) \otimes (|H\rangle_2+ |T\rangle_2)\rightarrow (|H\rangle_1|H\rangle_2 +|H\rangle_1|T\rangle_2+|T\rangle_1|H\rangle_2+|T\rangle_1|T\rangle_2 )$, and a general state in this space attaches its own amplitude to each of the four resulting kets.
In this case, we are using this notation as more of a "trick" to mix our kets together so that we get the space that describes the larger possibility space!
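The same "mixing of kets" can be sketched in a few lines. This assumes we label each basis ket by a string and represent a state as a dictionary of amplitudes; the names `basis_1`, `basis_2`, `combined_basis` and `is_normalized`, and the example amplitudes, are all made up for illustration:

```python
from itertools import product

basis_1 = ["H", "T"]  # basis labels for coin 1
basis_2 = ["H", "T"]  # basis labels for coin 2

# Basis of the combined space: one ket for every pair of outcomes.
combined_basis = [a + b for a, b in product(basis_1, basis_2)]
print(combined_basis)  # ['HH', 'HT', 'TH', 'TT']

def is_normalized(state, tol=1e-9):
    """Check that the squared magnitudes of the amplitudes sum to 1."""
    return abs(sum(abs(amp) ** 2 for amp in state.values()) - 1.0) < tol

# A state in the combined space is any normalized choice of one
# amplitude per combined basis ket (hypothetical example values).
state = {"HH": 0.5, "HT": 0.5j, "TH": -0.5, "TT": 0.5}
print(is_normalized(state))  # True
```

The number of combined basis kets is the product of the two individual counts, which is why the space grows so quickly as coins are added.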