The reason red is on the inside of the secondary rainbow might be easier to understand if we think about how much the incoming ray deviates from the direction it would have followed if there were no droplet. A primary rainbow is formed by two refractions and one reflection: the minimum angular deviation is about $\Delta_1 = 138^\circ$, and consequently the angle the outgoing ray makes with the incoming ray is $\delta_1 = 180^\circ - \Delta_1 = 42^\circ$. One additional reflection occurs for a secondary rainbow, so the minimum angular deviation is greater, at about $\Delta_2 = 232^\circ$. Since $\Delta_2>180^\circ$, the outgoing ray is seen to emerge at an angle of $\delta_2 = \Delta_2 - 180^\circ=52^\circ$.
Both $\Delta_1$ and $\Delta_2$ are slightly smaller for red than for other colors, but note that in $\delta_1$, $\Delta_1$ is subtracted from $180^\circ$, whereas in $\delta_2$, $180^\circ$ is subtracted from $\Delta_2$. Consequently $\delta_1$ is greater for red while $\delta_2$ is smaller. It's just a matter of geometry.
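To make the sign flip concrete, here is a minimal sketch that evaluates the minimum deviation for a spherical droplet with $k$ internal reflections, using Snell's law and the stationarity condition $\cos^2 i = (n^2-1)/(k(k+2))$. The refractive indices for red and violet (1.331 and 1.343) are assumed round values for water, not figures from the discussion above, and the function names are just illustrative. With these inputs the secondary comes out a degree or so below the rounded figures quoted above, but the ordering of the colors is the point.

```python
import math

def min_deviation_deg(n, k):
    """Minimum total deviation (degrees) for a ray that refracts into a
    sphere of index n, reflects internally k times, and refracts out."""
    # Stationarity of D(i) = 2(i - r) + k(pi - 2r) with sin(i) = n sin(r)
    cos_i = math.sqrt((n * n - 1.0) / (k * (k + 2.0)))
    i = math.acos(cos_i)
    r = math.asin(math.sin(i) / n)
    return math.degrees(2.0 * (i - r) + k * (math.pi - 2.0 * r))

def delta1(n):
    """Primary bow angle: deviation is subtracted from 180 degrees."""
    return 180.0 - min_deviation_deg(n, 1)

def delta2(n):
    """Secondary bow angle: 180 degrees is subtracted from the deviation."""
    return min_deviation_deg(n, 2) - 180.0

# Assumed refractive indices of water (illustrative values only).
N_RED, N_VIOLET = 1.331, 1.343

for name, n in (("red", N_RED), ("violet", N_VIOLET)):
    print(f"{name:6s}  delta1 = {delta1(n):.1f} deg   delta2 = {delta2(n):.1f} deg")
```

Running it should show $\delta_1$ larger for red than for violet and $\delta_2$ smaller for red than for violet, which is the inverted color order.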
For the secondary rainbow to have the inverted order of bands, do the light rays have to reflect exactly twice and cross each other twice inside the water drop?
The secondary rainbow is formed entirely out of rays that reflect twice and cross themselves, yes.
If so, what are the necessary and sufficient conditions for that to happen? It seems to rely on the relative direction of the incident light and the surface of the water drop, and on the position of incidence? The diagram only shows one particular position and angle of incidence, so it seems kind of accidental to me that the ray reflects twice, crosses itself twice, and ends up in the inverted order.
During each internal reflection, the light is partially reflected and partially transmitted. For example, in each droplet that you see the secondary rainbow emerging from, there is also some light exiting the droplet after only one reflection, but it's going in a different direction, so you don't see it. Likewise there are also higher-order rainbows beyond the secondary, each successively fainter.
As for the effect of where the incident ray strikes the droplet: if you keep the direction of the incident ray fixed and vary only the point at which it enters the droplet, you find that the angular deviation is minimized for a circle of such points (you can do this experiment with a laser pointer and a cylindrical glass filled with water). The minimum deviation angle is what we have been calling $\Delta$. Below $\Delta$, there is no outgoing light. Above $\Delta$, there is outgoing light (this is why the interior of the primary rainbow is brighter than the exterior), but it is fainter than at $\Delta$. Because there is a "stationary" or turnover point (a caustic ray) at $\Delta$, the outgoing intensity is maximized here, so this is where the rainbow is seen.
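The laser-pointer experiment can be imitated numerically. The rough sketch below sweeps the entry point across the droplet (the impact parameter $b$, from 0 at the center to 1 at the edge, in units of the radius) and records the deviation for one internal reflection. It assumes $n = 1.333$ and samples $b$ uniformly, which corresponds more closely to the cylindrical-glass case than to a true sphere; the bin widths and names are arbitrary choices for illustration. Counting how many sampled rays land within one degree of the minimum, compared with a one-degree window farther away, gives a crude picture of why the light piles up at $\Delta$.

```python
import math

def deviation_deg(b, n=1.333, k=1):
    """Total deviation (degrees) of a ray entering at impact parameter b
    (0 = center, 1 = grazing) after k internal reflections."""
    i = math.asin(b)        # angle of incidence
    r = math.asin(b / n)    # angle of refraction from Snell's law
    return math.degrees(2.0 * (i - r) + k * (math.pi - 2.0 * r))

def scan(n=1.333, k=1, samples=10001):
    """Sweep b uniformly over [0, 1]; return the minimum deviation,
    the b at which it occurs, and the full list of deviations."""
    bs = [j / (samples - 1) for j in range(samples)]
    devs = [deviation_deg(b, n, k) for b in bs]
    d_min = min(devs)
    return d_min, bs[devs.index(d_min)], devs

d_min, b_at_min, devs = scan()

# Rays within 1 degree of the minimum versus a 1-degree window 10 degrees above it.
near = sum(1 for d in devs if d < d_min + 1.0)
farther = sum(1 for d in devs if d_min + 10.0 <= d < d_min + 11.0)

print(f"minimum deviation ~ {d_min:.1f} deg at b ~ {b_at_min:.2f}")
print(f"rays within 1 deg of the minimum: {near}")
print(f"rays in a 1 deg window 10 deg above it: {farther}")
```

No sampled ray falls below the minimum, and far more of them fall just above it than in an equally wide window elsewhere, which matches the dark region on one side of the bow and the bright edge at $\Delta$.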