The first thing to realize is that this has more to do with how our eyes process light than with the physics of light. I find it helpful to use sound as a point of comparison. If you consider light of a single frequency, there is a spectrum ranging from red at the longest visible wavelengths to violet at the shortest. This is analogous to going from a lower note up to a higher note. So far, no circle, just a line segment.
But you can mix light of two different frequencies. And this is where the difference between vision and hearing comes into play. If we hear two different frequencies of sound, we actually perceive them as two different pitches playing at the same time. The reason is that the ear has a huge number of sensors (the hair cells, little fibers in your inner ear) tuned to respond to different frequencies. If you play a middle A and a middle C on the piano at the same time, the hair cells tuned to the lower note and the higher note will both respond strongly, but the whole range of hair cells tuned to intermediate frequencies will respond more weakly. So the brain knows that it can’t possibly be a single pure frequency, because there’s no way a single frequency could excite both the “middle A hair cells” and the “middle C hair cells” strongly without exciting the “middle B hair cells” even more strongly.
But the eye has a different mission. It puts a lot more work into telling where light is coming from than into working out the exact combination of frequencies. So, rather than having sensors finely tuned to a huge range of frequencies, it has sensors (the cone cells) tuned to just three different frequency ranges. They are called S, M, and L cones, for Short, Medium, and Long wavelength, and the L cones actually peak in the yellow part of the spectrum. But I'm going to call them Blue, Green, and Red, since those are the colors we use when we want to stimulate them. See this image from Wikipedia:

Because there are so few kinds of sensors, you can't distinguish a mixture of blue and green from a pure cyan beam of light, and you can't tell certain mixtures of green and red from pure orange or yellow light. So we just see a linear spectrum, even if sometimes we are actually seeing "chords" (to use the sound analogy) of mixed light.
But purple/magenta are different. If you mix red and blue without green, you can tell it's not a pure wavelength somewhere between blue and red, because the green intensity is missing. So mixtures of blue and red connect the two ends of the spectrum, making a circle.
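To make that argument concrete, here is a toy sketch of it. Everything in it is an assumption for illustration: the three sensors are modeled as equal-width Gaussians, and the peak wavelengths and width in `PEAKS` and `WIDTH` are made-up round numbers, not measured cone data. With those assumptions, no single wavelength ever makes the "green" sensor the weakest of the three, but a red-plus-blue mixture does.

```python
import math

# Hypothetical sensor peaks (nm) and a shared bandwidth.
# These are illustrative toy values, NOT real cone sensitivity data.
PEAKS = {"blue": 445.0, "green": 540.0, "red": 565.0}
WIDTH = 40.0


def cone_response(wavelength_nm, intensity=1.0):
    """Response of each toy sensor to a single pure wavelength."""
    return {
        name: intensity * math.exp(-((wavelength_nm - peak) / WIDTH) ** 2 / 2)
        for name, peak in PEAKS.items()
    }


def mix(*lights):
    """Sum the responses to several (wavelength_nm, intensity) beams."""
    total = {name: 0.0 for name in PEAKS}
    for wavelength_nm, intensity in lights:
        for name, value in cone_response(wavelength_nm, intensity).items():
            total[name] += value
    return total


def green_is_lowest(response):
    """True when green responds more weakly than both blue and red."""
    return response["green"] < response["blue"] and response["green"] < response["red"]


# No pure wavelength across the visible range produces the "green missing" pattern...
pure_can_fake_purple = any(
    green_is_lowest(cone_response(w)) for w in range(380, 701)
)

# ...but blue light plus red light does.
purple = mix((445.0, 1.0), (620.0, 1.0))

print(pure_can_fake_purple)      # False
print(green_is_lowest(purple))   # True
```

In this toy model the result follows directly from the green peak sitting between the other two: a single wavelength cannot be farther from the middle peak than from both outer peaks at once.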
Now there's a wrinkle to this, because violet, which is the shortest visible wavelength, looks like purple or magenta. In an answer on Quora I try to give my best understanding, as an amateur, of why that is (and some of the above text is repeated from that answer). But that doesn't really matter for this question. As long as you have just three color sensors sampling different parts of the spectrum, you are going to get a triangle of combinations, whose edges are the cases where two sensors have different weights and the third is at zero. When all three are nonzero, the perception tends to be of a more pastel version (or a muddier version if the total intensity is low) of one of the colors on the boundary of the triangle, rather than something completely different.
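The triangle can be sketched in a few lines if you ignore overall brightness and keep only the relative weights of the three sensors. This is only an illustration under that assumption: the function names `chromaticity` and `on_boundary` are mine, and the inputs are idealized sensor responses rather than anything measured.

```python
def chromaticity(blue, green, red):
    """Scale three sensor responses so they sum to 1, discarding brightness."""
    total = blue + green + red
    if total == 0:
        raise ValueError("no light at all: relative weights are undefined")
    return (blue / total, green / total, red / total)


def on_boundary(point, tol=1e-9):
    """A point lies on an edge of the triangle when some sensor is at zero."""
    return min(point) <= tol


# Red and blue with no green: an edge of the triangle (a saturated magenta).
magenta = chromaticity(2.0, 0.0, 2.0)

# The same mixture with some green added: pulled into the interior (more pastel).
pastel = chromaticity(2.0, 1.0, 2.0)

print(magenta, on_boundary(magenta))
print(pastel, on_boundary(pastel))
```

Doubling or halving all three responses gives the same point, which is why total intensity shows up as "muddier" or "brighter" rather than as a different position in the triangle.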