79

In music, when two or more pitches are played together at the same time, they form a chord. If each pitch has a corresponding wave frequency (a pure, or fundamental, tone), the pitches played together make a superposition waveform, which is obtained by simple addition. This wave is no longer a pure sinusoidal wave.

For example, when you play a low note and a high note on a piano, the resulting sound has a wave that is the mathematical sum of the waves of each note. The same is true for light: when you shine a 500 nm wavelength (green light) and a 700 nm wavelength (red light) at the same spot on a white surface, the reflection will be a superposition waveform that is the sum of green and red.
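
A minimal numerical sketch of that addition (assuming NumPy; the 220 Hz and 440 Hz notes are arbitrary examples):

```python
import numpy as np

fs = 44100                          # audio sample rate, Hz
t = np.arange(fs) / fs              # one second of time samples
low = np.sin(2 * np.pi * 220 * t)   # a low note (220 Hz)
high = np.sin(2 * np.pi * 440 * t)  # a high note (440 Hz)
chord = low + high                  # superposition: simple pointwise addition
# 'chord' is periodic but no longer a pure sine wave.
```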

My question is about our perception of these combinations. When we hear a chord on a piano, we’re able to discern the pitches that comprise that chord. We’re able to “pick out” that there are two (or three, etc) notes in the chord, and some of us who are musically inclined are even able to sing back each note, and even name it. It could be said that we’re able to decompose a Fourier Series of sound.
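
That "picking out" can be imitated numerically. A sketch (again assuming NumPy, with the same two example notes) shows a discrete Fourier transform recovering both pitches from their sum:

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
chord = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 440 * t)

spectrum = np.abs(np.fft.rfft(chord))
freqs = np.fft.rfftfreq(len(chord), d=1 / fs)
print(freqs[spectrum > 0.5 * spectrum.max()])  # [220. 440.]: both pitches recovered
```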

But it seems we cannot do this with light. When you shine green and red light together, the reflection appears to be yellow, a “pure hue” of 600 nm, rather than an overlay of red and green. We can’t “pick out” the individual colors that were combined. Why is this?

Why can’t we see two hues of light in the same way we’re able to hear two pitches of sound? Is this a characteristic of human psychology? Animal physiology? Or is this due to a fundamental characteristic of electromagnetism?

Qmechanic
  • 201,751
chharvey
  • 868
  • Closely related questions here and here. – knzhou Dec 01 '18 at 11:20
  • 1
    A short answer would be: our eyes perceive so much more information per second. Hearing sounds is sporadic; you can afford to interpret them well, since that's useful in order to know what is coming. However, decomposing every pixel at 24 fps would need so many resources that it just isn't worth it, and you wouldn't get really useful information from it either. – FGSUZ Dec 02 '18 at 01:08
  • 4
    Two beams of different-colored light do not superimpose into a single waveform the way sound does. One is an electromagnetic wave; the other is just pressure traveling through air. – MadHatter Dec 02 '18 at 02:15
  • 5
    Mammals were typically nocturnal in the time of the dinosaurs; that's why they sunburn easily and have whiskers. Only primates have RGB eyesight, dolphins only see green, and most mammals don't see red. Eyes have 3 wavelength-sensing photoreceptor types; ears have thousands of continuous wavelength-sensing nerves in a cone-tapered spiral tube. Photons do not merge, BTW; sound pressure does. – bandybabboon Dec 02 '18 at 04:23
  • 4
    @MadHatter — EM waves are famously known to superimpose, causing constructive/destructive interference, as demonstrated in the double-slit experiment – chharvey Dec 02 '18 at 04:32
  • 1
    True, but they do not change frequency to make a wave with new energy. I.e. we may see red plus green as yellow, but not because the wave's energy changed to match that frequency. – MadHatter Dec 02 '18 at 04:42
  • Let's not forget the needs imposed on our sensory organs by the forces of evolution. It is a survival skill to have very fine directional resolution for at least one of the senses. Vision became the one because short wavelengths help there. There's no room in the eye for sensors to do a full Fourier analysis at the desired high resolution. The ear, OTOH, has problems resolving directional data (you need stereo to get any idea), but the available space of sensors can usefully (at least for communication) be equipped to decompose the frequencies. – Jyrki Lahtonen Dec 02 '18 at 14:09
  • 4
    The ear contains a harp, with many strings, each sensitive to a particular frequency. The eye contains three types of receptors -- red, green, and blue. Any colors other than those are "guessed at" by judging the relative intensities of the three colors. – Hot Licks Dec 03 '18 at 01:38
  • There's a nice chapter in Vol. 1 of the Feynman Lectures on the Mechanisms of Seeing. He also touches on the perception of sound at the end of the chapter on Harmonics, in a section called non-linear responses. I was just re-reading some of these sections to find some nice tidbit to share here, but as usual his explanation is quite a complete journey. Just jump in. – carlof Dec 02 '18 at 00:57
  • 1
    FGSUZ, that answer is distinctly incorrect. The auditory system is faster than the visual system by several orders of magnitude. There are physiological facts to point to, but it will be sufficient to say that video sampling rates are roughly 24 to 60 per second, while audio sampling rates are typically 44100 per second. – Matt74 Dec 03 '18 at 16:30
  • @Matt74 one could argue that the eye is massively parallel (each rod/cone is a separate sensor - and the eye as a whole doesn't have a "frame rate") whereas the auditory system is a pair of single, individually faster sensors with a high range. If you want to compare data - look at the relative sizes of video and audio files of the same length. The comparison isn't particularly simple. – Baldrickk Dec 03 '18 at 16:50
  • 1
    Nitpick: Actually, the green + red light will yield a yellow that's just a tiny little bit less saturated than the pure yellow light. While we can match each hue with RGB color composition, we can only match the full saturation of the three basic colors we use (assuming those basic colors are emitted by lasers). All other hues will appear slightly grayed out due to being a mixture of frequencies. – cmaster - reinstate monica Dec 03 '18 at 22:59
  • 1
    @Matt74 No, that's not quite correct. We don't "sample" rapid pressure changes... we perceive frequencies and their relationships with each other. It turns out that we can only do this around 40-60 times per second as well... just like with visual changes. – Brad Dec 03 '18 at 23:19
  • 1
    @chharvey One thing to consider is that with sound, much of what we're hearing is harmonics lying octaves beyond the fundamental. We don't even have a full octave of range with visible light. Physiological differences aside, I suspect (but do not actually know) that this is one of the reasons it's different. – Brad Dec 03 '18 at 23:21
  • Brent Weeks's pondering of the selfsame question spawned the wonderful Lightbringer novels. Highly recommended. – Apollys supports Monica Dec 03 '18 at 23:30
  • The statement "When we hear a chord on a piano, we’re able to discern the pitches that comprise that chord." is incomplete. When a chord is struck on a piano, each note in the chord starts at a different time, and those time differences tell the listener that it is a chord of multiple notes, rather than a single complex tone. (This is easy to demonstrate using an electronic piano and its volume control.) – John R. Strohm Dec 04 '18 at 00:35
  • @Baldrickk - yeah you're right - the comparison is not simple. I don't think that file size is a good basis for comparison in this case though. For electronic media, we need to represent the pixels in parallel. If we had to do that in sound, files would certainly have to be larger. But the auditory system does it for us, so there's no need. And it's also fair to say that the auditory system runs in parallel as well, since there are roughly 3500+ inner hair cells each signaling pseudo-independently to corresponding spiral ganglion cells. – Matt74 Dec 04 '18 at 17:06
  • @Brad - the auditory nerve can faithfully follow acoustic fine structure up to roughly 4000 to 5000 times per second (Rose et al. 1968, Hearing mechanisms in Vertebrates), which is a lot more than 40-60 times per second. But maybe you're thinking specifically about perceiving the change from one frequency to another? or representation in the cortex? Not sure about the speed limit of that. – Matt74 Dec 04 '18 at 17:12
  • @Matt74 Yes, exactly! An example of this in practice... compressed audio like MP3 works in the frequency domain and has a minimum frame size of around 2 milliseconds. Musicians and careful listeners can notice a difference between this and uncompressed audio, especially around the "smearing" of higher frequency transients, but for most folks the level of accuracy is fine as long as the frequency components are reproduced accurately. – Brad Dec 04 '18 at 18:39
  • I'm voting to close this question as off-topic because it's about the physiological responses to stimuli and not physics. – Kyle Kanos Dec 17 '18 at 20:15
  • @KyleKanos, please reconsider. I did not know that when I asked the question, and it shouldn't be penalized due to the nature of the correct answer. And seeing as it has 70+ upvotes, the community has agreed it is a good question. – chharvey Dec 17 '18 at 20:18
  • @chharvey it's not the "nature of the correct answer" here, it's the nature of the question. You're asking why people can't do X, but that's not a physics question, that's a biological one, therefore it's off-topic (and answers do not make a question on- or off-topic, the question alone does that). My guess is that this hit Hot Network Question and that's why there's a ridiculous amount of rep accrued for such a (IMO) worthless question. – Kyle Kanos Dec 17 '18 at 20:29

5 Answers

75

This is because of the physiological differences in the functioning of the cochlea (for hearing) and the retina (for color perception).

The cochlea separates out a single channel of complex audio signals into their component frequencies and produces an output signal that represents that decomposition.

The retina instead exhibits what is called metamerism, in which only three sensor types (for R/G/B) are used to encode an output signal that represents the entire spectrum of possible colors as variable combinations of those RGB levels.
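
A rough numerical caricature of the two strategies (a sketch with made-up Gaussian response curves, not physiological data): a bank of many narrow channels localizes each component of a two-component input, while a projection onto three broad channels collapses the whole spectrum into three numbers:

```python
import numpy as np

freqs = np.linspace(0, 1, 500)        # normalized frequency axis
spectrum = np.zeros_like(freqs)
spectrum[150] = spectrum[300] = 1.0   # input: two pure components

def bank(centers, width):             # Gaussian response curves
    return np.exp(-((freqs[None, :] - centers[:, None]) / width) ** 2)

centers = np.linspace(0, 1, 100)
cochlea_out = bank(centers, 0.01) @ spectrum                     # 100 narrow channels
retina_out = bank(np.array([0.25, 0.5, 0.75]), 0.2) @ spectrum   # 3 broad channels

print(centers[cochlea_out > 0.1])  # two tight clusters near 0.30 and 0.60: both components located
print(retina_out)                  # three numbers: many different spectra give this same triple
```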

niels nielsen
  • 92,630
  • 4
    This is the only answer so far that correctly focuses on the role of the cochlea. This is a better answer than the accepted answer. –  Dec 01 '18 at 14:52
  • I agree that this answer is more technically correct, but I think it’s missing the key point: that our ears are able to sense mechanical waveforms while our eyes cannot sense electromagnetic waveforms. There’s room for improvement, which I welcome. – chharvey Dec 02 '18 at 00:24
  • 19
    In short, the reason "it could be said that we’re able to decompose a Fourier Series of sound" is because that's exactly what the cochlea does. – Mark Dec 02 '18 at 05:36
  • 3
    Exactly. Quite a device - until it starts to fail, as mine have! – niels nielsen Dec 02 '18 at 05:52
  • I think it's worth mentioning that just like with vision, we ultimately don't hear with our ears but with our brains, and the ear-brain system can be fooled too https://en.wikipedia.org/wiki/Auditory_masking – whatsisname Dec 03 '18 at 17:03
  • 2
    @chharvey No, you cannot "sense mechanical waveforms" with your ear. All you sense is a bunch of frequencies, and different waveforms happen to have different amounts of harmonics in their Fourier transform. The phases of the different acoustic frequencies are not sensed by your ears, and thus there is always a multitude of different waveforms that sound exactly the same. – cmaster - reinstate monica Dec 03 '18 at 23:05
  • Further, just claiming that the ear senses a Fourier series or a Fourier transform doesn't really make sense. The ear can distinguish changes in sound on very short time scales, but taken literally a Fourier transform would mean what you hear doesn't depend on time. – JiK Dec 04 '18 at 17:41
  • @JiK: it's a simplification. – whatsisname Dec 07 '18 at 02:38
  • @whatsisname The comments here have claimed that this is exactly what the cochlea does, and corrected someone by saying that we can't distinguish phases, only frequencies. A simplification is good as long as it's not taken literally, which these comments seem to have done. Finally, using the simplification in a Fourier transform course should be done with great care, because it will potentially confuse students more than help: it sounds like the ear can "find the Fourier transform of this recording f(t) at t=10s", which is not a thing. – JiK Dec 07 '18 at 10:47
44

Our sensory organs for light and sound work quite differently on a physiological level. The eardrum directly reacts to pressure waves, while the photoreceptors on the retina are sensitive only to broad, overlapping bands peaking near the frequencies associated with red, green and blue. Light frequencies in between partly excite several of these receptors at once, and the impression of seeing, for example, yellow arises from the green and red receptors being excited with certain relative intensities. That's why a display can fake most of the color spectrum with only 3 different colors at each pixel.

Seeing color in this sense is also more of a useful illusion than a direct sensing of physical properties. Mixing colors in the middle of the visible spectrum retains a good approximation of the average frequency of the light mix. If colors from the edges of the spectrum are mixed, e.g. red and blue, the brain invents the color purple or pink to make sense of that sensory input. This, however, corresponds neither to the average of the frequencies (which would result in a greenish color) nor to any physical frequency of light. The same goes for seeing white or any shade of grey, as these correspond to all receptors being activated with equal intensity.
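
A toy illustration of that point (a sketch with idealized Gaussian cone curves rather than measured sensitivities; the wavelengths and intensities were tuned by hand): a green-plus-red mixture and a single monochromatic "yellow" excite the L and M cones almost identically:

```python
import numpy as np

def cone_response(wavelengths, intensities):
    """Toy L/M/S cone excitations: Gaussians peaked at 570/540/445 nm
    (idealized curves for illustration, not measured physiology)."""
    peaks = np.array([570.0, 540.0, 445.0])
    width = 60.0
    resp = np.exp(-((np.asarray(wavelengths)[None, :] - peaks[:, None]) / width) ** 2)
    return resp @ np.asarray(intensities)

mix = cone_response([545.0, 630.0], [0.57, 1.40])  # green + red, hand-tuned intensities
pure = cone_response([575.0], [1.0])               # monochromatic "yellow"
print(mix)   # ~[0.99, 0.71, 0.036]
print(pure)  # ~[0.99, 0.71, 0.009]: L and M match; the small S mismatch is the
             # slight desaturation mentioned in the comments above
```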

Mammal eyes also evolved to distinguish intensity rather than color, since most mammals are nocturnal creatures. But I'm not sure whether the ability to see in color was only established recently; that would be a question for a biologist.

Halbeard
  • 754
  • awesome! BTW this could help answer your biological question: https://en.wikipedia.org/wiki/Diurnality#Evolution_of_diurnality – chharvey Dec 01 '18 at 03:34
  • 3
    Note that you cannot actually fake all the colors using only three primaries. Human-visible color gamut is not a triangle, so some colors will always be outside of output gamut of your display device. – Ruslan Dec 01 '18 at 07:25
  • 19
    Perhaps a nitpick, but it's not the eardrum that detects sound. It's more of a transmission device. The actual sensory organ is the cochlea https://en.wikipedia.org/wiki/Cochlea It's a spiral-shaped tube with sensory hairs along it. Sounds of a particular frequency vibrate the hairs at the spot in the cochlea where the sound resonates. So sound sensing is effectively continuous, while color sensing depends on the mix of the 3 color sensors. – jamesqf Dec 01 '18 at 19:01
  • Yes, I must admit that this part of the answer was a bit too handwavy. I was actually a bit surprised that this became the accepted answer after seeing the two that were posted before mine. It will be updated if it stays this way. – Halbeard Dec 01 '18 at 19:28
  • 5
    Actually, the photoreceptors are sensitive to quite large bands (compared to the distance of their peaks), even overlapping ones. – Paŭlo Ebermann Dec 01 '18 at 22:45
  • 2
    @HalberdRejoyceth, yes, please do update. I chose your answer because it hit the underlying point—that our ears sense true waveforms while our eyes do not. I found that to sufficiently answer my question, even if it’s not the complete truth. However, I do think you would benefit the community to explain in further detail the differences in how the cochlea and the retina work. – chharvey Dec 02 '18 at 00:19
  • 2
    Do you have any source for your claim that most mammals are nocturnal? While we assume they (we) were during the high time of the dinosaurs, is this still the case? – phresnel Dec 03 '18 at 10:58
  • @Ruslan like https://en.wikipedia.org/wiki/File:CIExy1931_srgb_gamut.png ? – Baldrickk Dec 03 '18 at 16:53
  • @Baldrickk yes, but even for better monitor gamuts than sRGB, e.g. Rec. 2020 we still have plenty of out-of-gamut colors. – Ruslan Dec 03 '18 at 17:00
  • 1
    @phresnel It's commonly known, see: Nocturnal Bottleneck. Mammals that are not properly nocturnal (properly nocturnal species make up about 70% of all mammals) are generally either crepuscular or cathemeral. Humans are among the minority of diurnal mammals, and along with the higher primates also have unusually superior colour vision - most other mammals have poor colour vision and lower acuity, both sacrificed for much better night vision, something humans do quite poorly. – J... Dec 03 '18 at 17:35
  • 1
    @PaŭloEbermann that's why we can perceive colors other than RGB; if the responses were super narrow, secondary colors (e.g. orange, yellow, cyan) would be practically invisible to us, or would just register as their "nearest neighbor" color (orange->red, yellow->green, cyan->green or blue, etc). By having a wider sensitivity range, we can pick up other colors with only these three cell types (and that gives us the ability to trick ourselves into perceiving those secondary colors on an RGB display). – Doktor J Dec 03 '18 at 19:09
  • I thought the reason why purple is perceived as a mix of red and blue is that its frequency is close enough to double that of red light that it stimulates the "red" cones (although to a lesser extent) along with the "blue" cones. So a mix of blue and a little red does the same thing to our eyes. – Monty Harder Dec 03 '18 at 19:31
  • @J...: Thanks for the url, but I already know of the bottleneck ("While we assume they (we) were during the high time of the dinosaurs"). The actual question was "is this still the case?". However, I found a source: "“Most mammals today are nocturnal and possess adaptations to survive in dark environments,” study co-author Roi Maor of the Tel Aviv University said." – phresnel Dec 04 '18 at 08:16
  • @phresnel +1 for knowing it all, -1 for knowing less than you think. The third sentence in the wiki article I linked reads : While some mammal groups have later evolved to fill diurnal niches, the approximately 160 million years spent as nocturnal animals has left a lasting legacy on basal anatomy and physiology, and most mammals are still nocturnal. – J... Dec 04 '18 at 10:50
  • @MontyHarder I think Halberd misspoke/mislabeled the color? Violet is in the rainbow; it's a real color and acts like you describe. But Magenta, which looks "purplish pink", is invented. It doesn't exist in the rainbow, and if you wrap the rainbow it is in the "invisible section" opposite green. I believe Purple is defined like Violet and is also in the "real range" rather than the "imaginary range". – Black Dec 04 '18 at 13:02
  • To be fair, in high school we had an unresolved debate for an hour on whether Magenta was mostly Purple, Pink, Red, Blue, or White. So clearly having an imaginary color kinda muddies the waters of all colors that look similar :D – Black Dec 04 '18 at 13:05
21

This is due mostly to physiology. There is a fundamental difference in the way we perceive sound vs. light: for sound we can sense the actual waveform, whereas for light we can sense only the intensity. To elaborate:

  • Sound waves entering your ear cause synchronous vibrations in your cochlea. Different regions of the cochlea have tiny hairs which vibrate in a frequency-selective way. The vibrations of these hairs are turned into electrical signals which are passed on to the brain. Due to the frequency selectivity of the hairs, the cochlea essentially performs a Fourier transform, which is why we can perceive superpositions of waves.
  • Light has such a high frequency that almost nothing can resolve the actual waveform (even state-of-the-art electronics cannot do this). All we can effectively measure is the intensity of the light, and this is all that the eyes can perceive as well. Knowing the intensity of a light beam is not sufficient to determine its spectral content: e.g. a superposition of two monochromatic waves can have the same intensity as a pure monochromatic wave of a different frequency (a numeric check of this appears below the figure).

    We can differentiate superpositions of light in a limited way, due to the fact that eyes perceive three separate color channels (roughly RGB). This is why we can distinguish equal intensities of red and blue light. People with colorblindness have a defective receptor, and so color combinations that most humans can distinguish appear identical to them.

    Not all colors that we perceive correspond to the color of a monochromatic light wave. Famously, there is an entire "line of purples" whose colors do not represent any monochromatic light wave. So people trained in distinguishing purple colors can actually differentiate superpositions of light waves in a limited way.

    [Figure: chromaticity diagram, with the spectral colors along the curved edge and the "line of purples" along the straight lower edge.]
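
A quick numeric check of the intensity claim in the second bullet (a sketch; the optical frequencies are scaled down to audio-like values, which doesn't affect the averaging argument):

```python
import numpy as np

fs = 100_000
t = np.arange(fs) / fs                                     # one second, scaled units
two_tone = np.sin(2*np.pi*500*t) + np.sin(2*np.pi*600*t)   # "green + red"
one_tone = np.sqrt(2) * np.sin(2*np.pi*550*t)              # single "yellow" wave

print(np.mean(two_tone**2))  # ~1.0 (0.5 + 0.5; the cross term averages to zero)
print(np.mean(one_tone**2))  # ~1.0 -> identical time-averaged intensity
```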

Yly
  • 3,654
  • 6
    "...electrical signals representing the actual waveform of the sound. The brain ... does a Fourier transform..." This part of your answer is unfortunately incorrect. The decomposition into different audio frequencies happens mechanically in the cochlear before any vibrations are turned into nerve signals. So the actual waveform is not send to the brain. – Emil Dec 01 '18 at 08:56
  • @Emil Do you have a reference for that? I'm not an expert, so I would happily revise my answer with better information, but my understanding is that the eardrum passes sound waves into the fluid of the cochlea, which cause stereocilia in the organ of Corti to vibrate, which in turn mechanically activate certain neurotransmitter channels. It's described on the Wikipedia page for organ of Corti. I see no reference to frequency discrimination in the cochlea. – Yly Dec 01 '18 at 19:26
  • 2
    @Yly Emil is correct; the cochlea does the Fourier transform, mechanically. See http://www.cochlea.eu/en/cochlea/function – zwol Dec 02 '18 at 00:23
  • 1
    @zwol Thanks. I have corrected the answer accordingly. – Yly Dec 02 '18 at 00:46
  • 3
    I'm not sure about your 2nd point. Surely a simple spectrograph does a good job of resolving light frequencies? But eyes are arranged primarily for spatial discrimination, rather than frequency, like the ear. If we wanted one organ to do both, it'd need far more sensors: each rod/cone in an eye would need a separate neuron for each frequency band you want to discriminate. – jamesqf Dec 02 '18 at 16:50
  • @jamesqf it's not about resolving frequencies – it's about resolving waveforms. Try to record a waveform of ~550 THz signal, you'll see how "simple" it is. – Ruslan Dec 03 '18 at 10:01
  • The ear does not sense the actual waveform, it only detects the magnitudes of the frequencies. The phase is not detected. As such, it's impossible to reconstruct a waveform from what is signaled to the brain. All waveforms with the same overtone amplitudes sound exactly the same. – cmaster - reinstate monica Dec 03 '18 at 23:10
16

Rod (1 type) plus cone (3 types) neurons in the eye give you the potential for 4-D sensation. Since the rod signal is nearly redundant to the totality of cone signals, this is effectively a 3-D sensation.

Cochlear neurons in the ear (roughly 3500 "types", simply due to 3500 different inner hair positions) give you the potential for 3500-D sensation, so trained ears can potentially recognize the simultaneous amplitudes of thousands of frequencies.

So, to answer your question, eyes simply didn't evolve to have many cone types. An improvement, however, is seen through the eyes of mantis shrimp (with the potential for 16-D sensation). Notice the trade-off between spatial image resolution and color perception (and that audio spatial resolution was less important in evolution, and more difficult due to the longer wavelength).
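
The dimensionality argument can be made concrete with a bit of linear algebra (a sketch with random response curves; only the counts 3 and 3500 come from the answer above): the null space of a sensor matrix counts the spectral "directions" that the sensor array is blind to:

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq = 1000                      # spectrum sampled at 1000 frequencies
eye = rng.random((3, n_freq))      # 3 cone-like response curves
ear = rng.random((3500, n_freq))   # ~3500 hair-cell-like channels

print(n_freq - np.linalg.matrix_rank(eye))  # 997 spectral dimensions invisible to the eye
print(n_freq - np.linalg.matrix_rank(ear))  # 0: generically, no spectral information is lost
```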

bobuhito
  • 1,016
6

The hairs form a 1D array along the frequency axis, while rods and cones form a spatial 2D array. In addition, that 2D array has 4 channels (rods and 3 types of cones). So the 2 ears have poor spatial resolution, while the eyes have poor frequency resolution.

You could imagine an eye with many more types of cones, giving you a better frequency resolution. However, that would mean that the cones for a single color would be spaced further apart, limiting spatial resolution. In the end, that's an evolutionary trade-off. Physics tells us you can't have both at the same time, but biology is why we end up with this particular outcome.

MSalters
  • 5,574