I have just presented a project of mine on sound recognition using neural networks. During the presentation I said that I had decided to recognize only one sound at a time (musical notes coming from a guitar), explaining that recognizing multiple simultaneous sounds is very hard or impossible.
I apply an FFT to the sound wave and feed the result into a neural network. My question is: if I were to record multiple sounds at once, couldn't the FFT data be exactly the same for different combinations of sounds?
I mean, if you combine $\sin(x)$ and $\sin(x+\pi)$ you get a wave that is a straight line. You also get a straight line for $\sin(2x)$ and $\sin(2x + \pi)$. So two different sets of waves give the same combined wave, which will have the same FFT data.
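To make the example concrete, here is a minimal sketch in Python using only the standard library. A naive DFT stands in for the FFT, the names `dft`, `mix_a` and `mix_b` are mine, and I take the phase shift in the second pair to be $\pi$ (half a period) so that it cancels as well. Both mixtures come out flat, so their spectra cannot be told apart:

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform (a stand-in for an FFT)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


N = 64  # number of samples over one period of sin(x)
xs = [2 * math.pi * t / N for t in range(N)]

# Pair A: sin(x) and sin(x + pi)
mix_a = [math.sin(x) + math.sin(x + math.pi) for x in xs]
# Pair B: sin(2x) and sin(2x + pi)  (phase shift of pi is an assumption)
mix_b = [math.sin(2 * x) + math.sin(2 * x + math.pi) for x in xs]

# Magnitude spectra of the two mixtures
spec_a = [abs(c) for c in dft(mix_a)]
spec_b = [abs(c) for c in dft(mix_b)]

max_a = max(spec_a)
max_b = max(spec_b)
print(max_a, max_b)  # both are zero up to floating-point rounding
```

Admittedly, in both cases the recorded signal itself is silence, so this may be a degenerate case rather than a realistic one.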
But I got confused because the professor said that you can always decompose the sound into its original elements (e.g. take a chord made of 6 different notes and recover each note individually) because the data is in the time domain. Can you really always decompose a sound wave into its components? What about the case above? He further explained that the problem I mentioned above can occur in the "space-domain" but not in the time domain. Is that right?
PS: I don't know if this is the right Stack Exchange site to ask this question on.
It is quite easy to detect the pitch of multiple strings struck at the same time in, say, a piano chord.
I knew that you could detect entire chords and then, based on musical knowledge, get the individual notes played. The professor said that there is software that recognizes the individual notes that were played in a chord. I tend to believe that such software only recognizes the chord and then uses a dictionary to retrieve the notes that form it. I bet that if you play a chord that is not used in music, it won't recognize the notes. – Cristy Jul 18 '16 at 13:50