
I have just presented a project of mine on sound recognition using neural networks. During the presentation I said that I had decided to recognize only one sound (a musical note from a guitar) at any point in time, explaining that recognizing multiple simultaneous sounds is very hard or impossible.

I apply an FFT to the sound wave and feed the result into a neural network. My question is: if I were to record multiple sounds at once, couldn't the FFT data be exactly the same for different combinations of sounds?

I mean, if you combine $\sin(x)$ and $\sin(x+\pi)$ you get a wave that is a straight line (zero everywhere). You also get a straight line for $\sin(2x)$ and $\sin(2x + \pi)$. So two different sets of waves give the same combined wave, which will have the same FFT data.
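The cancellation described above can be checked numerically. The following is only a minimal sketch: it takes the phase shift to be $\pi$ in both pairs (a shift of $\pi/2$ does not cancel, which the last line shows), and the helper `dft_magnitudes` and the frame length `N` are illustrative choices, not part of the project described here.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) discrete Fourier transform; returns the magnitude of each bin."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
        for k in range(n)
    ]

N = 64  # assumed frame length, exactly one period of sin(x)
xs = [2 * math.pi * t / N for t in range(N)]

# Two different pairs of waves, each in antiphase, so each pair sums to zero.
pair_a = [math.sin(x) + math.sin(x + math.pi) for x in xs]
pair_b = [math.sin(2 * x) + math.sin(2 * x + math.pi) for x in xs]

# A pi/2 shift, by contrast, leaves a sinusoid of amplitude sqrt(2).
pair_quarter = [math.sin(2 * x) + math.sin(2 * x + math.pi / 2) for x in xs]

peak_a = max(dft_magnitudes(pair_a))
peak_b = max(dft_magnitudes(pair_b))
peak_quarter = max(dft_magnitudes(pair_quarter))

print(peak_a, peak_b)   # both at floating-point noise level
print(peak_quarter)     # clearly non-zero
```

Both antiphase pairs produce the same (empty) spectrum, so in this contrived case the FFT cannot tell them apart. It requires two components with identical frequency and amplitude and exactly opposite phase.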

But I got confused because the professor said that you can always decompose the sound into its original elements (e.g. take a chord made of 6 different notes and recover the individual notes) because the data is in the time domain. Can you really always decompose a sound wave into its components? What about the case above? He further explained that in the "space domain" you can have the problem I mentioned above, but not in the time domain.

PS: I don't know if this is the right Stack Exchange site to ask this question on.

Emilio Pisanty
Cristy

1 Answer


Most sounds (even the sound of a "single note") contain multiple frequencies. For pure sounds, there is the fundamental frequency and its harmonics, but almost any "real" sound contains some additional components - due to the envelope of the sound (e.g. the fact that a string must be plucked, then decays) or due to sampling (your sample is finite - so there will be some effects due to this truncation of the sound wave).

When you do a Fourier Transform, you will see all the frequency components. It is quite easy to detect the pitch of multiple strings struck at the same time in, say, a piano chord. Now depending on your definition of "recognizing" different sounds, it may be tough to detect two notes that are sounded simultaneously but an octave apart (so that the higher note in essence sits on top of the harmonics of the lower one). The ear is remarkably good at picking up the difference - for example, we can tell that the higher note is not exactly in phase, and that the amplitude of the second harmonic (the fundamental of the higher note) is higher than it would be if it were just the second harmonic of the lower note.

Of course if you play two notes of the same frequency and phase, then you will not be able to tell them apart (except by the fact that the amplitude will be larger than you expect from a "single note", perhaps). But "real" sounds, of musical instruments, will have sufficiently unique signatures that you can "see" them.
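To make this concrete, here is a rough sketch (not a real pitch detector) that synthesizes two idealized "notes", each a fundamental plus one harmonic, and lists the frequencies that stand out in the spectrum. The sample rate, frame length, note frequencies, amplitudes and threshold are all assumptions chosen so that every component falls exactly on a 1 Hz bin; real recordings decay, leak between bins, and need windowing and proper peak picking.

```python
import cmath
import math

SAMPLE_RATE = 1024  # Hz, assumed; with N = 1024 samples each bin is 1 Hz wide
N = 1024

def amplitude_spectrum(samples):
    """Naive DFT of the first N/2 bins, scaled so a sine of amplitude A reads A."""
    n = len(samples)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [
        2 / n * abs(sum(s * twiddle[(k * t) % n] for t, s in enumerate(samples)))
        for k in range(n // 2)
    ]

def tone(freq_hz, amplitude=1.0):
    """One frame of a steady sine wave (no attack or decay)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(N)]

def mix(*signals):
    """Sample-by-sample sum of several signals."""
    return [sum(values) for values in zip(*signals)]

# Two hypothetical notes: 110 Hz and 165 Hz, each with a weaker second harmonic.
note_low = mix(tone(110), tone(220, 0.5))
note_high = mix(tone(165), tone(330, 0.5))
chord = mix(note_low, note_high)

spectrum = amplitude_spectrum(chord)
peaks = [k for k, a in enumerate(spectrum) if a > 0.25]  # bin index equals Hz here
print(peaks)  # → [110, 165, 220, 330]

# Same frequency and phase: two notes look like one louder note.
doubled = amplitude_spectrum(mix(tone(110), tone(110)))
print(round(doubled[110], 6))  # → 2.0
```

The four peaks group naturally into two harmonic series (110/220 and 165/330), which is what lets the two notes be told apart. The last two lines show the degenerate case from the paragraph above, where only the unexpectedly large amplitude hints at a second note.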

You might be interested in an answer I wrote a while ago where I used a simple iPhone app to analyze the spectrum of the sound of a coin being dropped; this shows that there are multiple harmonics for different sounds, and that you can most likely see (by looking at the combined spectrum) the sum of these two sounds - and tell them apart. It would actually be a fun application of your work to detect "how much money did I drop?".

Floris
  • The thing that confused me is related to this: "It is quite easy to detect the pitch of multiple strings struck at the same time in, say, a piano chord." I knew that you could detect entire chords and then, based on musical knowledge, get the individual notes played. The professor said that there is software that recognizes the individual notes that were played in a chord. I tend to believe that such software only recognizes the chord and then uses a dictionary to retrieve the notes that form that chord. I bet that if you play a chord that is not used in music it won't recognize the notes. – Cristy Jul 18 '16 at 13:50
  • Your professor is right; you don't need a dictionary to extract the chord from the FFT. You might want to look at this answer and links therein. (The dsp stack exchange has a number of hits on "chord recognition" as well...) – Floris Jul 18 '16 at 13:54
  • @Cristy, try it with the system you developed for your project. Play a single note and look at the FFT result. Then play a chord and look at the FFT. What you should see is that the FFT makes the individual notes of the chord very clear. – The Photon Jul 18 '16 at 14:08
  • "I tend to believe that that software only recognizes the chord and then using a dictionary retrieves the notes that form that chord" That doesn't make sense to me. There are literally a hundred (and probably more than that) different "sets of notes" that are the same "chord" (say C major) on a piano. How would your "dictionary" deal with that? – alephzero Jul 18 '16 at 14:13
  • One of the things you could try is this - play an arpeggio on the guitar, and detect (in time domain) the moment each note is struck; then take an FFT "just before" and "just after" - and look at the difference, which will be caused by the new note just added. That would powerfully improve your algorithm in a first step to full chord recognition. Also look at comb filters which remove a note and its harmonics, as a first step to making this task easier. – Floris Jul 18 '16 at 14:16
  • I'm using a neural network so I don't actually want all those filters; I only have a basic band-pass filter for noise reduction (I want the network to handle all of the "processing"). I haven't yet tried with multiple notes at once, as it was hard for me to record the training data. I was just wondering: if it's that easy to detect individual notes, why is there almost no software doing it (correctly)? Even RockSmith, with their RealTone cable, has issues sometimes or considers that you correctly played a note you actually didn't. – Cristy Jul 18 '16 at 14:36
  • See also http://stackoverflow.com/q/4337487/1967396 . I think that using a neural network without at least trying to inject "some" analytical thinking first is making the problem harder than it needs to be. You are already taking an FFT, which is a first step of analysis - so you are injecting some knowledge about the physics. Why not add a bit more smarts before letting your AI loose... – Floris Jul 18 '16 at 14:48
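
The "just before" / "just after" comparison suggested in the comments above can be sketched roughly as follows. This is an idealized illustration only: the tones are steady sines with no decay, the onset is assumed to be already known, and all frequencies, amplitudes and the threshold are made-up values chosen to land on exact 1 Hz bins. Magnitude spectra are compared, so the phase of the note that is still ringing does not matter.

```python
import cmath
import math

SAMPLE_RATE = 1024  # Hz, assumed; with N = 1024 samples each bin is 1 Hz wide
N = 1024

def amplitude_spectrum(samples):
    """Naive DFT of the first N/2 bins, scaled so a sine of amplitude A reads A."""
    n = len(samples)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [
        2 / n * abs(sum(s * twiddle[(k * t) % n] for t, s in enumerate(samples)))
        for k in range(n // 2)
    ]

def tone(freq_hz, amplitude=1.0):
    """One frame of a steady sine wave (no attack or decay)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(N)]

def mix(*signals):
    """Sample-by-sample sum of several signals."""
    return [sum(values) for values in zip(*signals)]

# Frame before the onset: only the first note (110 Hz plus a harmonic) is ringing.
before = mix(tone(110), tone(220, 0.5))
# Frame after the onset: the first note continues and a second note (165 Hz) is added.
after = mix(before, tone(165), tone(330, 0.5))

# Whatever grew between the two frames is attributed to the newly struck note.
difference = [a - b for a, b in zip(amplitude_spectrum(after), amplitude_spectrum(before))]
new_peaks = [k for k, d in enumerate(difference) if d > 0.25]  # bin index equals Hz here
print(new_peaks)  # → [165, 330]
```

With real guitar audio the held note decays between frames, so the difference would also contain small negative values at its harmonics; thresholding on positive growth, as here, is one simple way to ignore them.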