The solution of any initial value problem for a vibrating drum can be represented as a superposition of natural modes (i.e. eigen-modes). The corresponding eigen-values determine the natural frequencies. When you kick the drum you end up exciting one or several eigen-modes. The specific amplitudes of these modes depend on your initial condition (the way you kick the drum). However, realistic loading tends to happen relatively slowly compared with the periods of the higher modes and, therefore, results in higher-frequency modes having smaller amplitudes. The resulting sound is normally dominated by one of the lowest-frequency modes of the drum, with the higher-frequency modes contributing to the resulting timbre.
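To illustrate why a slow kick favours the low modes, here is a minimal sketch under assumptions that are mine rather than part of the argument above: each mode is treated as an undamped oscillator starting at rest, the kick is a Gaussian force pulse of duration `tau`, and the frequencies are given in units of the fundamental (the values are those of an ideal round membrane). For such an oscillator the amplitude left after the pulse is the magnitude of the pulse's Fourier transform at the mode frequency, divided by that frequency. The names `modal_amplitude` and `gaussian_pulse` are hypothetical helpers.

```python
import cmath
import math


def modal_amplitude(omega, pulse, t_start, t_end, steps=20000):
    """Amplitude left in an undamped mode of angular frequency omega
    after a force pulse: |integral of pulse(t) * exp(-i*omega*t) dt| / omega.
    The integral is approximated with the midpoint rule."""
    dt = (t_end - t_start) / steps
    total = 0j
    for k in range(steps):
        t = t_start + (k + 0.5) * dt
        total += pulse(t) * cmath.exp(-1j * omega * t) * dt
    return abs(total) / omega


def gaussian_pulse(tau):
    """Force pulse with characteristic duration tau (assumed shape)."""
    return lambda t: math.exp(-t * t / (2.0 * tau * tau))


# Mode frequencies in units of the fundamental (ideal round membrane).
omegas = [1.0, 1.593, 2.136, 2.295]

slow = [modal_amplitude(w, gaussian_pulse(1.5), -15.0, 15.0) for w in omegas]
fast = [modal_amplitude(w, gaussian_pulse(0.1), -15.0, 15.0) for w in omegas]

# Amplitudes relative to the fundamental for each kind of kick.
slow_rel = [a / slow[0] for a in slow]
fast_rel = [a / fast[0] for a in fast]

print("slow kick:", [round(r, 3) for r in slow_rel])
print("fast kick:", [round(r, 3) for r in fast_rel])
```

In this toy model the slow pulse leaves the higher modes with a much smaller share of the amplitude than the fast one does, which is the sense in which the lowest mode dominates.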
Suppose that you have a round drum. Its sound, for the majority of realistic initial conditions, will be dominated by its lowest-frequency eigen-mode. Hence, someone trying to make an audibly indistinguishable square drum will first have to match the lowest natural frequency of the square drum to that of the round one. However, there is absolutely no reason why this would also match their second and higher natural frequencies. Thus, such drums would have the same "dominant" tone but different timbre, making them easily distinguishable even by an untrained ear.
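The mismatch can be made concrete with a short sketch, assuming both drums are ideal membranes clamped at the edge. In that model the round drum's frequencies are proportional to zeros of Bessel functions (hard-coded below from standard tables), and the square drum's are proportional to sqrt(m^2 + n^2). Once the fundamentals are matched, only the ratios to the fundamental matter. The function names are hypothetical.

```python
import math

# Lowest zeros of the Bessel functions J_0, J_1, J_2, J_0 (second zero),
# taken from standard tables; they order the four lowest distinct
# frequencies of an ideal round membrane.
BESSEL_ZEROS = [2.404826, 3.831706, 5.135622, 5.520078]


def round_drum_ratios(count=4):
    """Lowest distinct frequencies of a round membrane, divided by the fundamental."""
    return [z / BESSEL_ZEROS[0] for z in BESSEL_ZEROS[:count]]


def square_drum_ratios(count=4, max_index=6):
    """Lowest distinct frequencies of a square membrane, divided by the fundamental.
    Frequencies are proportional to sqrt(m^2 + n^2) with m, n = 1, 2, ..."""
    values = sorted({m * m + n * n
                     for m in range(1, max_index + 1)
                     for n in range(1, max_index + 1)})
    return [math.sqrt(v / values[0]) for v in values[:count]]


round_ratios = round_drum_ratios()
square_ratios = square_drum_ratios()

for r, s in zip(round_ratios, square_ratios):
    print(f"round: {r:.4f}   square: {s:.4f}")
```

With the fundamentals matched, the overtone ratios already differ at the second mode, and the gap grows for the next ones.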
However, if you are allowed to vary the shape over a wider range, say, by considering rectangular drums, you can aim to match two (for a rectangle), three, or more of the lowest modes (for more complicated shapes). The timbres of the two drums can then be made much closer to each other and, at some point, an untrained, or even a trained, ear will become unable to tell the difference.
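As a sketch of what the extra freedom of a rectangle buys, assume again an ideal clamped membrane with sides a >= b. Its two lowest frequencies are then proportional to sqrt(1/a^2 + 1/b^2) and sqrt(4/a^2 + 1/b^2), so the ratio of the second to the first depends only on the aspect ratio a/b, while the overall size sets the fundamental. The helper names below are hypothetical. One caveat of this idealised model: the reachable ratio is limited to the range from 1 up to sqrt(5/2), about 1.581, so a target above that (the ideal round drum's ratio of about 1.593 sits just above it) can only be approximated by a rectangle and needs a shape with more freedom for an exact match.

```python
import math


def rectangle_ratio(aspect):
    """Second-to-first frequency ratio of an ideal rectangular membrane
    with aspect ratio aspect = a/b >= 1."""
    return math.sqrt((4.0 + aspect ** 2) / (1.0 + aspect ** 2))


def aspect_for_ratio(target):
    """Aspect ratio a/b that gives the requested second-to-first ratio.
    Only targets in (1, sqrt(5/2)] are reachable in this model."""
    max_ratio = math.sqrt(2.5)
    if not 1.0 < target <= max_ratio:
        raise ValueError("target ratio is outside the range a rectangle can reach")
    return math.sqrt((4.0 - target ** 2) / (target ** 2 - 1.0))


aspect = aspect_for_ratio(1.5)
print(f"aspect ratio a/b = {aspect:.4f}, check: {rectangle_ratio(aspect):.4f}")
```

After fixing the aspect ratio this way, rescaling both sides by the same factor shifts all frequencies together, which is how the fundamental would be matched.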
Keep in mind that I implicitly assumed here that we can always reproduce the same distribution of natural-mode amplitudes on both drums. This means, however, that to achieve the same sound you will likely need to kick the two drums in different ways.