First off, it's perhaps easier to argue for why the time evolution has to be unitary. The reason is that a unitary operator is precisely the kind of operator that is both invertible and norm-preserving. Norm preservation keeps the "probabilities for all the potential values of any physical parameter sum to 1" property, which you sort of need by definition of how probability works, while invertibility makes sure that the physics of an isolated system is reversible.
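That there is then some operator $\hat{H}$ such that $\hat{U}(t) = e^{-i\hat{H} t}$ just follows from the fact that you can take operator logarithms (at least in suitably "well-behaved" cases). The minus sign in the exponent is the convention that matches the Schrödinger equation as written further down.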
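As a quick sanity check of the "unitary means norm-preserving and invertible" claim, here is a minimal numerical sketch. The matrix `H`, the state `psi0`, and all helper names are illustrative assumptions of mine, nothing canonical; units are chosen so that $\hbar = 1$. The check passes for either sign in the exponent, since all that matters is that `H` is Hermitian.

```python
# Minimal sketch: exp(+/- i H t) built from a Hermitian H preserves vector norms.
# H, psi0 and the helper names are illustrative assumptions, not from the text.

def mat_mul(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(m, terms=40):
    """Approximate exp(m) for a 2x2 matrix by its truncated Taylor series."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]  # identity, the n = 0 term
    term = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for n in range(1, terms):
        term = mat_mul(term, m)
        term = [[entry / n for entry in row] for row in term]  # now m^n / n!
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

def evolve(h, psi, t, sign=-1):
    """Apply U(t) = exp(sign * i * H * t) to the state psi."""
    u = mat_exp([[sign * 1j * t * entry for entry in row] for row in h])
    return [sum(u[i][k] * psi[k] for k in range(2)) for i in range(2)]

def norm_sq(psi):
    """Total probability: the sum of |amplitude|^2."""
    return sum(abs(c) ** 2 for c in psi)

# A Hermitian matrix (equal to its own conjugate transpose).
H = [[1.0 + 0j, 0.5 - 0.25j],
     [0.5 + 0.25j, -0.3 + 0j]]
psi0 = [0.6 + 0j, 0.8j]  # normalized: 0.36 + 0.64 = 1

for t in (0.1, 1.0, 3.0):
    print(t, norm_sq(evolve(H, psi0, t)))
```

Each printed norm should come out as 1 up to floating-point rounding. Evolving forward with `sign=-1` and then applying `sign=+1` for the same `t` should also hand back `psi0`, which is the reversibility half of the claim.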
Now as for why $\hat{H}$ is a Hamiltonian. First off we need to understand what a Hamiltonian "is" - and to do that we need to understand the concept of a "generator of translation".
Suppose you have a function $f$ - not necessarily a wave function, this is just any function with $\mathrm{dom}(f) = \mathbb{R}$. Require $f$ to be differentiable. Then define
$$T_{\Delta x}[f] := x \mapsto f(x - \Delta x)$$
where we've used anonymous function notation to describe an operator (higher-order function) $T_{\Delta x}$ called the translation operator by $\Delta x$.
Now we know that by definition of the derivative, when $\Delta x \approx 0$,
$$(T_{\Delta x}[f])(x) \approx f(x) - \Delta x\ f'(x)$$
In particular, writing via the differential operator $D$,
$$T_{\Delta x}[f] \approx f - \Delta x\ D[f]$$
and thus in this regard we can say $D$ is the generator of infinitesimal translations, i.e. where we informally imagine $\Delta x$ to become $dx$, as in the usual sense of calculus.
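If you want to see the first-order formula at work, here is a small numerical sketch. The choice $f = \sin$, the sample point `x0`, and the helper names are my own illustrative assumptions; the derivative is supplied by hand rather than computed.

```python
import math

def translate(f, dx):
    """T_dx[f]: the function x -> f(x - dx)."""
    return lambda x: f(x - dx)

def first_order(f, df, dx):
    """The linearised translation f - dx * D[f], with D[f] = df given by hand."""
    return lambda x: f(x) - dx * df(x)

f, df = math.sin, math.cos  # illustrative choice of f and its derivative
x0 = 0.7                    # arbitrary sample point

def error(dx):
    """Gap between the exact translation and its first-order approximation at x0."""
    return abs(translate(f, dx)(x0) - first_order(f, df, dx)(x0))

for dx in (1e-1, 1e-2, 1e-3):
    print(dx, error(dx))
```

The gap should shrink roughly a hundredfold each time `dx` shrinks tenfold, i.e. like $\Delta x^2$, which is what "correct to first order" means here.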
Admittedly, it's kind of an odd usage of terminology or, at least, I've thought so. To me it'd be more logical to imagine the fictitious $T_{dx}$ as the generator, since the term "generator" in other mathematical contexts basically means "an element or elements you can use to reach all other elements of a structure by suitable repeated use of the structure's operations". In abstract algebra, for instance, when we consider a cyclic group $G$ we call an element $g$ a generator of $G$ when every element of the group can be written as $g^n = \underbrace{g \circ g \circ \cdots \circ g}_{\text{$n$ copies of $g$}}$. Since "compositionally integrating" $T_{dx}$ is ostensibly more like this than adding up a bunch of applications of $D$, I'd say the former is what should be called the generator of the translations, but meh; that's how it is. The point is that this is what the term "generator of translations" means.
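For what it's worth, the two senses of "generator" can be connected, at least formally. A sketch, assuming $f$ is nice enough (analytic, say) for the series to converge, and writing $1$ for the identity operator: a finite translation is $n$ small ones composed, and as $n$ grows each small one is better and better described by $D$,

$$T_{\Delta x} = \left(T_{\Delta x / n}\right)^n \approx \left(1 - \frac{\Delta x}{n} D\right)^n \xrightarrow{\ n \to \infty\ } e^{-\Delta x\, D}$$

$$\left(e^{-\Delta x\, D}[f]\right)(x) = \sum_{k=0}^{\infty} \frac{(-\Delta x)^k}{k!} f^{(k)}(x) = f(x - \Delta x)$$

The second line is just Taylor's theorem, so repeatedly composing the tiny translation and exponentiating $D$ describe the same thing.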
And so we can now talk of a generator of translations for wave functions $\psi_x : \mathbb{R} \rightarrow \mathbb{C}$. In this case, though, we get even weirder and decide to include a factor of $i$: we take $-iD$, which upon introduction of units becomes $-i\hbar D$, as the generator of translations and name it the momentum operator, $\hat{p}$.
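To see where that factor of $i$ ends up, here is the first-order translation formula from above rewritten in terms of $\hat{p}$. This is only a worked check using $\hat{p} := -i\hbar D$; $\psi$ stands for any differentiable wave function:

$$T_{\Delta x}[\psi] \approx \psi - \Delta x\, D[\psi] = \psi - \frac{i}{\hbar}\, \Delta x\, \hat{p}[\psi]$$

The last step works because $-\frac{i}{\hbar} \cdot (-i\hbar) = i^2 = -1$, so $-\frac{i}{\hbar}\, \Delta x\, \hat{p} = -\Delta x\, D$.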
The trick here now is this: what about time translations? Well, it turns out we can do the same thing: $iD$ works perfectly well as a generator of time translations in the same way when applied to a time series. And in fact, you see this very operator on the right-hand side of the Schrödinger equation:
$$\hat{H} [|\psi\rangle(t)] = i \frac{d|\psi\rangle}{dt}$$
The problem is that, at least in non-relativistic quantum mechanics, time $t$ is not an observable operator(*), so the right hand side is not a bona fide operator in the Hilbert space of quantum vectors.
Now: $i \frac{d}{dt}$, or $i D$ if you prefer, applied to a time series is(**) a generator of temporal translations in the sense we have mentioned. But, because of the above, it's not an operator on the Hilbert space! And that's where Hamiltonians come in: a Hamiltonian is an operator which acts like $i \frac{d}{dt}$ but which is a bona fide Hilbert-space operator. And since a vector in Hilbert space describes one instant only, the Hamiltonian must necessarily "generate" the time evolution, quite literally, in that it must derive it from the present state.
Going back to our discussion of generators of translation, that means
$$|\psi\rangle(t + dt) = |\psi\rangle(t) - i [\hat{H} |\psi\rangle(t)]\, dt$$
and it should not be hard to see where this comes from: $|\psi\rangle(t + dt)$ also equals $\hat{U}(dt) |\psi\rangle(t) = e^{-i\hat{H}\, dt} |\psi\rangle(t)$, so a quick Taylor expansion, followed by a quick use of Leibniz's "transcendental law of homogeneity" to drop the terms of order $dt^2$ and higher, immediately produces the above.
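Spelled out, and taking the sign in the exponent that matches the Schrödinger equation $\hat{H} [|\psi\rangle(t)] = i \frac{d|\psi\rangle}{dt}$ written earlier (with $\hbar = 1$ and $1$ standing for the identity operator), the expansion goes roughly like this:

$$e^{-i\hat{H}\, dt} |\psi\rangle(t) = \left(1 - i\hat{H}\, dt + \frac{1}{2}\left(-i\hat{H}\, dt\right)^2 + \cdots\right) |\psi\rangle(t) \approx |\psi\rangle(t) - i [\hat{H} |\psi\rangle(t)]\, dt$$

Rearranging the first-order result gives

$$i\, \frac{|\psi\rangle(t + dt) - |\psi\rangle(t)}{dt} = \hat{H} |\psi\rangle(t)$$

which is the Schrödinger equation again, so the exponential form and the differential form say the same thing.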
TL;DR: The physical reason is that evolution has to be unitary, so that probabilities keep making sense and isolated systems' dynamics are fully reversible. The first is a mathematical necessity; the second is an empirical fact. And a Hamiltonian, by definition, is a bona fide Hilbert-space operator that can stand in the shoes of the "pseudo-operator" acting on histories that we call the generator of temporal translations. Using the mathematical definition of what that means, we see that the factor in the exponential must then be identified with that Hamiltonian.
(*) I hypothesize that this is because we should more "correctly" ascribe the parameter $t$ in the Schrödinger equation not to some externally observable clock, but to the "inner subjective sense" of the agent whose viewpoint the quantum theory is implicitly taking.
(**) It follows from (*) that we should then reverse the sign, because the agent is in effect what is "actively" moving from past into future, hence we must translate the history in the opposite direction. This is similar to the distinction between active and passive transformations and also - can't help but bring in some of my other fields of study here - to conceptual temporal metaphors in human language, where linguists talk of the "moving events" vs. "moving ego" perspectives for conceptualizing the "flow of time". $-iD$ is the active-transformation or moving-events viewpoint; $iD$ is the passive-transformation or moving-ego viewpoint.