You should think of quantities like $\langle B \vert A \rangle $ as probability amplitudes, where the probability is recovered by taking the squared modulus: $P(B\vert A) = \vert \langle B \vert A \rangle \vert^2 $. So for a normalized state, $\vert \langle A \vert A \rangle \vert^2 = 1$, which resembles $P(A\vert A)$, not your best-guess equivalent $P(A)$.
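To make that concrete, here is a minimal sketch in plain Python. The two-component states `A` and `B` and the helper names `inner` and `probability` are made up for illustration; any pair of normalized vectors would do.

```python
import math

def inner(bra, ket):
    """Return <bra|ket> for states given as lists of complex components."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))

def probability(b, a):
    """P(b|a) = |<b|a>|^2, assuming both states are normalized."""
    return abs(inner(b, a)) ** 2

# Hypothetical normalized two-level states, chosen only for illustration.
A = [1 + 0j, 0j]
B = [1 / math.sqrt(2), 1j / math.sqrt(2)]

p_AA = probability(A, A)  # plays the role of P(A|A)
p_BA = probability(B, A)  # plays the role of P(B|A)
print(p_AA, p_BA)
```

With these choices `p_AA` comes out as 1 and `p_BA` as one half (up to floating-point rounding), which is the sense in which $\langle A \vert A \rangle$ behaves like $P(A\vert A)$ rather than $P(A)$.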
Language like "happening" is kinda imprecise, but more importantly it dodges the big issue of uncertainty in QM. We might say that given an initial state $A$, the probability of measuring some condition/state $B$ after time $t$ is:
$$ P(B\vert A) = \vert \langle B \vert U(t) \vert A \rangle \vert^2 $$
but we don't know what "happened" without the notion of some measurement (or distinguishable effect) of the time-evolved state (this is addressed in Kyle's link). Using the time-evolution operator like this implies that $B$ is some observable feature that $A$ is at least partly composed of; i.e. $A$ and $B$ are not simply conditionally linked events.
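As a rough illustration of the formula above, here is a sketch for a two-level system. Everything specific in it is an assumption made for the example and not something from the question: the Hamiltonian $H = (\omega/2)\,\sigma_x$, units with $\hbar = 1$, and the basis states chosen for $A$ and $B$. For this $H$ the evolution operator has the closed form $U(t) = \cos(\omega t/2)\,I - i \sin(\omega t/2)\,\sigma_x$.

```python
import math

def evolve(state, omega, t):
    """Apply U(t) = exp(-i H t) for the assumed H = (omega/2) * sigma_x, hbar = 1."""
    c = math.cos(omega * t / 2)
    s = math.sin(omega * t / 2)
    a0, a1 = state
    # U(t) = cos(omega t / 2) * I - i sin(omega t / 2) * sigma_x
    return [c * a0 - 1j * s * a1, -1j * s * a0 + c * a1]

def transition_probability(b, a, omega, t):
    """P(b|a) = |<b| U(t) |a>|^2 for normalized two-component states."""
    evolved = evolve(a, omega, t)
    amplitude = sum(x.conjugate() * y for x, y in zip(b, evolved))
    return abs(amplitude) ** 2

# Hypothetical initial state A and measured state B (orthogonal basis states).
A = [1 + 0j, 0j]
B = [0j, 1 + 0j]
omega = 1.0

p_zero = transition_probability(B, A, omega, 0.0)          # no evolution yet
p_half = transition_probability(B, A, omega, math.pi / 2)  # sin^2(pi/4)
p_full = transition_probability(B, A, omega, math.pi)      # sin^2(pi/2)
p_stay = transition_probability(A, A, omega, math.pi / 2)  # cos^2(pi/4)
print(p_zero, p_half, p_full, p_stay)
```

In this toy model $P(B\vert A) = \sin^2(\omega t/2)$, so $B$ only becomes a possible measurement outcome because the evolved $A$ picks up a component along it, and $P(A\vert A) + P(B\vert A)$ stays equal to 1 at every $t$.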