Linear algebra offers an intuitive explanation. Recall that for two normalized vectors in $\mathbb R^2$,
\begin{equation}
\overline v = \begin{pmatrix}
v_1 \\ v_2
\end{pmatrix},
\enspace
\overline w = \begin{pmatrix}
w_1 \\ w_2
\end{pmatrix}
\end{equation}
the inner product is related to the angle $\theta$ between the vectors by $\overline v \cdot \overline w = \cos \theta$. Hence the inner product measures the similarity between $\overline v$ and $\overline w$: it equals $1$ when the vectors are parallel and $0$ when they are orthogonal. Alternatively, the inner product $\overline v\cdot \overline w$ expresses the projection of $\overline v$ onto $\overline w$.
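As a quick illustrative check (the specific vectors are chosen here only as an example), take
\begin{equation}
\overline v = \begin{pmatrix}
1 \\ 0
\end{pmatrix},
\enspace
\overline w = \frac{1}{\sqrt 2}\begin{pmatrix}
1 \\ 1
\end{pmatrix},
\qquad
\overline v \cdot \overline w = v_1 w_1 + v_2 w_2 = \frac{1}{\sqrt 2} = \cos 45^\circ .
\end{equation}
Two unit vectors at $45^\circ$ to each other thus have an overlap of roughly $0.71$, between the values $1$ for identical vectors and $0$ for orthogonal ones.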
This interpretation of the inner product is not restricted to $\mathbb R^2$ but generalizes to other vector spaces. In quantum mechanics, a physical state is described by a state vector $| i \rangle$, which lives in some abstract Hilbert space $\mathcal H$. Again, we can interpret the inner product $\langle i|f\rangle$ as measuring the similarity between $|i\rangle$ and $|f\rangle$. Thus, it seems very natural that, for normalized states, $|\langle i|f\rangle|^2$ is the probability of finding a system prepared in $|i\rangle$ in the state $|f\rangle$.
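To make this concrete, here is a small sketch assuming a spin-$\tfrac12$ system with orthonormal basis states $|{\uparrow}\rangle$ and $|{\downarrow}\rangle$ (the system and the notation are assumptions picked purely for illustration). Take $|i\rangle = \left(|{\uparrow}\rangle + |{\downarrow}\rangle\right)/\sqrt 2$ and $|f\rangle = |{\uparrow}\rangle$. Then
\begin{equation}
\langle i|f\rangle = \frac{1}{\sqrt 2}\left(\langle{\uparrow}|{\uparrow}\rangle + \langle{\downarrow}|{\uparrow}\rangle\right) = \frac{1}{\sqrt 2},
\qquad
|\langle i|f\rangle|^2 = \frac12 .
\end{equation}
The overlap has the same value as for two vectors at $45^\circ$ in $\mathbb R^2$, and the corresponding probability of finding the system in $|f\rangle$ is one half.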
While the explanation is by no means a rigorous proof, I find this intuition very useful. I hope it helps you understand the topic.