In Landau & Lifshitz' book, I got stuck on the claim that the momentum is the derivative of the action regarded as a function of the coordinates, i.e. $$ p_i = \frac{\partial S}{\partial x_i}\tag{1} $$ where $i$ labels the $i$-th component of the vector (it does not stand for "initial"). As far as I understand, this seems to imply that you can differentiate the action with respect to the coordinates to get the momentum at any time during the motion. However, I can't fully understand the proof of this claim.
I read this answer, which clarified for me what the on-shell action is, and I understood that $S$ here is a function only of the initial and final times and positions, not a functional of a curve.
I then tried to read the proof of the Lemma in this answer to get a proof of this claim, but I can't follow the actual steps. If I understand correctly, this is more or less the way Landau proves it, but I am much more familiar with ordinary calculus than with the calculus of variations, so some points are obscure to me.
As far as I understand, what is meant by $\delta I$ is actually $$ \delta I = \left.\frac{d}{d\varepsilon}\right|_{\varepsilon = 0}I[q_{cl}+\varepsilon \eta] $$ where $q_{cl}$ is a path that satisfies the equation of motion, $\eta$ is an arbitrary function (commonly denoted by $\delta q$), and $$I[q]=\int_{t_1}^{t_2}L(q(t),\dot q(t))\,dt$$ is the action integral. Following this definition of $\delta I$, after some calculations (which, as far as I understand, are the same as in the proof of the Lemma) I get $$ \delta I=p(t_2)\eta(t_2)-p(t_1)\eta(t_1) \tag{2} $$ which I realize is just a pedantic notation for $$\delta S=p_2 \delta q_2 - p_1 \delta q_1$$ but I can't see how to turn this result into a derivative of $S$, since I still can't identify the variable with respect to which I should take the derivative. I would say that $$ \frac {\partial \delta S} {\partial \delta q_2}=p_2$$ but I guess that is not what is meant by (1).
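To make the question concrete, here is the simplest case I can check by hand. This is only a sketch: the free-particle Lagrangian $L=\tfrac12 m\dot q^2$ is my own choice for illustration and is not taken from Landau. The classical path between $(q_1,t_1)$ and $(q_2,t_2)$ is a straight line with constant velocity $(q_2-q_1)/(t_2-t_1)$, so
$$ S(q_1,t_1;q_2,t_2)=\frac{m\,(q_2-q_1)^2}{2\,(t_2-t_1)},\qquad \frac{\partial S}{\partial q_2}=m\,\frac{q_2-q_1}{t_2-t_1}=p_2,\qquad \frac{\partial S}{\partial q_1}=-m\,\frac{q_2-q_1}{t_2-t_1}=-p_1 $$
In this example (1) seems to hold if the derivative is taken with respect to the final endpoint $q_2$ while $q_1$, $t_1$ and $t_2$ are held fixed, which is consistent with $\delta S=p_2 \delta q_2 - p_1 \delta q_1$. What I don't see is how to justify that reading in general.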
I then tried an alternative route, computing
$$ \left .\frac d {dx}\right | _{x=0} \int_{t_1}^{t_2} L(q_{cl}(t)+x,\dot q_{cl}(t))dt=p(t_2)-p(t_1) $$ but I can't get rid of the difference.
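For reference, these are the steps behind that result, written as a sketch that assumes $q_{cl}$ satisfies the Euler-Lagrange equation and that $p=\partial L/\partial \dot q$:
$$ \left.\frac{d}{dx}\right|_{x=0}\int_{t_1}^{t_2} L(q_{cl}(t)+x,\dot q_{cl}(t))\,dt=\int_{t_1}^{t_2}\frac{\partial L}{\partial q}\,dt=\int_{t_1}^{t_2}\frac{d}{dt}\frac{\partial L}{\partial \dot q}\,dt=p(t_2)-p(t_1) $$
This is the same as (2) with the constant choice $\eta(t)=1$, so the shift moves both endpoints at once, which may be why both boundary terms survive.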
What am I doing wrong? Is there any proof with explicit mathematical steps, or any book I can read to understand how to get explicitly from (2) to (1)?