I've also been going through (the English translation of) Einstein's original 1905 paper "On the Electrodynamics of Moving Bodies". In the context of his paper, the derivation is actually quite straightforward. You only need Maxwell's equations, the Lorentz transformation (which Einstein derived in the preceding segment of the paper), and the chain rule of partial derivatives for composite multivariable functions.
In his paper, Einstein considered two frames of reference K and $k$. K is the (relatively) stationary frame, while $k$ is moving relative to K at velocity $v$ in linear uniform motion. Co-ordinates in K are expressed as ($x$, $y$, $z$, $t$), and in $k$ as ($\xi$, $\eta$, $\zeta$, $\tau$). For simplicity, Einstein assumed that at $t = 0, \tau = 0$, and the origins of both K and $k$ overlapped at $t = \tau = 0$. To further simplify things, Einstein assumed that $k$ and K had parallel axes (and assumed that the axes were in the right-hand configuration), and $k$ is moving along the X axis of K.
From these above assumptions, Einstein derived the Lorentz transformation to convert the co-ordinates ($x$, $y$, $z$, $t$) to ($\xi$, $\eta$, $\zeta$, $\tau$):
$$\textit{$\xi$} = \textit{$\beta$} (\textit{x} - \textit{vt})$$
$$\textit{$\eta$} = \textit{y}$$
$$\textit{$\zeta$} = \textit{z}$$
$$\textit{$\tau$} = \textit{$\beta$} (\textit{t} - \textit{vx}/ \textit{c}^2)$$
where
$$\textit{$\beta$} = 1 / \sqrt{1 - \textit{v}^2/\textit{c}^2}$$
From these equations we can easily derive the reverse transformation from ($\xi$, $\eta$, $\zeta$, $\tau$) to ($x$, $y$, $z$, $t$):
\begin{align*}
x &= \beta (\xi + v \tau)\label{a}\tag{1}\\y &= \eta\label{b}\tag{2}\\z &= \zeta\label{c}\tag{3}\\t &= \beta (\tau + v \xi/ c^2)\label{d}\tag{4}
\end{align*}
In the electrodyamical section of Einstein's paper, we now consider a (classical) electromagnetic field in empty space:
Let $\vec{\mathrm{E}}$($x$, $y$, $z$, $t$) be the vector function describing the electric field in frame K. Likewise, let $\vec{\mathrm{B}}$($x$, $y$, $z$, $t$) describe the magnetic field in frame K. $\vec{\mathrm{E}}$ = (X, Y, Z), where X, Y, Z are scalar functions describing the vector component in the x, y, z axes respectively. Similarly, $\vec{\mathrm{B}}$ = (L, M, N), where L, M, N are scalar functions of the component in the x, y, z axes respectively.
Maxwell's equations in free space are:
$$\nabla \cdot \vec{\mathrm{E}} = \frac{\textit{$\rho$}}{ \textit{$\epsilon$}_0}$$
$$\nabla \cdot \vec{\mathrm{B}} = 0$$
$$\nabla \times \vec{\mathrm{E}} = - \frac{\partial\vec{\mathrm{B}}}{\partial\textit{t}}$$
$$\nabla \times \vec{\mathrm{B}} = \textit{$\mu$}_0 (\vec{\mathrm{J}} + \textit{$\epsilon$}_0 \frac{\partial\vec{\mathrm{E}}}{\partial\textit{t}})$$
Where $\rho$ is the electric charge density, $\vec{\mathrm{J}}$ is the electric current density, $\mu_0$ is the permeability of free space, and $\epsilon_0$ is the permittivity of free space.
Taking advantage of the fact that in empty space, $\rho = 0$ and $\vec{\mathrm{J}} = 0$, and switching over to Guassian Units (which is what I suspect Einstein did), we arrive at the following equations:
\begin{align*}
\nabla \cdot \vec{\mathrm{E}} &= 0\label{e}\tag{5}\\\nabla \cdot \vec{\mathrm{B}} &= 0\label{f}\tag{6}\\\nabla \times \vec{\mathrm{E}} &= - \frac{1}{c}\frac{\partial\vec{\mathrm{B}}}{\partial\ t}\label{g}\tag{7}\\
\nabla \times \vec{\mathrm{B}} &= \frac{1}{c} \frac{\partial\vec{\mathrm{E}}}{\partial t}\label{h}\tag{8}
\end{align*}
Apply equation (\ref{g}) to $\vec{\mathrm{E}}$($x$, $y$, $z$, $t$) and $\vec{\mathrm{B}}$($x$, $y$, $z$, $t$), and we get:
\begin{align*}
\frac{1}{c}\frac{\partial \mathrm{X}}{\partial t} &= \frac{\partial \mathrm{N}}{\partial y} - \frac{\partial \mathrm{M}}{\partial z}\label{xnm}\tag{9}\\\frac{1}{c}\frac{\partial \mathrm{Y}}{\partial t} &= \frac{\partial \mathrm{L}}{\partial z} - \frac{\partial \mathrm{N}}{\partial x}\label{yln}\tag{10}\\\frac{1}{c}\frac{\partial \mathrm{Z}}{\partial t} &= \frac{\partial \mathrm{M}}{\partial z} - \frac{\partial \mathrm{L}}{\partial y}\label{zml}\tag{11}
\end{align*}
Similarly, apply equation (\ref{h}) to $\vec{\mathrm{E}}$($x$, $y$, $z$, $t$) and $\vec{\mathrm{B}}$($x$, $y$, $z$, $t$), and we get:
\begin{align*}
\frac{1}{c}\frac{\partial \mathrm{L}}{\partial t} &= \frac{\partial \mathrm{Y}}{\partial z} - \frac{\partial \mathrm{Z}}{\partial y}\label{lyz}\tag{12}\\\frac{1}{c}\frac{\partial \mathrm{M}}{\partial t} &= \frac{\partial \mathrm{Z}}{\partial x} - \frac{\partial \mathrm{X}}{\partial z}\label{mzx}\tag{13}\\\frac{1}{c}\frac{\partial \mathrm{N}}{\partial t} &= \frac{\partial \mathrm{X}}{\partial y} - \frac{\partial \mathrm{Y}}{\partial z}\label{nxy}\tag{14}
\end{align*}
These relations hold in frame K. (These equations also assume that the x, y, z axes of frame K are in the right-hand configuration)
Now, we consider how the vector functions $\vec{\mathrm{E}}$($x$, $y$, $z$, $t$) and $\vec{\mathrm{B}}$($x$, $y$, $z$, $t$) are expressed in frame $k$ using the co-ordinates ($\xi$, $\eta$, $\zeta$, $\tau$). This can be accomplished by substituting $x$, $y$, $z$, $t$ with $\xi$, $\eta$, $\zeta$, $\tau$ using equations (\ref{a}) - (\ref{d}).
Thus we get:
\begin{align*}
\vec{\mathrm{E}}(x, y, z, t) &= \vec{\mathrm{E}}(\beta (\xi + v\tau), \eta, \zeta, \beta (\tau + v\xi/ c^2))\label{Etrans}\tag{15}\\\vec{\mathrm{B}}(x, y, z, t) &= \vec{\mathrm{B}}(\beta (\xi + v\tau), \eta, \zeta, \beta (\tau + v\xi/ c^2))\label{Btrans}\tag{16}
\end{align*}
Using the chain rule of composite multivariable functions:
$$ \frac{\partial}{\partial x}f(u(x, t), v(x, t)) = \frac{\partial f}{\partial u} \frac{\partial u}{\partial x} + \frac{\partial f}{\partial v} \frac{\partial v}{\partial x}$$
We apply it to any general continuous function
$$\mathrm{F}(x, y, z, t) = \mathrm{F}(\beta (\xi + v\tau), \eta, \zeta, \beta (\tau + v\xi/c^2))$$
Thus
$$\frac{\partial \mathrm{F}}{\partial\xi} = \frac{\partial\mathrm{F}}{\partial x} \frac{\partial x}{\partial\xi} + \frac{\partial\mathrm{F}}{\partial y} \frac{\partial y}{\partial\xi} + \frac{\partial\mathrm{F}}{\partial z} \frac{\partial z}{\partial\xi} + \frac{\partial\mathrm{F}}{\partial t} \frac{\partial t}{\partial\xi}$$
According to equations (\ref{a}) - (\ref{d}), $\frac{\partial x}{\partial \xi} = \beta$, $\frac{\partial y}{\partial\xi} = 0$, $\frac{\partial z}{\partial\xi} = 0$, $\frac{\partial t}{\partial\xi} = \beta \frac{v}{c^2}$, we get:
$$\frac{\partial\mathrm{F}}{\partial\xi} = \beta (\frac{\partial\mathrm{F}}{\partial x} + \frac{v}{c^2} \frac{\partial\mathrm{F}}{\partial t})$$
This relation ship holds for any general continuous function, thus:
\begin{equation}
\frac{\partial}{\partial\xi} = \beta (\frac{\partial}{\partial x} + \frac{v}{c^2} \frac{\partial}{\partial t})\label{i}\tag{17}
\end{equation}
Similarly:
\begin{align*}
\frac{\partial}{\partial\eta} &= \frac{\partial}{\partial y}\label{j}\tag{18}\\\frac{\partial}{\partial\zeta} &= \frac{\partial}{\partial z}\label{k}\tag{19}\\\frac{\partial}{\partial\tau} &= \beta (\frac{\partial}{\partial t} + v \frac{\partial}{\partial x})\label{l}\tag{20}
\end{align*}
From equation (\ref{l}), we get
\begin{equation}
\frac{\partial}{\partial t} = \frac{1}{\beta} \frac{\partial}{\partial\tau} - v \frac{\partial}{\partial x}\label{m}\tag{21}
\end{equation}
Plug equation (\ref{m}) into equation (\ref{i}), we get
\begin{equation}
\frac{\partial}{\partial x} = \beta (\frac{\partial}{\partial\xi} - \frac{v}{c^2} \frac{\partial}{\partial\tau})\label{n}\tag{22}
\end{equation}
Plug equation (\ref{n}) back into equation (\ref{m}), we get
\begin{equation}
\frac{\partial}{\partial t} = \beta (\frac{\partial}{\partial\tau} - v \frac{\partial}{\partial\xi})\label{o}\tag{23}
\end{equation}
Now we have all the tools we need to derive the transformation of $\vec{\mathrm{E}}$($x$, $y$, $z$, $t$) and $\vec{\mathrm{B}}$($x$, $y$, $z$, $t$) from frame K to frame $k$:
Apply equation (\ref{m}) to equation (\ref{xnm}):
$$\frac{1}{c} \frac{1}{\beta} \frac{\partial\mathrm{X}}{\partial\tau} - \frac{v}{c} \frac{\partial\mathrm{X}}{\partial x} = \frac{\partial \mathrm{N}}{\partial\eta} - \frac{\partial \mathrm{M}}{\partial\zeta}$$
From equation (\ref{e}):
$$ \frac{\partial\mathrm{X}}{\partial x} + \frac{\partial\mathrm{Y}}{\partial y} + \frac{\partial\mathrm{Z}}{\partial z} = 0$$
Thus
$$ -\frac{\partial\mathrm{X}}{\partial x} = \frac{\partial\mathrm{Y}}{\partial\eta} + \frac{\partial\mathrm{Z}}{\partial\zeta}$$
Then it follows that
$$\frac{1}{c} \frac{1}{\beta} \frac{\partial\mathrm{X}}{\partial\tau} + \frac{v}{c} \frac{\partial\mathrm{Y}}{\partial\eta} + \frac{v}{c} \frac{\partial\mathrm{Z}}{\partial\zeta} = \frac{\partial \mathrm{N}}{\partial\eta} - \frac{\partial \mathrm{M}}{\partial\zeta}$$
Rearrange the terms, and we arrive at
\begin{equation}
\frac{1}{c}\frac{\partial\mathrm{X}}{\partial\tau} = \frac{\partial}{\partial\eta}[\beta(\mathrm{N} - \frac{v}{c}\mathrm{Y})] - \frac{\partial}{\partial\zeta}[\beta(\mathrm{M} + \frac{v}{c}\mathrm{Z})]\tag{24}
\end{equation}
Apply equation (\ref{n}) and (\ref{o}) to equation (\ref{yln}):
$$\frac{1}{c}\beta(\frac{\partial\mathrm{Y}}{\partial\tau} - v \frac{\partial\mathrm{Y}}{\partial\xi}) = \frac{\partial \mathrm{L}}{\partial\zeta} - \beta (\frac{\partial\mathrm{N}}{\partial\xi} - \frac{v}{c^2} \frac{\partial\mathrm{N}}{\partial\tau})$$
Rearrange the terms, and we arrive at
\begin{equation}
\frac{1}{c}\frac{\partial}{\partial\tau}[\beta (\mathrm{Y} - \frac{v}{c}\mathrm{N})] = \frac{\partial \mathrm{L}}{\partial\zeta} - \frac{\partial}{\partial\xi}[\beta (\mathrm{N} - \frac{v}{c}\mathrm{Y})]\tag{25}
\end{equation}
Likewise, apply equation (\ref{n}) and (\ref{o}) to equation (\ref{zml}), and rearrange the terms, we obtain:
\begin{equation}
\frac{1}{c}\frac{\partial}{\partial\tau}[\beta (\mathrm{Z} + \frac{v}{c}\mathrm{M})] = \frac{\partial}{\partial\xi}[\beta (\mathrm{M} + \frac{v}{c}\mathrm{Z})] - \frac{\partial \mathrm{L}}{\partial\eta}\tag{26}
\end{equation}
We then go through the exact same process, also utilizing the fact that from equation (\ref{f}) we have the relation:
$$ \frac{\partial\mathrm{L}}{\partial x} + \frac{\partial\mathrm{M}}{\partial y} + \frac{\partial\mathrm{N}}{\partial z} = 0$$
Thus we derive
\begin{align*}
\frac{1}{c}\frac{\partial\mathrm{L}}{\partial\tau} = \frac{\partial}{\partial\zeta}[\beta(\mathrm{Y} - \frac{v}{c}\mathrm{N})] - \frac{\partial}{\partial\eta}[\beta(\mathrm{Z} + \frac{v}{c}\mathrm{M})]\tag{27}\\\frac{1}{c}\frac{\partial}{\partial\tau}[\beta (\mathrm{M} + \frac{v}{c}\mathrm{Z})] = \frac{\partial}{\partial\xi}[\beta (\mathrm{Z} + \frac{v}{c}\mathrm{M})] - \frac{\partial \mathrm{X}}{\partial\zeta}\tag{28}\\\frac{1}{c}\frac{\partial}{\partial\tau}[\beta (\mathrm{N} - \frac{v}{c}\mathrm{Y})] = \frac{\partial \mathrm{X}}{\partial\eta} - \frac{\partial}{\partial\xi}[\beta (\mathrm{Y} - \frac{v}{c}\mathrm{N})]\tag{29}
\end{align*}
where
$$\beta = 1 / \sqrt{1 - v^2/c^2}$$