How am I meant to differentiate the $F_{\mu\nu}$ on the first term of (2)? I don't understand how doing so will give me $F_{\mu\nu}$. (as explained by Lubos in the link)
The Lagrangian should be thought of as a function of $4+16=20$ variables in this case, denoted $y_\mu$ and $y_{\mu,\nu}$, i.e. $$ \mathcal L=\mathcal L(\{y_{\mu}\},\{y_{\mu,\nu}\}). $$ These variables are all independent.
When an actual field configuration is inserted into the Lagrangian, the insertion is given by the parametric relations $y_\mu=A_\mu(x)$ and $y_{\mu,\nu}=\partial_\nu A_\mu(x)$. Before it is composed with an actual field configuration, the Lagrangian is $$ \mathcal L(\{y_\mu\},\{y_{\mu,\nu}\})=-\frac{1}{4}(y_{\nu,\mu}-y_{\mu,\nu})(y_{\lambda,\kappa}-y_{\kappa,\lambda})\eta^{\mu\kappa}\eta^{\nu\lambda}. $$
From this point on, it is straightforward to evaluate $$ \frac{\partial \mathcal L}{\partial y_{\beta,\alpha}}. $$ One only has to keep in mind that the indices used for the differentiation must differ from the indices that are contracted in the Lagrangian, and that $$ \frac{\partial y_{\nu,\mu}}{\partial y_{\lambda,\kappa}}=\delta^\kappa_\mu\delta^\lambda_\nu. $$
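Carrying this out, as a sketch: write $f_{\mu\nu}=y_{\nu,\mu}-y_{\mu,\nu}$ (a shorthand introduced only for this computation; it becomes $F_{\mu\nu}$ once a field configuration is inserted), so that $\mathcal L=-\frac{1}{4}f_{\mu\nu}f_{\kappa\lambda}\eta^{\mu\kappa}\eta^{\nu\lambda}$. The rule for differentiating $y_{\nu,\mu}$ gives $$ \frac{\partial f_{\mu\nu}}{\partial y_{\beta,\alpha}}=\delta^\alpha_\mu\delta^\beta_\nu-\delta^\alpha_\nu\delta^\beta_\mu, $$ and since the two factors enter the Lagrangian symmetrically, the product rule produces two equal terms: $$ \frac{\partial\mathcal L}{\partial y_{\beta,\alpha}}=-\frac{1}{2}\left(\delta^\alpha_\mu\delta^\beta_\nu-\delta^\alpha_\nu\delta^\beta_\mu\right)f^{\mu\nu}=-\frac{1}{2}\left(f^{\alpha\beta}-f^{\beta\alpha}\right)=-f^{\alpha\beta}, $$ where $f^{\mu\nu}=\eta^{\mu\kappa}\eta^{\nu\lambda}f_{\kappa\lambda}$ and the last step uses antisymmetry. Evaluated on $y_{\mu,\nu}=\partial_\nu A_\mu$, this is $-F^{\alpha\beta}$; the overall sign follows from the convention $F_{\mu\nu}=\partial_\mu A_\nu-\partial_\nu A_\mu$ that the Lagrangian above implies.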
Note that $\partial\mathcal L/\partial(\partial_\mu A_\nu)$ is an abuse of notation for what I have denoted as $\partial\mathcal L/\partial y_{\nu,\mu}$ (an extremely common abuse, so common that you will find my notation almost nowhere except in texts about jet bundles, but an abuse of notation nonetheless).
How was $A_\mu$ chosen as the dynamical variable in the Euler-Lagrange equations?
The dynamical variable is the one you are solving the differential equation for. In terms of electromagnetic theory, the dynamical variables are either $A_\mu$ or $F_{\mu\nu}$. Purely classically, there is no irrefutable reason to prefer $A_\mu$ over $F_{\mu\nu}$. If one wishes to give a variational characterization of the equations, however, then $F_{\mu\nu}$ is not a good choice. A detailed discussion of why would take too long, so let three short notes suffice.
1) Maxwell's equations in terms of $F_{\mu\nu}$ are first order. It is difficult (but not impossible, see the Dirac equation) to give first-order equations a variational treatment, and the systems that result are pretty much always heavily constrained.
2) There are always as many Euler-Lagrange equations as there are dynamical variables. $F_{\mu\nu}$ has 6 components, but there are 8 Maxwell equations. Once again, this is not necessarily a problem, since constraints tend to imply Noether identities under which not all EL equations are independent; however, it is really not clear how to proceed in the case of $F_{\mu\nu}$. By contrast, the $A$-field has 4 components, and if we consider the Maxwell equations as equations for the $A$-field, then there are also 4 Maxwell equations.
3) Charged particles couple to the $A$-field, not the $F$-field, so giving a unified variational treatment of a particle-field system would be very, very awkward if the $F$-field were the dynamical variable.
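The counting in note 2 can be made explicit. As a sketch for the source-free case, which is the one the Lagrangian above describes: once $F_{\mu\nu}=\partial_\mu A_\nu-\partial_\nu A_\mu$ is taken as a definition, the 4 homogeneous equations $$ \partial_\lambda F_{\mu\nu}+\partial_\mu F_{\nu\lambda}+\partial_\nu F_{\lambda\mu}=0 $$ hold identically because partial derivatives commute, so they are no longer equations to be solved. What remains are the 4 equations $$ \partial_\mu F^{\mu\nu}=\partial_\mu\left(\partial^\mu A^\nu-\partial^\nu A^\mu\right)=0, $$ one for each component of $A_\nu$, and they are second order in $A$. With sources present, a current term would appear on the right-hand side; its sign and normalization depend on conventions not fixed here.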
So, we take the $A$-field as the dynamical variable and our choice is a good one, since that one does admit a simple variational formalism.
This also kind of answers the 3rd question. The form of the Lagrangian has no bearing on the dynamical variable.