Your confusion stems basically from comparing results in different conventions.
Basically, there is always a phase difference of $\pi$ between the two output ports of a beam splitter, but this can only ever be 'morally' true, because that statement talks about the phase of the EM field at different points, and thus the phase will be different depending on where exactly you fix your measuring point on the two input and the two output ports. In situations where you have two waves co-propagating, then their relative phase is perfectly well-defined, but for the ports of the beam splitter, you're comparing phases at different beams in different positions and moving in different directions, so the whole thing is impossible without some artificial way to fix the convention.
Broadly speaking, there are two separate ways to fix the convention: one with an explicit $\pi$ phase shift,
$$
\begin{pmatrix} a_{\mathrm{out},1} \\ a_{\mathrm{out},2} \end{pmatrix}
=
\frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} a_{\mathrm{in},1} \\ a_{\mathrm{in},2} \end{pmatrix},
$$
and one with several explicit $\pi/2$ phases:
$$
\begin{pmatrix} a_{\mathrm{out},1} \\ a_{\mathrm{out},2} \end{pmatrix}
=
\frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}
\begin{pmatrix} a_{\mathrm{in},1} \\ a_{\mathrm{in},2} \end{pmatrix}.
$$
Those two conventions are exactly equivalent, since they can be transformed by adding a $\pi/2$ phase to $a_{\mathrm{in},2}$ and $a_{\mathrm{out},2}$,
$$
\frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
=
\frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}
\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix},
$$
and experimentally this is equivalent to adding a thin slab of glass on one input and one output port. More importantly, though, you didn't know the precise value of the optical path before the $(\mathrm{in}{,}2)$ port or after the $(\mathrm{out}{,}2)$ output, so quibbling about these phases is completely moot.
What you do need is for the matrix governing the coupling to be unitary, which comes from a strict requirement of energy conservation.
Now, this requirement of unitarity can indeed sound a bit intimidating and exotic, but it is important to note that the requirement of unitarity has nothing to do with quantum mechanics, and it is already present within the hamiltonian classical-mechanics description of the system.
More precisely, when we say that the beam splitter can be described by a matrix, we are making two core assertions about the electromagnetic fields we're considering:
First, we are asserting that the EM fields we are willing to consider need to be linear combinations of only two pre-specified modes, which basically look like this:
$\qquad$ 
Second, we are recognizing that those fields can also be expressed as linear combinations of two modes which look like this,
$\qquad$
which are characterized by having all of the output energy on only one of the output ports.
Both of these sets of modes are bases for the same linear subspace of field modes, which means that each set can be expressed as a linear combination of the other set; in other words, that means that the amplitudes of each set of modes are related via some matrix.
More importantly, when we sit down to describe the (classical) field, we write either
$$
\mathbf E(\mathbf r,t) = \mathrm{Re}\mathopen{}\left[
\sum_{j=1}^2 \alpha_{\mathrm{in},j}(t) \mathbf E_{\mathrm{in},j}(\mathbf r)
\right]
$$
or
$$
\mathbf E(\mathbf r,t) = \mathrm{Re}\mathopen{}\left[
\sum_{j=1}^2 \alpha_{\mathrm{out},j}(t) \mathbf E_{\mathrm{out},j}(\mathbf r)
\right],
$$
where $\alpha_{\mathrm{in},j}(t)$ and $\alpha_{\mathrm{in},j}(t)$ are the complex-valued canonical variables that describe the dynamics of those modes, and which satisfy the dynamical equations
$$
\frac{\mathrm d^2}{\mathrm dt^2}\alpha_{x,j}(t) = -\omega^2 \alpha_{x,j}(t).
$$
The tricky part is the normalization: because $\alpha_{x,j}(t)$ and $\mathbf E_{x,j}(\mathbf r)$ only ever appear (thus far) in the product $\alpha_{x,j}(t) \mathbf E_{x,j}(\mathbf r)$, there is an ambiguity in a common complex factor that can be put on either side, including both normalization and phase.
- For the normalization, there is an objective absolute standard that needs to be adhered to: namely, that for each of the modes, the total energy flow across the beam-splitter midline needs to be constant. This is the only way to correctly quantize the system.
For the phase, there is no objective or absolute standard - there is a complete phase ambiguity on all four of the $\mathbf E_{x,j}(\mathbf r)$, and correspondingly on the $\alpha_{x,j}(t)$. You're free to pick any phase that you find convenient, but you do need to choose one.
And, moreover, the phase conventions that you choose for the $\mathrm{in}$ ports cannot be used to set those for the $\mathrm{out}$ ports, or vice versa, because they refer to completely different modes evaluated at different places. They're completely independent quantities.
Once you fix the total incoming energy flow per mode at $R$ (in joules per second), then the total energy flow can be shown to be both
$$
F = R\sum_{j=1}^2 |\alpha_{\mathrm{in},j}(t)|^2
$$
and
$$
F = R\sum_{j=1}^2 |\alpha_{\mathrm{out},j}(t)|^2,
$$
and those two energy flows need to match, because of energy conservation. This means, therefore, that the expression of the output canonical variables as linear combinations of the input canonical variables,
$$
\begin{pmatrix} \alpha_{\mathrm{out},1} \\ \alpha_{\mathrm{out},2} \end{pmatrix}
=
\begin{pmatrix} c & d \\ e & f \end{pmatrix}
\begin{pmatrix} \alpha_{\mathrm{in},1} \\ \alpha_{\mathrm{in},2} \end{pmatrix}
=
M
\begin{pmatrix} \alpha_{\mathrm{in},1} \\ \alpha_{\mathrm{in},2} \end{pmatrix},
$$
needs to preserve the norm $\sum_{j=1}^2 |\alpha_{x,j}(t)|^2$. Or, in other words, the matrix of the transformation needs to be unitary.
(Why unitary and not just having unit norm on each row or each column? Because the beam splitter needs to conserve $\sum_{j=1}^2 |\alpha_{x,j}(t)|^2$ both for cases where the input is on only one of the input ports, but also for cases where there's light on both. If the matrix isn't unitary, there will be a superposition of input beams that will have a different output energy than the sum of the inputs.)
All of the above is crucial for a correct hamiltonian classical-field-mechanical description of the beam splitter. Once you've done it correctly (but only after you've done the classical mechanics correctly), you're ready to move on to the quantum mechanics of the system, which is now very easy: since you've done the classical mechanics correctly, all you need to do is to replace the hamiltonian canonical variables with annihilation operators,
$$
\alpha_{x,j}(t) \mapsto \hat{a}_{x,j}.
$$
and you're done.
And, since you had a strict requirement of unitarity on the matrix that interlinked the hamiltonian canonical variables between the input and output sets, you have an identical requirement of unitarity on the matrix that links the input and output annihilation (and therefore creation) operators.
What you don't have, because you didn't have it on the classical fields, is any additional restriction on what the phases must be, either of $\pi/2$ or $\pi$ or anything, because (again) those are just doomed attempts at comparing phases on different places, which cannot be done to any absolute or objective standard.