This turns out to be a rather important paper in the field of planar optical technology, where one seeks to build a planar realisation of a general unitary matrix. In quantum optics, it describes an algorithm to realise any $N\times N$ unitary operator using only symmetric beamsplitters and phase delays. And, for greatest practicability and wieldiness, one would probably build such general unitary operators in planar optical technology.
I have read the paper in detail and here is how it all works. The working actually hangs on a lemma which is not made very clear in the paper at all (a very common scenario in "letters" papers, where the authors are not given much space to write).
Lemma: Any $N\times N$ unitary matrix $\Gamma_N\in U(N)$ can be written:
$$\Gamma_N = e^{i\,\phi_0}\,\left(\begin{array}{c | c}\cos\theta\,e^{i\,\phi_1} & e^{i\,\phi_2}\,\sin\theta\;\hat{v}^\dagger \\\hline
e^{-i\,\phi_2}\,\sin\theta\;\hat{v}&V_{N-1}\end{array}\right)\tag{1}$$
where the matrix has been partitioned into blocks of $1$ and $N-1$ rows and $1$ and $N-1$ columns and:
- $\theta,\,\phi_0,\,\phi_1,\,\phi_2\in\mathbb{R}$;
- $\hat{v}$ is an $N-1\times 1$ column vector with unit magnitude, i.e. $\hat{v}^\dagger\,\hat{v}=1$;
- $V_{N-1}$ is a normal operator (i.e. $V_{N-1}\,V_{N-1}^\dagger = V_{N-1}^\dagger\,V_{N-1}$: $V_{N-1}$ commutes with its Hermitian conjugate);
- $\hat{v}$ is an eigenvector of $V_{N-1}$ with eigenvalue $-\cos\theta\,e^{-i\,\phi_1}$ (i.e. minus the complex conjugate of the element at position $(1,\,1)$ of the bracketed matrix);
- All the other $N-2$ eigenvalues of $V_{N-1}$ (aside from the one in the previous point) have unit magnitude.
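As a quick sanity check in the converse direction (i.e. that a matrix assembled with exactly these ingredients really is unitary), here is a minimal Python/NumPy sketch; it is my own illustration, not code from the paper, and the names (`rng`, `W`, `v`) are mine. The overall phase $e^{i\,\phi_0}$ is dropped, since multiplying by a phase cannot affect unitarity.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
theta, phi1, phi2 = rng.uniform(0, 2 * np.pi, 3)

# A random orthonormal eigenbasis for V_{N-1}; its first column plays the role of v_hat.
A = rng.normal(size=(N - 1, N - 1)) + 1j * rng.normal(size=(N - 1, N - 1))
W, _ = np.linalg.qr(A)
v = W[:, 0]

# Spectrum demanded by the lemma: -cos(theta) e^{-i phi_1} on v_hat, unit modulus elsewhere.
eigs = np.exp(1j * rng.uniform(0, 2 * np.pi, N - 1))
eigs[0] = -np.cos(theta) * np.exp(-1j * phi1)
V = W @ np.diag(eigs) @ W.conj().T            # normal by construction

# Assemble the bracketed matrix of (1); the overall phase e^{i phi_0} is omitted here.
Gamma = np.empty((N, N), dtype=complex)
Gamma[0, 0] = np.cos(theta) * np.exp(1j * phi1)
Gamma[0, 1:] = np.exp(1j * phi2) * np.sin(theta) * v.conj()
Gamma[1:, 0] = np.exp(-1j * phi2) * np.sin(theta) * v
Gamma[1:, 1:] = V

print(np.allclose(Gamma.conj().T @ Gamma, np.eye(N)))   # expect: True
```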
There is a corollary to this lemma that yields an alternative algorithm to the one in the paper: less efficient, but much more clearly explained.
Corollary: Any $N\times N$ unitary matrix $\Gamma_N\in U(N)$ can be written:
$$\Gamma_N = e^{i\,\phi_0}\,\left(\begin{array}{c | c}1 & 0 \\\hline
0&\Gamma_{N-1}\end{array}\right)\,\left(\begin{array}{cccccc}\cos\theta\,e^{i\,\phi_1} & e^{i\,\phi_2}\,\sin\theta&0&0&0&\cdots \\
e^{-i\,\phi_2}\,\sin\theta&-e^{-i\,\phi_1}\,\cos\theta&0&0&0&\cdots\\
0&0&e^{i\,\phi_3}&0&0&\cdots\\0&0&0&e^{i\,\phi_4}&0&\cdots\\0&0&0&0&e^{i\,\phi_5}&\cdots\\\vdots&\vdots&\vdots&\vdots&\vdots&\ddots\end{array}\right)\,\left(\begin{array}{c | c}1 & 0 \\\hline
0&\Gamma_{N-1}^\dagger\end{array}\right)\tag{2}$$
where:
- $\theta,\,\phi_0,\,\phi_1,\,\phi_2\in\mathbb{R}$;
- $\Gamma_{N-1}\in U(N-1)$ is an $N-1\times N-1$ unitary matrix which diagonalises the normal operator $V_{N-1}$ in Equation (1) in the Lemma, so that:
$$V_{N-1} = \Gamma_{N-1}\,\mathrm{diag}\left(-\cos\theta\,e^{-i\,\phi_1},\,e^{i\,\phi_3},\,e^{i\,\phi_4},\,e^{i\,\phi_5},\,\cdots\right)\,\Gamma_{N-1}^\dagger\tag{3}$$
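To see the corollary's product structure concretely, here is a short Python/NumPy sketch, again my own illustration rather than anything from the paper: it assembles the right-hand side of (2) from a randomly drawn $\Gamma_{N-1}$ and random phases, then checks that the result is unitary and that its lower right block diagonalises exactly as in (3).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
theta, phi0, phi1, phi2 = rng.uniform(0, 2 * np.pi, 4)
extra = rng.uniform(0, 2 * np.pi, N - 2)              # phi_3, phi_4, ...

# A random (N-1) x (N-1) unitary to play Gamma_{N-1}.
A = rng.normal(size=(N - 1, N - 1)) + 1j * rng.normal(size=(N - 1, N - 1))
G, _ = np.linalg.qr(A)

# Middle factor of (2): a 2 x 2 block plus pure phase delays on the remaining ports.
M = np.zeros((N, N), dtype=complex)
M[0, 0] = np.cos(theta) * np.exp(1j * phi1)
M[0, 1] = np.exp(1j * phi2) * np.sin(theta)
M[1, 0] = np.exp(-1j * phi2) * np.sin(theta)
M[1, 1] = -np.exp(-1j * phi1) * np.cos(theta)
M[2:, 2:] = np.diag(np.exp(1j * extra))

P = np.eye(N, dtype=complex)
P[1:, 1:] = G                                         # the block matrix 1 (+) Gamma_{N-1}
Gamma_N = np.exp(1j * phi0) * P @ M @ P.conj().T      # right-hand side of (2)

print(np.allclose(Gamma_N.conj().T @ Gamma_N, np.eye(N)))     # unitary? expect: True

# Lower right block should be V_{N-1} as in (3), with v_hat = first column of Gamma_{N-1}.
V = Gamma_N[1:, 1:] / np.exp(1j * phi0)
D = np.diag(np.concatenate(([-np.cos(theta) * np.exp(-1j * phi1)], np.exp(1j * extra))))
print(np.allclose(V, G @ D @ G.conj().T))                      # expect: True
print(np.allclose(V @ G[:, 0], -np.cos(theta) * np.exp(-1j * phi1) * G[:, 0]))  # v_hat eigenvector
```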
I shall give the proof of this lemma and its corollary below. But for now, let's look at what the corollary says, as a signal flow graph. I have drawn it below: the general $N\times N$ unitary matrix is made up of a general $2\times 2$ unitary matrix $T_{1,\,2}(\theta,\,\phi_0,\,\phi_1,\,\phi_2)$, phase delays $e^{i\,\phi_3},\,e^{i\,\phi_4},\,\cdots$ on the remaining ports, and two unitary $N-1\times N-1$ matrices (the one on the right being the inverse, i.e. Hermitian conjugate, of the one on the left).

It should now be clear that there is an algorithm to build any $N\times N$ unitary matrix from $2\times 2$ unitary matrices and phase delays alone.
Algorithm: One iteration of the algorithm is as follows:
- Diagonalise the lower right $N-1\times N-1$ partition $V_{N-1}$ of the $N\times N$ matrix $\Gamma_N$ in equation (1); since, by the lemma, $V_{N-1}$ is normal, this yields the two unitary $N-1\times N-1$ matrices $\Gamma_{N-1},\,\Gamma_{N-1}^\dagger$;
- Having found $\Gamma_{N-1}$, we find the central matrix $T(\theta,\,\phi_0,\,\phi_1,\,\phi_2)$ and the phase delays $\phi_3,\,\phi_4,\,\phi_5,\,\cdots$ by the product:
$$\begin{array}{l}e^{i\,\phi_0}\,\left(\begin{array}{cccccc}\cos\theta\,e^{i\,\phi_1} & e^{i\,\phi_2}\,\sin\theta&0&0&0&\cdots \\
e^{-i\,\phi_2}\,\sin\theta&-e^{-i\,\phi_1}\,\cos\theta&0&0&0&\cdots\\
0&0&e^{i\,\phi_3}&0&0&\cdots\\0&0&0&e^{i\,\phi_4}&0&\cdots\\0&0&0&0&e^{i\,\phi_5}&\cdots\\\vdots&\vdots&\vdots&\vdots&\vdots&\ddots\end{array}\right)=\\\\\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\left(\begin{array}{c | c}1 & 0 \\\hline
0&\Gamma_{N-1}^\dagger\end{array}\right)\,\Gamma_N\,\left(\begin{array}{c | c}1 & 0 \\\hline
0&\Gamma_{N-1}\end{array}\right)\end{array}\tag{4}$$
We are now left with two $N-1\times N-1$ unitary matrices, namely $\Gamma_{N-1}$ and $\Gamma_{N-1}^\dagger$, which we realise with the next iteration of the algorithm.
It now readily follows by induction that any $N\times N$ unitary matrix can be realised as a concatenation of phase delays and $2\times 2$ unitary matrices: we assume we can realise any $2\times 2$ unitary matrix (e.g. by Mach-Zehnder interferometers or symmetric $2\times 2$ planar waveguide couplers, as shown in the paper); this is the induction basis. The induction step is the iteration of the algorithm above, which shows that if we can realise any $N-1\times N-1$ unitary matrix in this way, then we can also realise any $N\times N$ unitary matrix in this way.
The above method is not the one in the paper, but it forms a clear proof of the existence of a realisation of a unitary matrix in the form assumed in the paper. However, the above algorithm is far less efficient than the one in the paper, because the number of $2\times 2$ splitters grows exponentially with $N$: each iteration uses one splitter and spawns two copies of the $N-1$-sized problem, so the count obeys $S_N = 1 + 2\,S_{N-1}$ with $S_2=1$, giving $2^{N-1}-1$ splitters, whereas the algorithm in the paper calls for only $\frac{N\,(N-1)}{2}$. This makes the above algorithm workable and comparable to the paper's for $N=3,\,4$, but it becomes much more unwieldy for $N\geq 5$: for $N=5$, the above algorithm calls for 15 splitters, whereas the one in the paper calls for only 10.
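A trivial Python sketch (the function names are mine) tabulates the two splitter counts side by side:

```python
def splitters_recursive(N):
    # Corollary-based construction: one 2 x 2 splitter plus two copies of the (N-1)-sized
    # problem, S_N = 1 + 2 S_{N-1} with S_2 = 1, i.e. the closed form 2^(N-1) - 1.
    return 1 if N == 2 else 1 + 2 * splitters_recursive(N - 1)

def splitters_reck(N):
    # Triangular mesh of the paper: one splitter for each pair of modes.
    return N * (N - 1) // 2

for N in range(2, 9):
    print(N, splitters_recursive(N), splitters_reck(N))
```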
So let's now look at the paper itself.
The basic idea is the same as the first pass in finding the inverse of a matrix by elementary row operations, a.k.a. Gaussian elimination. We begin with the matrix $\Gamma_N^{-1}=\Gamma_N^\dagger$: since none of the elements in $\Gamma_N^{-1}$ can have a magnitude greater than 1 (all the row/column lengths must be unity), there is always a $2\times2$ unitary matrix that can add a scaled version of any row with a nonzero first element to any other row so as to eliminate that other row's first element; indeed you have a degree of freedom here, insofar as any matrix of the required form scaled by a phase factor works too. For concreteness, let's assume we simply choose a matrix from, say, $SU(2)$ to do our work, but in practice you might use this degree of freedom to simplify some designs. So we begin with the element in the bottom row, and use a $2\times 2$ unitary matrix of the form $T_{N-1,\,N}$ to eliminate the bottom row's first element, i.e. that at position $(N,\,1)$. Then we use a $T_{N-2,\,N-1}$ to eliminate the element at position $(N-1,\,1)$. And so on. We work our way up the first column, pre-multiplying by matrices of the form $T_{r-1,\,r}$, until we have eliminated the whole of the first column aside from the element at position $(1,\,1)$.
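Here is a minimal Python/NumPy sketch of one such elimination step; the helper name `eliminator` and the particular $SU(2)$ choice are mine, and, as noted above, any phase-scaled variant of it would do just as well.

```python
import numpy as np

def eliminator(a, b):
    """A 2 x 2 unitary (here chosen in SU(2)) sending the pair (a, b) to
    (sqrt(|a|^2 + |b|^2), 0): mixing two rows with this matrix drives the
    lower row's target entry to zero."""
    r = np.hypot(abs(a), abs(b))
    if r == 0:
        return np.eye(2, dtype=complex)
    c, s = np.conj(a) / r, np.conj(b) / r
    return np.array([[c, s], [-np.conj(s), np.conj(c)]])

# Demo: eliminate the (N, 1) entry of Gamma^dagger by mixing its last two rows.
rng = np.random.default_rng(2)
N = 4
Gamma, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
X = Gamma.conj().T                       # start from Gamma^{-1} = Gamma^dagger, as in the text

T = np.eye(N, dtype=complex)
T[N - 2:, N - 2:] = eliminator(X[N - 2, 0], X[N - 1, 0])   # this plays the role of T_{N-1, N}
X = T @ X
print(np.isclose(X[N - 1, 0], 0))        # expect: True
```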
But now look at Equation (1) in the lemma: since the whole product is unitary, and we have eliminated the first column aside from the element at $(1,\,1)$, the only matrix allowed by the general form (1) is one of the form:
$$\left(\begin{array}{c | c}e^{i\,\xi} & 0 \\\hline
0&V_{N-1}\end{array}\right)\tag{5}$$
where $V_{N-1}$ is unitary (here $\sin\theta = 0$, so that, by the lemma, all of $V_{N-1}$'s eigenvalues have unit magnitude and the normal operator $V_{N-1}$ is therefore unitary). So now we have reduced the problem to realising a general $N-1\times N-1$ unitary matrix, and so we can call the algorithm again to yield further reductions until we're left with a $2\times 2$ unitary matrix.
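Putting the sweeps together, the following self-contained Python/NumPy sketch (my own illustration of the idea, not the paper's code) reduces $\Gamma_N^\dagger$ to a diagonal of unit-modulus phases using $\frac{N\,(N-1)}{2}$ matrices of the form $T_{r-1,\,r}$, and then rebuilds $\Gamma_N$ from the factors as a check.

```python
import numpy as np

def eliminator(a, b):
    # 2 x 2 SU(2) matrix sending (a, b) to (sqrt(|a|^2 + |b|^2), 0), as in the previous sketch.
    r = np.hypot(abs(a), abs(b))
    if r == 0:
        return np.eye(2, dtype=complex)
    c, s = np.conj(a) / r, np.conj(b) / r
    return np.array([[c, s], [-np.conj(s), np.conj(c)]])

def reduce_to_phases(Gamma):
    """Work up each column of Gamma^dagger in turn with matrices of the form T_{r-1, r},
    as described above, until only a diagonal of unit-modulus phases remains.
    Returns the list of (upper row index, 2 x 2 block) applied, plus the residual phases."""
    N = Gamma.shape[0]
    X = Gamma.conj().T.copy()
    ops = []
    for col in range(N - 1):                       # eliminate column `col` below the diagonal ...
        for row in range(N - 1, col, -1):          # ... working upwards from the bottom row
            U = eliminator(X[row - 1, col], X[row, col])
            X[row - 1:row + 1, :] = U @ X[row - 1:row + 1, :]
            ops.append((row - 1, U))
    return ops, np.diag(X)                         # X is now diagonal with unit-modulus entries

rng = np.random.default_rng(3)
N = 5
Gamma, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
ops, phases = reduce_to_phases(Gamma)
print(len(ops) == N * (N - 1) // 2)                # expect: True, i.e. N(N-1)/2 splitters

# Rebuild Gamma from the factors: Gamma = D^dagger T_m T_{m-1} ... T_1.
R = np.diag(phases).conj().T
for row, U in reversed(ops):
    T = np.eye(N, dtype=complex)
    T[row:row + 2, row:row + 2] = U
    R = R @ T
print(np.allclose(R, Gamma))                       # expect: True
```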
Proof of Lemma and Corollary
To prove the lemma, we take the $N\times N$ unitary matrix to be partitioned as $\Gamma_N = \left(\begin{array}{c | c}\gamma_{1\,1} & \gamma_{1\,2} \\\hline\gamma_{2\,1}&V_{N-1}\end{array}\right)$, where $\gamma_{1\,1}\in\mathbb{C}$, $V_{N-1}$ is an $N-1\times N-1$ matrix, $\gamma_{1\,2}$ is a $1\times N-1$ row vector and $\gamma_{2\,1}$ is an $N-1\times1$ column vector. Then, on writing out the two matrix equations $\Gamma_N\,\Gamma_N^\dagger=\Gamma_N^\dagger\,\Gamma_N=\mathrm{id}_N$ in block form, we get the following equations:
$$
\begin{array}{lclcl}
V_{N-1}\,V_{N-1}^\dagger + \gamma_{2\,1}\,\gamma_{2\,1}^\dagger &=& V_{N-1}^\dagger\,V_{N-1} + \gamma_{1\,2}^\dagger\,\gamma_{1\,2}&=&\mathrm{id}_{N-1}\\
|\gamma_{1\,1}|^2 + \gamma_{2\,1}^\dagger\,\gamma_{2\,1} &=& |\gamma_{1\,1}|^2 + \gamma_{1\,2}\,\gamma_{1\,2}^\dagger &=& 1\\
V_{N-1}\,\gamma_{1\,2}^\dagger + \gamma_{1\,1}^*\,\gamma_{2\,1} &=& V_{N-1}^\dagger\,\gamma_{2\,1}+\gamma_{1\,1}\,\gamma_{1\,2}^\dagger&=&0
\end{array}
\tag{6}$$
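These block relations are easy to check numerically for a randomly drawn unitary; here is a quick Python/NumPy verification of the set (6) (my own sketch, with my own variable names):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 5
Gamma, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))

g11 = Gamma[0, 0]
g12 = Gamma[0:1, 1:]          # 1 x (N-1) row vector
g21 = Gamma[1:, 0:1]          # (N-1) x 1 column vector
V = Gamma[1:, 1:]
I = np.eye(N - 1)

print(np.allclose(V @ V.conj().T + g21 @ g21.conj().T, I))          # first line of (6)
print(np.allclose(V.conj().T @ V + g12.conj().T @ g12, I))          # first line of (6)
print(np.isclose(abs(g11) ** 2 + (g21.conj().T @ g21).item(), 1))   # second line of (6)
print(np.allclose(V @ g12.conj().T + np.conj(g11) * g21, 0))        # third line of (6)
print(np.allclose(V.conj().T @ g21 + g11 * g12.conj().T, 0))        # third line of (6)
```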
By taking the Hermitian conjugate of the last equation in the set (6), we get $V_{N-1}\,V_{N-1}^\dagger = V_{N-1}^\dagger \,V_{N-1}$, so that $V_{N-1}$ is a normal operator and is therefore unitarily diagonalisable by some unitary matrix $\Gamma_{N-1}$.
From the first equation in the set (6), together with the normality of $V_{N-1}$ just established, we get $\gamma_{2\,1}\,\gamma_{2\,1}^\dagger=\gamma_{1\,2}^\dagger\,\gamma_{1\,2}$, an equation between two scaled rank-one projectors. If any element of $\gamma_{2\,1}$ is nonzero, then the corresponding element of $\gamma_{1\,2}^\dagger$ is nonzero and $\gamma_{1\,2}^\dagger = \alpha\,\gamma_{2\,1}$, for some $\alpha\in\mathbb{C}$. Since $\gamma_{2\,1}\,\gamma_{2\,1}^\dagger=\gamma_{1\,2}^\dagger\,\gamma_{1\,2}$, this implies $|\alpha|=1$, so we find $\gamma_{1\,2}^\dagger = e^{2\,i\,\phi_1}\,\gamma_{2\,1}$, for some $\phi_1\in\mathbb{R}$. From the second equation in the set (6) we then have $|\gamma_{1\,1}|^2 + \left|\gamma_{2\,1}\right|^2 = 1$, and, since $\gamma_{1\,2}^\dagger = e^{2\,i\,\phi_1}\,\gamma_{2\,1}$, this implies the three conditions $\gamma_{1\,1}=\cos\theta\,e^{i\,\phi_0}$, $\gamma_{2\,1} = \sin\theta\,e^{-i\,\phi_1}\,\hat{v}$ and $\gamma_{1\,2} = \sin\theta\,e^{+i\,\phi_1}\,\hat{v}^\dagger$, for some unit length vector $\hat{v}\in\mathbb{C}^{N-1}$ and $\theta,\,\phi_0,\,\phi_1\in\mathbb{R}$, thus justifying three of the partitions in (1).
Now, given these forms for $\gamma_{2\,1}$ and $\gamma_{1\,2}$, the last equation of the set (6) reduces to $V_{N-1}\,\sin\theta\,e^{-i\,\phi_1}\,\hat{v}=-\gamma_{1\,1}^*\,\sin\theta\,e^{-i\,\phi_1}\,\hat{v}$. That is, unless $\sin\theta=0$ (in which case $\Gamma_N$ is automatically of the form stated in (1)), we have $V_{N-1}\,\hat{v}=-\gamma_{1\,1}^*\,\hat{v}$, so that $\hat{v}$ is an eigenvector of $V_{N-1}$ with eigenvalue $-\gamma_{1\,1}^*$. This proves all the assertions in the lemma aside from the last one about the remaining eigenvalues, which follows from (7) below.
Now, to prove the corollary, we note that $V_{N-1}$ is a normal operator and has as eigenvector $\hat{v}$ with eigenvalue $-\gamma_{1\,1}^*$. Let $\Gamma_{N-1}$ be the unitary operator that diagonalises the normal $V_{N-1}$, and we choose the order of the mutually orthonormal eigenvectors of $V_{N-1}$, which are the columns of $\Gamma_{N-1}$, so that $\hat{v}$ is the first eigenvector. Then $\Gamma_{N-1}^\dagger\,\hat{v}=\left(\begin{array}{cccc}1&0&0&\cdots\end{array}\right)^T$ and so, from the first equation in (6) we find:
$$
\begin{array}{cl}
&\Gamma_{N-1}^\dagger\,V_{N-1}\,\Gamma_{N-1}\,\Gamma_{N-1}^\dagger\,V_{N-1}^\dagger\,\Gamma_{N-1}+(\sin\theta)^2\, \mathrm{diag}(1,\,0,\,0,\,\cdots)\\ = &\mathrm{diag}(|\lambda_1|^2,\,|\lambda_2|^2,\,|\lambda_3|^2,\,\cdots)+(\sin\theta)^2\, \mathrm{diag}(1,\,0,\,0,\,\cdots) =\mathrm{id}_{N-1}
\end{array}
\tag{7}$$
where the $\lambda_j$ are the eigenvalues of $V_{N-1}$. It follows from (7) that they all have unit magnitude aside from the first, which is $\lambda_1 = -\gamma_{1\,1}^* = -\cos\theta\,e^{-i\,\phi_0}$ (in this proof's phase labelling; this is the eigenvalue written as $-\cos\theta\,e^{-i\,\phi_1}$ in the statement of the lemma, where the overall phase is factored out), as has already been found.