
So I am trying to read this paper (no paywall here). In it, the formal construction of a matrix $T$ is given as:

The product of matrices is equivalent to setting up experimental devices in sequence. Finding an optical experiment belonging to an arbitrary unitary matrix is therefore completely equivalent to factorizing the unitary matrix into a product of block matrices containing only beam splitter matrices with appropriate phase shifts as defined in Eq. (1). We define a matrix $T_{pq}$ which is an $N$-dimensional identity matrix with the elements $I_{pp}$, $I_{pq}$, $I_{qp}$, and $I_{qq}$ replaced by the corresponding beam splitter matrix elements. This matrix performs a unitary transformation on a two-dimensional subspace of the $N$-dimensional Hilbert space leaving an $(N-2)$-dimensional subspace unchanged [17]. It can be used to successively make all off-diagonal elements of the given $N \times N$ unitary matrix zero, a method similar to Gaussian elimination.

I just for the life of me cannot figure out how to make $T_{pq}$. I would be very thankful if someone can please explain it to me.

  • Paresh: I have looked into this paper in a great deal of detail. I have written my "review" of the paper up as a second answer if you're still interested in this problem. – Selene Routley Jan 25 '15 at 03:29

2 Answers


Without seeing the full paper, I'm almost certain that the following is what is being done.

Let $\{\hat{X}_j\}_{j=1}^N$ be the chosen orthonormal basis vectors for our state space $\mathbb{C}^N$ with respect to which we wish to write the matrices representing our linear, unitary transformations.

Any homogeneous linear transformation $T:\mathbb{C}^N\to\mathbb{C}^N$ is defined wholly by the $N$ values $T(\hat{X}_j)$.

Now let's think about the authors' $T_{p,q}$. What they mean is $T_{p,q}(\hat{X}_j) = \hat{X}_j$ unless $j=p$ or $j=q$. In this special case, $T_{p,q}$ restricted to the vector space spanned by $\hat{X}_p,\,\hat{X}_q$ has an image in that same vector space, i.e. $T_{p,q}$ maps the space $\operatorname{span}(\{\hat{X}_p,\,\hat{X}_q\})$ to itself:

$$T_{p,q}:\operatorname{span}(\{\hat{X}_p,\,\hat{X}_q\})\to \operatorname{span}(\{\hat{X}_p,\,\hat{X}_q\})$$

and this restricted linear operator is wholly defined by

$$T_{p,q}(\hat{X}_p) = \ell_{p,p} \hat{X}_p+\ell_{q,p} \hat{X}_q$$ $$T_{p,q}(\hat{X}_q) = \ell_{p,q} \hat{X}_p+\ell_{q,q} \hat{X}_q$$

and, moreover, the $2\times 2$ matrix $\Lambda_{p,q}=\left(\begin{array}{cc}\ell_{p,p}&\ell_{q,p}\\\ell_{p,q}&\ell_{q,q}\end{array}\right)$ belongs to $U(2)$, i.e. it is unitary. So the images of $\hat{X}_p$ and $\hat{X}_q$ define the matrix $T_{p,q}$. (The weird ordering of the indices in the off-diagonal terms will become clearer below.)

So, how do we write the matrix of $T_{p,q}$? Well, we write the image of $\hat{X}_j$ in the $j^{th}$ column. So all the columns of $T_{p,q}$ aside from the $p^{th}$ and $q^{th}$ are simply what they would be in the identity matrix, since the image of $\hat{X}_j$ is $\hat{X}_j$. This leaves just two columns to fill in: columns $p$ and $q$.

For column $p$:

These are all noughts aside from the element at position $(p,p)$, which is the element $\ell_{p,p}$ in our restricted $2\times 2$ matrix $\Lambda_{p,q}$. Likewise, the element at position $(q,p)$ in the $p^{th}$ column is $\ell_{q,p}$ in our restricted $2\times 2$ matrix $\Lambda_{p,q}$.

For column $q$, the analogous thing happens: all noughts aside from $\ell_{p,q}$ at position $(p,q)$ and $\ell_{q,q}$ at position $(q,q)$.
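To make this concrete, here is a minimal numpy sketch of the construction (my own illustration, using 0-based indices and an arbitrary Hadamard-like $2\times2$ block as the test unitary; it is not code from the paper):

```python
import numpy as np

def make_T(N, p, q, l_pp, l_pq, l_qp, l_qq):
    """T_{p,q}: an N x N identity with four entries overwritten (p, q 0-based).
    Column p carries the image of X_p, column q the image of X_q."""
    T = np.eye(N, dtype=complex)
    T[p, p] = l_pp   # image of X_p, component along X_p
    T[q, p] = l_qp   # image of X_p, component along X_q
    T[p, q] = l_pq   # image of X_q, component along X_p
    T[q, q] = l_qq   # image of X_q, component along X_q
    return T

# A T_{2,4} on a 4x4 space (0-based rows/columns 1 and 3) with a
# Hadamard-like 2x2 block; the untouched diagonal entries (0,0) and
# (2,2) remain 1, exactly as in the identity matrix.
s = 1 / np.sqrt(2)
T24 = make_T(4, 1, 3, s, s, s, -s)
print(np.allclose(T24 @ T24.conj().T, np.eye(4)))   # unitary: True
```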

Now, I am not sure whether the authors allow the restricted $2\times 2$ matrix $\Lambda_{p,q}$ to be any unitary $2\times 2$ matrix. They mention beamsplitter matrices, which could be construed as any unitary $2\times2$ matrix, or they could have the matrices I describe in this answer here in mind. I would guess the former, i.e. any unitary matrix.

  • Are you sure everything will be noughts except (p,p),(p,q) etc? Basically what you are saying is, if I have a T42 which is 4x4 I would have noughts at everything except (4,2),(4,4),(2,4) and (2,2) – Paresh Mathur Nov 03 '14 at 09:50
  • @PareshMathur No, you would also have element (1,1) = 1, and element (3,3) = 1. Remember, this is the same as the identity matrix aside from columns 2 and 4. – Selene Routley Nov 03 '14 at 10:30
  • Thank you for this. I incorporated this into my programs last night and it indeed does work. – Paresh Mathur Nov 04 '14 at 03:47
  • @PareshMathur Fantastic! Are you a PhD student or postdoc? I always find it quite a buzz when my programming of something begins to work. BTW: Is it true that the $\Lambda_{p,q}$ is a general unitary $2\times 2$ matrix? I can prove that a product of the form stated by the authors can indeed realise any unitary $N\times N$ matrix if this is the case; I don't think it works if $\Lambda_{p,q}$ must be the matrix for a symmetric beamsplitter I referenced in the other answer. My proof is an existence proof: it's grand to see someone can implement it as a constructive algorithm. – Selene Routley Nov 04 '14 at 05:03
  • None of the above, I am a master's student. It's not a general unitary 2x2 matrix; it is the matrix of a beam splitter with variable phase and frequency (basically variable reflectance and transmittance). This matrix is realized using a Mach-Zehnder interferometer. – Paresh Mathur Nov 04 '14 at 11:25

This turns out to be a rather important paper in the field of planar optical technology, where one seeks to build a planar realisation of a general unitary matrix. Also, in quantum optics, it describes an algorithm to realise any $N\times N$ unitary operator using only symmetric beamsplitters and delays. Again, for greatest practicality, one would probably build general unitary operators using planar optical technology.

I have read the paper in detail and here is how it all works. The working actually hangs on a lemma which is not made very clear in the paper at all (a very common scenario in "letters" papers, where the authors are not given much space to write).


Lemma: Any $N\times N$ unitary matrix $\Gamma_N\in U(N)$ can be written:

$$\Gamma_N = e^{i\,\phi_0}\,\left(\begin{array}{c | c}\cos\theta\,e^{i\,\phi_1} & e^{i\,\phi_2}\,\sin\theta\;\hat{v}^\dagger \\ e^{-i\,\phi_2}\,\sin\theta\;\hat{v}&V_{N-1}\end{array}\right)\tag{1}$$

where the matrix has been partitioned vertically into 1 and $N-1$ rows and 1 and $N-1$ columns and:

  1. $\theta,\,\phi_0,\,\phi_1,\,\phi_2\in\mathbb{R}$;
  2. $\hat{v}$ is an $(N-1)\times 1$ column vector with unit magnitude, i.e. $\hat{v}^\dagger\,\hat{v}=1$;
  3. $V_{N-1}$ is a normal operator, i.e. it commutes with its Hermitian conjugate: $V_{N-1}\,V_{N-1}^\dagger = V_{N-1}^\dagger\,V_{N-1}$;
  4. $\hat{v}$ is an eigenvector of $V_{N-1}$ with eigenvalue $-\cos\theta\,e^{-i\,\phi_1}$ (i.e. the negative complex conjugate of the $(1,\,1)$ element of the bracketed matrix in (1));
  5. the other $N-2$ eigenvalues of $V_{N-1}$ (aside from the one in point 4) all have unit magnitude, as checked numerically in the sketch below.
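The "if" direction of this construction is easy to check numerically: assemble a matrix with the structure (1) from the listed ingredients and confirm it is unitary. A numpy sketch of my own (the parameter values are arbitrary, and $\hat{v}$ is simply the first column of a random orthonormal basis):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                       # n = N - 1
theta, phi0, phi1, phi2 = 0.7, 0.3, -1.1, 2.0

# A random unitary supplies an orthonormal eigenbasis; v_hat is its first column.
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(Z)
v_hat = U[:, 0]

# Build the normal V_{N-1} of points 3-5: eigenvalue -cos(theta) e^{-i phi1}
# on v_hat, unit-modulus eigenvalues on the rest of the basis.
eigs = np.exp(1j * rng.uniform(0, 2 * np.pi, size=n))
eigs[0] = -np.cos(theta) * np.exp(-1j * phi1)
V = U @ np.diag(eigs) @ U.conj().T

# Assemble Gamma_N as in Eq. (1) and confirm it is unitary.
Gamma = np.empty((n + 1, n + 1), dtype=complex)
Gamma[0, 0] = np.cos(theta) * np.exp(1j * phi1)
Gamma[0, 1:] = np.exp(1j * phi2) * np.sin(theta) * v_hat.conj()
Gamma[1:, 0] = np.exp(-1j * phi2) * np.sin(theta) * v_hat
Gamma[1:, 1:] = V
Gamma *= np.exp(1j * phi0)
print(np.allclose(V @ V.conj().T, V.conj().T @ V))            # normal: True
print(np.allclose(Gamma @ Gamma.conj().T, np.eye(n + 1)))     # unitary: True
```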

There is a corollary to this lemma that yields an alternative algorithm to the one in the paper: less efficient, but much more clearly explained.


Corollary: Any $N\times N$ unitary matrix $\Gamma_N\in U(N)$ can be written:

$$\Gamma_N = e^{i\,\phi_0}\,\left(\begin{array}{c | c}1 & 0 \\ 0&\Gamma_{N-1}\end{array}\right)\,\left(\begin{array}{cccccc}\cos\theta\,e^{i\,\phi_1} & e^{i\,\phi_2}\,\sin\theta&0&0&0&\cdots \\ e^{-i\,\phi_2}\,\sin\theta&-e^{-i\,\phi_1}\,\cos\theta&0&0&0&\cdots\\ 0&0&e^{i\,\phi_3}&0&0&\cdots\\0&0&0&e^{i\,\phi_4}&0&\cdots\\0&0&0&0&e^{i\,\phi_5}&\cdots\\\vdots&\vdots&\vdots&\vdots&\vdots&\ddots\end{array}\right)\times\left(\begin{array}{c | c}1 & 0 \\ 0&\Gamma_{N-1}^\dagger\end{array}\right)\tag{2}$$

where:

  1. $\theta,\,\phi_0,\,\phi_1,\,\phi_2\in\mathbb{R}$;
  2. $\Gamma_{N-1}\in U(N-1)$ is an $N-1\times N-1$ unitary matrix which diagonalises the normal operator $V_{N-1}$ in Equation (1) in the Lemma, so that:

$$V_{N-1} = \Gamma_{N-1}\,\mathrm{diag}\left(-\cos\theta\,e^{-i\,\phi_1},\,e^{i\,\phi_3},\,e^{i\,\phi_4},\,e^{i\,\phi_5},\,\cdots\right)\,\Gamma_{N-1}^\dagger\tag{3}$$


I shall give the proof of this lemma below. But for now, let's look at what the corollary says, as a signal flow graph. I have drawn it below: the general $N\times N$ unitary matrix is made up of a general $2\times 2$ unitary matrix $T_{1,\,2}(\theta,\,\phi_0,\,\phi_1,\,\phi_2)$ and two unitary $N-1\times N-1$ matrices (the one on the right being the inverse / Hermitian conjugate of the one on the left).

[Figure: the unitary matrix decomposition of Equation (2), drawn as a signal-flow graph]

It should now be clear that there is an algorithm to build any $N\times N$ unitary matrix from $2\times 2$ unitary matrices and phase delays alone.

Algorithm: One iteration of this algorithm is now:

  1. Diagonalise the lower right $N-1\times N-1$ partition $V_{N-1}$ of the $N\times N$ matrix $\Gamma_N$ in equation (1); since, by the lemma, $V_{N-1}$ is normal, this yields the two unitary $N-1\times N-1$ matrices $\Gamma_{N-1},\,\Gamma_{N-1}^\dagger$;
  2. Having found $\Gamma_{N-1}$, we find the central matrix $T(\theta,\,\phi_0,\,\phi_1,\,\phi_2)$ and the phase delays $\phi_3,\,\phi_4,\,\phi_5,\,\cdots$ by the product:

$$\begin{array}{l}e^{i\,\phi_0}\,\left(\begin{array}{cccccc}\cos\theta\,e^{i\,\phi_1} & e^{i\,\phi_2}\,\sin\theta&0&0&0&\cdots \\ e^{-i\,\phi_2}\,\sin\theta&-e^{-i\,\phi_1}\,\cos\theta&0&0&0&\cdots\\ 0&0&e^{i\,\phi_3}&0&0&\cdots\\0&0&0&e^{i\,\phi_4}&0&\cdots\\0&0&0&0&e^{i\,\phi_5}&\cdots\\\vdots&\vdots&\vdots&\vdots&\vdots&\ddots\end{array}\right)=\\\\\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\left(\begin{array}{c | c}1 & 0 \\ 0&\Gamma_{N-1}^\dagger\end{array}\right)\,\Gamma_N\,\left(\begin{array}{c | c}1 & 0 \\ 0&\Gamma_{N-1}\end{array}\right)\end{array}\tag{4}$$

We are now left with two $N-1\times N-1$ unitary matrices ($\Gamma_{N-1}$ and its Hermitian conjugate), which we realise with the next iteration of the algorithm.
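Here is a numpy sketch of one iteration, run on a $\Gamma_N$ assembled in the lemma's form (1) so that the hypotheses hold (the parameter values are arbitrary, and I set $\phi_0=0$ for brevity):

```python
import numpy as np

rng = np.random.default_rng(3)
n, theta, phi1, phi2 = 4, 0.7, -1.1, 2.0

# Assemble a Gamma_N of the form (1): normal V with v_hat as eigenvector.
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(Z)                      # columns = orthonormal eigenbasis
v_hat = U[:, 0]
eigs = np.exp(1j * rng.uniform(0, 2 * np.pi, size=n))
eigs[0] = -np.cos(theta) * np.exp(-1j * phi1)
Gamma_N = np.empty((n + 1, n + 1), dtype=complex)
Gamma_N[0, 0] = np.cos(theta) * np.exp(1j * phi1)
Gamma_N[0, 1:] = np.exp(1j * phi2) * np.sin(theta) * v_hat.conj()
Gamma_N[1:, 0] = np.exp(-1j * phi2) * np.sin(theta) * v_hat
Gamma_N[1:, 1:] = U @ np.diag(eigs) @ U.conj().T

# Step 1: the diagonalising Gamma_{N-1} is just U (v_hat as first column).
# Step 2: Eq. (4) -- strip Gamma_{N-1} off both sides of Gamma_N.
P = np.eye(n + 1, dtype=complex)
P[1:, 1:] = U
central = P.conj().T @ Gamma_N @ P

# Only the leading 2x2 block and the phase delays on the diagonal survive:
print(np.round(np.abs(central), 3))
```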

It now readily follows by induction that any $N\times N$ unitary matrix can be realised as a concatenation of phase delays and $2\times 2$ unitary matrices: we assume we can realise the $2\times 2$ matrix (e.g. by Mach-Zehnder interferometers or symmetric $2\times 2$ planar waveguide couplers, as shown in the paper); this is the induction basis. The induction step is the iteration of the algorithm above, which shows that if one can realise any $N-1\times N-1$ unitary matrix in the assumed way, then we can also realise the $N\times N$ unitary matrix in this way.


The above method is not the one in the paper, but it forms a clear proof of the existence of a realisation of a unitary matrix in the form assumed in the paper. However, the above algorithm is far less efficient than the one in the paper, because the number of iterations grows exponentially with $N$ (the number of $2\times 2$ splitters can be seen to be $2^{N-1}-1$), whereas the algorithm in the paper calls for only $\frac{N\,(N-1)}{2}$ splitters. This makes the above algorithm workable and comparable to that in the paper for $N=3,\,4$, but it begins to become much more unwieldy for $N\geq 5$. For $N=5$, the above algorithm calls for 15 splitters, whereas the one in the paper calls for only 10.
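The two counts are easy to tabulate; the recursion for the corollary algorithm is $T(N) = 2\,T(N-1)+1$ with $T(2)=1$ (one central $2\times2$ block plus two realisations of an $N-1\times N-1$ unitary):

```python
def corollary_count(N):
    """Splitters used by the corollary recursion: 2**(N-1) - 1."""
    return 1 if N == 2 else 2 * corollary_count(N - 1) + 1

def reck_count(N):
    """Splitters used by the paper's elimination scheme: N (N - 1) / 2."""
    return N * (N - 1) // 2

for N in range(2, 7):
    print(N, corollary_count(N), reck_count(N))
# prints: 2 1 1 | 3 3 3 | 4 7 6 | 5 15 10 | 6 31 15
```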

So let's now look at the paper itself.

The basic idea is the same as the first pass in finding the inverse of a matrix by elementary row operations, a.k.a. Gaussian elimination. We begin with the matrix $\Gamma_N^{-1}=\Gamma_N^\dagger$: since none of the elements in $\Gamma_N^{-1}$ can have a magnitude greater than 1 (all the row/column lengths must be unity), there is always a $2\times2$ unitary matrix that can add a scaled version of any row with nonzero first element to another row, so as to eliminate that second row's first element; indeed, you have a degree of freedom insofar as any matrix of the required form scaled by a phase factor works too. For concreteness, let's assume we simply choose a matrix, say, from $SU(2)$ to do our work, but in practice you might use the degree of freedom to simplify some designs. So we begin with the element in the bottom row, and use a $2\times 2$ unitary matrix of the form $T_{N-1,\,N}$ to eliminate the bottom row's first element, i.e. that at position $(N,\,1)$. Then we use a $T_{N-2,\,N-1}$ to eliminate the element at position $(N-1,\,1)$. And so on. We work our way up the first column, fore-multiplying by matrices of the form $T_{r-1,\,r}$, until we have eliminated the whole first column aside from the element at position $(1,\,1)$.
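Here is a numpy sketch of that first-column elimination, with a Givens-style choice for each $2\times2$ unitary (one of many choices the phase freedom allows; a generic input is assumed, so there are no guards against zero pivots):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
Z = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
Gamma, _ = np.linalg.qr(Z)                # a random N x N unitary

M = Gamma.conj().T                        # start from Gamma^dagger, as in the text
for r in range(N - 1, 0, -1):             # climb the first column from the bottom
    a, b = M[r - 1, 0], M[r, 0]
    norm = np.hypot(abs(a), abs(b))
    T = np.eye(N, dtype=complex)          # T_{r-1, r}: identity bar a 2x2 block
    T[r - 1, r - 1], T[r - 1, r] = np.conj(a) / norm, np.conj(b) / norm
    T[r, r - 1], T[r, r] = -b / norm, a / norm
    M = T @ M                             # fore-multiply, zeroing M[r, 0]

# Column 1 (and hence, by unitarity, row 1) vanishes apart from entry (1, 1):
print(np.round(np.abs(M), 3))
```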

But now look at Equation (1) in the lemma. Since the whole product is unitary, and we have eliminated the first column aside from the element at $(1,\,1)$, the only matrix allowed by the general form (1) is of the form:

$$\left(\begin{array}{c | c}e^{i\,\xi} & 0 \\ 0&V_{N-1}\end{array}\right)\tag{5}$$

where $V_{N-1}$ is unitary (i.e. $\cos\theta = 1;\,\sin\theta = 0$). So now we have reduced the problem to finding a general $N-1\times N-1$ unitary matrix, and so we can call the algorithm again to yield further reductions until we're left with a $2\times 2$ unitary matrix.
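Running the same elimination over every column gives the full factorisation with $\frac{N\,(N-1)}{2}$ blocks. A self-contained numpy sketch (again with my Givens-style choice of the free $2\times2$ unitaries, not the paper's specific beamsplitter parametrisation):

```python
import numpy as np

def givens_T(N, p, q, a, b):
    """An N x N identity with a 2x2 unitary at rows/columns p, q (0-based)
    chosen to send the pair (a, b) to (sqrt(|a|^2 + |b|^2), 0)."""
    norm = np.hypot(abs(a), abs(b))
    T = np.eye(N, dtype=complex)
    T[p, p], T[p, q] = np.conj(a) / norm, np.conj(b) / norm
    T[q, p], T[q, q] = -b / norm, a / norm
    return T

def reck_factor(Gamma):
    """Reduce Gamma^dagger to a diagonal of phases using at most N(N-1)/2
    matrices T_{r-1, r}; returns (Ts, phases) such that
    (T_k ... T_1) Gamma^dagger = diag(phases)."""
    M = Gamma.conj().T.copy()
    N = M.shape[0]
    Ts = []
    for c in range(N - 1):                     # one column at a time
        for r in range(N - 1, c, -1):          # climb column c from the bottom
            if abs(M[r, c]) < 1e-12:
                continue                       # already zero; nothing to do
            T = givens_T(N, r - 1, r, M[r - 1, c], M[r, c])
            M = T @ M
            Ts.append(T)
    return Ts, np.diag(M).copy()

rng = np.random.default_rng(5)
N = 4
Z = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
Gamma, _ = np.linalg.qr(Z)

Ts, phases = reck_factor(Gamma)
print(len(Ts))                                 # generically N(N-1)/2 = 6 here
print(np.allclose(np.abs(phases), 1))          # residual diagonal: pure phases

# Reconstruction: Gamma = diag(phases)^dagger T_k ... T_1
P = np.eye(N, dtype=complex)
for T in Ts:
    P = T @ P
print(np.allclose(Gamma, np.diag(phases.conj()) @ P))   # True
```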


Proof of Lemma and Corollary

To prove the lemma, we take the $N\times N$ unitary matrix to be partitioned as $\Gamma_N = \left(\begin{array}{c | c}\gamma_{1\,1} & \gamma_{1\,2} \\\hline\gamma_{2\,1}&V_{N-1}\end{array}\right)$, where $\gamma_{1\,1}\in\mathbb{C}$, $V_{N-1}$ is an $(N-1)\times(N-1)$ matrix, $\gamma_{1\,2}$ a $1\times(N-1)$ row vector and $\gamma_{2\,1}$ an $(N-1)\times1$ column vector. Then, on writing out the two matrix equations $\Gamma_N\,\Gamma_N^\dagger=\Gamma_N^\dagger\,\Gamma_N=\mathrm{id}_N$, we get the following equations:

$$ \begin{array}{lclcl} V_{N-1}\,V_{N-1}^\dagger + \gamma_{2\,1}\,\gamma_{2\,1}^\dagger &=& V_{N-1}^\dagger\,V_{N-1} + \gamma_{1\,2}^\dagger\,\gamma_{1\,2}&=&\mathrm{id}_{N-1}\\ |\gamma_{1\,1}|^2 + \gamma_{2\,1}^\dagger\,\gamma_{2\,1} &=& |\gamma_{1\,1}|^2 + \gamma_{1\,2}\,\gamma_{1\,2}^\dagger &=& 1\\ V_{N-1}\,\gamma_{1\,2}^\dagger + \gamma_{1\,1}^*\,\gamma_{2\,1} &=& V_{N-1}^\dagger\,\gamma_{2\,1}+\gamma_{1\,1}\,\gamma_{1\,2}^\dagger&=&0 \end{array} \tag{6}$$
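These block relations are quick to confirm numerically for a random unitary (a sanity check of my own, not part of the paper's argument):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 5
Z = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
Gamma, _ = np.linalg.qr(Z)                  # a random unitary to test against

g11 = Gamma[0, 0]
g12 = Gamma[0:1, 1:]                        # 1 x (N-1) row
g21 = Gamma[1:, 0:1]                        # (N-1) x 1 column
V = Gamma[1:, 1:]
I = np.eye(N - 1)

print(np.allclose(V @ V.conj().T + g21 @ g21.conj().T, I))        # True
print(np.allclose(V.conj().T @ V + g12.conj().T @ g12, I))        # True
print(np.isclose(abs(g11)**2 + (g21.conj().T @ g21).item(), 1))   # True
print(np.allclose(V @ g12.conj().T + np.conj(g11) * g21, 0))      # True
```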

By taking the Hermitian conjugate of the last equation in the set (6), we get $V_{N-1}\,V_{N-1}^\dagger = V_{N-1}^\dagger \,V_{N-1}$, and thus $V_{N-1}$ is a normal operator, hence unitarily diagonalisable by some unitary matrix $\Gamma_{N-1}$.

From the first equation in the set (6), we get $\gamma_{2\,1}\,\gamma_{2\,1}^\dagger=\gamma_{1\,2}^\dagger\,\gamma_{1\,2}$, an equation between two rank-one operators. If any element of $\gamma_{2\,1}$ is nonzero, this implies that the same element of $\gamma_{1\,2}^\dagger$ is nonzero and that $\gamma_{1\,2}^\dagger = \alpha\,\gamma_{2\,1}$, for some $\alpha\in\mathbb{C}$. Since $\gamma_{2\,1}\,\gamma_{2\,1}^\dagger=\gamma_{1\,2}^\dagger\,\gamma_{1\,2}$, this implies $|\alpha|=1$, so we find $\gamma_{1\,2}^\dagger = e^{2\,i\,\phi_1}\,\gamma_{2\,1}$ for some $\phi_1\in\mathbb{R}$. From the second equation in the set (6) we then have $|\gamma_{1\,1}|^2 + \left|\gamma_{2\,1}\right|^2 = 1$, and this implies the three conditions $\gamma_{1\,1}=\cos\theta\,e^{i\,\phi_0}$, $\gamma_{2\,1} = \sin\theta\,e^{-i\,\phi_1}\,\hat{v}$ and $\gamma_{1\,2} = \sin\theta\,e^{+i\,\phi_1}\,\hat{v}^\dagger$, for some unit length vector $\hat{v}\in\mathbb{C}^{N-1}$ and $\theta,\,\phi_0,\,\phi_1\in\mathbb{R}$, thus justifying three of the partitions in (1).

Now, since $\gamma_{1\,2}^\dagger = e^{2\,i\,\phi_1}\,\gamma_{2\,1}$, the last equation of the set (6) reduces to $V_{N-1}\,\sin\theta\,e^{-i\,\phi_1}\,\hat{v}=-\gamma_{1\,1}^*\,\sin\theta\,e^{-i\,\phi_1}\,\hat{v}$. That is, unless $\sin\theta=0$ (in which case $\Gamma_N$ is automatically of the form stated in (1)), we have $V_{N-1}\,\hat{v}=-\gamma_{1\,1}^*\,\hat{v}$, so that $\hat{v}$ is an eigenvector of $V_{N-1}$ with eigenvalue $-\gamma_{1\,1}^*$. This proves all the assertions in the lemma.

Now, to prove the corollary, we note that $V_{N-1}$ is a normal operator with $\hat{v}$ as an eigenvector of eigenvalue $-\gamma_{1\,1}^*$. Let $\Gamma_{N-1}$ be the unitary operator that diagonalises the normal $V_{N-1}$, where we choose the order of the mutually orthonormal eigenvectors of $V_{N-1}$ (the columns of $\Gamma_{N-1}$) so that $\hat{v}$ is the first. Then $\Gamma_{N-1}^\dagger\,\hat{v}=\left(\begin{array}{cccc}1&0&0&\cdots\end{array}\right)^T$ and so, from the first equation in (6), we find:

$$ \begin{array}{cl} &\Gamma_{N-1}^\dagger\,V_{N-1}\,\Gamma_{N-1}\,\Gamma_{N-1}^\dagger\,V_{N-1}^\dagger\,\Gamma_{N-1}+(\sin\theta)^2\,\mathrm{diag}(1,\,0,\,0,\,\cdots)\\ = &\mathrm{diag}(|\lambda_1|^2,\,|\lambda_2|^2,\,|\lambda_3|^2,\,\cdots)+(\sin\theta)^2\,\mathrm{diag}(1,\,0,\,0,\,\cdots) =\mathrm{id}_{N-1} \end{array} \tag{7}$$

where the $\lambda_j$ are the eigenvalues of $V_{N-1}$. It follows from (7) that they are all of unit magnitude aside from the first, which is $\lambda_1 = -\gamma_{1\,1}^* = -\cos\theta\,e^{-i\,\phi_0}$, as has already been found.