The identity you've used,
$$\hat{a}^\dagger \longrightarrow \frac{1}{\sqrt{2}}(\hat{b}^\dagger + i \hat{c}^\dagger) \hat{a}^\dagger
\qquad \text{(wrong!)}
$$
is incorrect. Instead, what you need to do is this:
$$
\hat{a}^\dagger = \frac{1}{\sqrt{2}}(\hat{b}^\dagger + i \hat{c}^\dagger)
$$
That is, the operator $\hat{a}^\dagger$ is exactly equal to the linear combination of operators $\frac{1}{\sqrt{2}}(\hat{b}^\dagger + i \hat{c}^\dagger)$, or in other words the operation of creating a photon on channel $a$ is exactly identical to the superposition of creating photons on the $b$ and $c$ channels.
If you want to keep going and add the second beam splitter, you do the same but with the relevant combination of modes:
$$
\hat{b}^\dagger = \frac{1}{\sqrt{2}}(\hat{d}^\dagger + i \hat{e}^\dagger).
$$
Similarly, you should have a relationship between $\hat{c}^\dagger$, $\hat{d}^\dagger$ and $\hat{e}^\dagger$, where you should ensure that the matrix that connects the two sets is unitary.
More importantly, you need a second such relationship coming from the beam splitter, which relates the $\hat{b}^\dagger$ and $\hat{c}^\dagger$ operators with the empty port of the first beam splitter:

The first time you see this, it feels extremely weird that this empty port is so important, but it is a crucial ingredient. In short, the port looks empty to you, but it still contains vacuum fluctuations, and those vacuum fluctuations can have a strong effect on the experiments downstream from the beamsplitter.
Once you put all of this together, you will be able to construct explicit relationships between the input modes ($\hat{a}^\dagger$ and $\hat{f}^\dagger$) and the output modes ($\hat{d}^\dagger$ and $\hat{e}^\dagger$) of the interferometer, so you can
- change input states like $\hat{a}^\dagger|0\rangle$ into output states, by simply substituting $\hat{a}^\dagger$ for the relevant combination of $\hat{d}^\dagger$ and $\hat{e}^\dagger$, or
- Express your observable (which is likely to be some combination of the number operators $\hat{d}^\dagger\hat{d}$ and $\hat{e}^\dagger\hat{e}$, since you're detecting intensity at the screens) in terms of your input operators $\hat{a}^\dagger$ and $\hat{f}^\dagger$, which can then be used to take expectation values for different input states, and so on.
Hopefully that clarifies how the formalism is meant to be used.