In many textbooks [e.g. Peskin & Schroeder p. 30 eq. (2.55), or Tong's notes p. 41 eq. (2.101)], the retarded propagator is defined as $$G_R = \Theta(x^0-y^0) \left< [\phi(x), \phi(y)] \right> = \Theta(x^0-y^0) \Big( D(x-y) - D(y-x) \Big). \tag{1}$$ In contrast, other sources (see, e.g., this answer and the references therein) define the retarded propagator as $$G_R = \Theta(x^0-y^0) \left< \phi(x) \phi(y) \right> = \Theta(x^0-y^0) D(x-y). \tag{2}$$ What motivates these two clearly different definitions and, in particular, the much more complicated definition in Eq. 1?
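For reference, assuming the free real scalar field of mass $m$ in the conventions of Peskin & Schroeder (metric signature $(+,-,-,-)$, $p^0 = E_{\mathbf p}$), the function $D$ appearing in both definitions is the vacuum two-point function $$D(x-y) \equiv \left< 0 \right| \phi(x)\phi(y) \left| 0 \right> = \int \frac{d^3p}{(2\pi)^3} \frac{1}{2E_{\mathbf p}}\, e^{-ip\cdot(x-y)}, \qquad E_{\mathbf p} = \sqrt{|\mathbf p|^2 + m^2}.$$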
The propagator given in Eq. 2 makes perfect sense. It's the probability amplitude to find the particle at $x=(t_x,\vec x)$ if it starts at $y=(t_y,\vec y)$ and is only nonzero if $t_x>t_y$. So we only take into account how a particle propagates to a different location at a later moment in time.
The propagator in Eq. 1 is stranger. It also contains the amplitude described above, but from this amplitude we subtract the amplitude that the particle was at $y=(t_y,\vec y)$ if it is now at $x=(t_x,\vec x)$. (The Heaviside function ensures $t_x>t_y$, so in $D(y-x)=\left<\phi(y)\phi(x)\right>$ the state generated by $\phi(x)$ sits at the later moment in time. Hence the second term in Eq. 1, $\propto D(y-x)$, is the amplitude that the particle was at $y$ if it is now at $x$.)
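A short identity, using only that the real scalar field $\phi$ is Hermitian (as in the textbook setting above), shows that the second term in Eq. 1 is not independent of the first: $$D(y-x) = \left< 0 \right| \phi(y)\phi(x) \left| 0 \right> = \left< 0 \right| \phi(x)\phi(y) \left| 0 \right>^{*} = D(x-y)^{*},$$ so Eq. 1 can be rewritten as $$G_R = \Theta(x^0-y^0)\Big( D(x-y) - D(x-y)^{*} \Big) = 2i\,\Theta(x^0-y^0)\,\operatorname{Im} D(x-y).$$ In words, the "backward" amplitude subtracted in Eq. 1 is the complex conjugate of the "forward" amplitude kept in Eq. 2.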
So the propagator in Eq. 2 is something we can immediately understand, whereas the propagator in Eq. 1 is quite unintuitive. Why does it make sense to consider the propagator in Eq. 1, and what is the physical difference between the two?
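As a toy numerical illustration (not taken from the sources cited above), one can compare the two definitions for a free real scalar field in 1+1 dimensions, quantized in a periodic box of length `L` with a finite number of momentum modes. The box size, mass, cutoff, and the function names `D`, `commutator`, `G_R_eq1`, `G_R_eq2` are hypothetical choices for this sketch; `D` is simply the box analogue of the mode integral for $D(x-y)$, with $\int dk/(2\pi) \to (1/L)\sum_k$.

```python
import numpy as np

# Toy model (assumption): free real scalar of mass m in 1+1 dimensions,
# in a periodic box of length L, keeping modes k_n = 2*pi*n/L for |n| <= N.
L = 20.0  # box length, arbitrary units (hypothetical choice)
m = 1.0   # field mass (hypothetical choice)
N = 200   # momentum cutoff (hypothetical choice)

k = 2 * np.pi * np.arange(-N, N + 1) / L
E = np.sqrt(k**2 + m**2)


def D(t, x):
    """Box analogue of D(x - y) = <0|phi(x) phi(y)|0>, where (t, x) is the
    separation x - y: (1/L) * sum_k exp(-i(E_k t - k x)) / (2 E_k)."""
    return np.sum(np.exp(-1j * (E * t - k * x)) / (2 * E)) / L


def commutator(t, x):
    """<0|[phi(x), phi(y)]|0> = D(x - y) - D(y - x)."""
    return D(t, x) - D(-t, -x)


def G_R_eq1(t, x):
    """Eq. 1: Theta(t) times the commutator (Theta(0) taken as 0 here)."""
    return commutator(t, x) if t > 0 else 0.0


def G_R_eq2(t, x):
    """Eq. 2: Theta(t) times D(x - y) (Theta(0) taken as 0 here)."""
    return D(t, x) if t > 0 else 0.0


# D(y - x) is the complex conjugate of D(x - y) ...
print(np.isclose(D(-0.7, -0.4), np.conj(D(0.7, 0.4))))  # True
# ... so the commutator is purely imaginary, 2i Im D(x - y).
print(abs(commutator(0.7, 0.4).real) < 1e-12)  # True
# At equal times the k <-> -k terms cancel in the commutator, but not in D.
print(abs(commutator(0.0, 1.0)), abs(D(0.0, 1.0)))
```

In this sketch the equal-time cancellation of the commutator is exact up to rounding, because the contributions of $k$ and $-k$ to $\operatorname{Im} D$ are equal and opposite, whereas $D(0, x)$ itself stays clearly nonzero; with a finite cutoff, the commutator at unequal-time spacelike separations is only approximately zero.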