"As far as I understood SR until now, this is actually the non-relativistic point of view. Bearing relativity in mind, the moving observer M' should experience both light beams from A and from B approaching with light speed as well, the beam from B being blue-shifted and the one from A red-shifted. In respect to M' they have different energies, but they should arrive at the same time."
In the paragraph you quote, he is analyzing things from the point of view of the rest frame of observer M, in which the flashes were simultaneous. (This is just a physical assumption about the scenario being considered. You could of course imagine a different physical scenario in which the two flashes weren't simultaneous from the perspective of an observer on the ground, but in this particular scenario they were.) Based on this, he deduces that the signals will reach M' at different moments according to the clock of M'. Note that predicting when light will strike a physical clock, according to that clock's own reading, does not require you to analyze things from the rest frame of the clock; you can use the time dilation equation to predict this in a frame where the clock is in motion. A key point to understand here is that all frames must agree in their predictions about local events, like what times two clocks read at the moment they pass right next to one another, or what time a given clock reads when the light signal from a distant event reaches it. If this weren't the case, different reference frames would be more like parallel universes in which totally different events happen. For example, say the clock of M' was equipped with light sensors connected to a bomb, and the bomb was programmed to explode if the sensors received strong light from both directions within some very short time interval according to the clock. If different frames didn't agree about local events, they could in this case disagree about whether the bomb exploded, and about whether the observer standing next to it was alive or dead! Different reference frames are just different ways of assigning position and time coordinates to the same set of events, nothing more.
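To make the time dilation remark concrete, here is a minimal numerical sketch that works entirely in the rest frame of M. The specific numbers are my own assumptions for illustration, not part of the original scenario: units where c = 1, a train speed of v = 0.6 toward B, flashes at a distance of 1 on either side of M, and M' passing M (with the clock of M' reading zero) at the moment the flashes occur in the frame of M. The function name `reception_readings` is likewise just a label I made up.

```python
import math

C = 1.0  # speed of light, in units where c = 1 (assumed for convenience)


def reception_readings(v, half_length):
    """Readings on the clock of M' when each flash reaches it.

    Everything is computed in the rest frame of M, where the flashes at
    A (x = -half_length) and B (x = +half_length) both occur at t = 0 and
    M' moves along x = v * t with its clock reading zero at t = 0.
    """
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    # Ground-frame times at which each light front meets M'.
    t_from_b = half_length / (C + v)  # M' moves toward the light from B
    t_from_a = half_length / (C - v)  # M' moves away from the light from A
    # Time dilation: the moving clock advances by (ground time) / gamma.
    return t_from_a / gamma, t_from_b / gamma


tau_a, tau_b = reception_readings(v=0.6, half_length=1.0)
print(tau_a, tau_b)  # roughly 2.0 and 0.5 with these assumed numbers
```

With these assumed numbers the clock of M' reads about 0.5 when the light from B arrives and about 2.0 when the light from A arrives, so the frame of M already predicts two different readings on that clock.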
So, with that in mind, we conclude that all frames must agree the light from the flashes reached M' at different moments, including the rest frame of M'. But how can this be compatible with the fact that, according to the fundamental postulates of special relativity, both flashes must also travel at the same speed c in the rest frame of M' (something that would not be true from a non-relativistic point of view, assuming they traveled at the same speed in the frame of M), and that both happened next to the ends of a train that are equidistant from M' in this frame? There is only one possible way of making this consistent: in the rest frame of M', the flashes must have happened at different times (i.e. they are assigned different time-coordinates in this frame). There is nothing problematic about the idea that photons which were emitted at different times will reach an observer at different times, even if both signals were emitted at the same distance from the observer and both traveled at the same speed relative to the observer.
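If it helps, the same conclusion can be checked numerically with the Lorentz transformation. This is only a sketch under assumptions of my own: units where c = 1, a train speed of v = 0.6 toward B, flashes at x = -1 (A) and x = +1 (B) at t = 0 in the frame of M, and M' sitting at the origin of the train frame, with the two origins coinciding at t = t' = 0. The helper name `lorentz` is hypothetical.

```python
import math


def lorentz(t, x, v, c=1.0):
    """Transform an event (t, x) from the frame of M to the frame of M'.

    The frame of M' moves at speed v along +x, and the two origins
    coincide at t = t' = 0 (standard configuration).
    """
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    t_prime = gamma * (t - v * x / c ** 2)
    x_prime = gamma * (x - v * t)
    return t_prime, x_prime


v, half_length, c = 0.6, 1.0, 1.0

# Emission events: simultaneous (t = 0) in the frame of M.
t_a, x_a = lorentz(0.0, -half_length, v)  # flash at A
t_b, x_b = lorentz(0.0, +half_length, v)  # flash at B

# In the frame of M', M' is at rest at x' = 0 and both flashes travel at c.
arrival_a = t_a + abs(x_a) / c
arrival_b = t_b + abs(x_b) / c

print("emission times in frame of M':", t_a, t_b)
print("distances from M':", abs(x_a), abs(x_b))
print("arrival times at M':", arrival_a, arrival_b)
```

With these assumed numbers, both flashes occur at the same distance (about 1.25) from M' in the train frame, but the flash at B is assigned t' of about -0.75 and the flash at A about +0.75. The arrival times at M' come out to about 0.5 and 2.0, which are the same clock readings you get by working in the frame of M and applying time dilation.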