3

In an article that I am currently reading (under the Lorentz Invariants sub-heading), it explains that, just as the distance between two points on a Cartesian plane is obviously independent of the coordinate system, the “spacetime distance” is also invariant. While in Cartesian coordinates $$(x_1-x_2)^2+(y_1-y_2)^2 = (x_1'-x_2')^2+ (y_1'-y_2')^2,$$ the spacetime analog is $$c^2(t_1-t_2)^2-(x_1-x_2)^2 = c^2(t_1'-t_2')^2- (x_1'-x_2')^2 = s^2$$ where $s^2$ is the spacetime interval.

I am having difficulty in understanding this notion of a spacetime interval and the intuition/derivation for why it can be written in this way and is invariant under Lorentz transformation.

I am aware that similar questions have been asked on this platform but none of them have fully cleared things up for me so far. Any help in providing an intuition or understanding would be appreciated.

Glorfindel
  • 1,424
P0W8J6
  • 411
  • You might appreciate this motivation of the signature due to Bondi https://physics.stackexchange.com/a/508251/148184 – robphy Aug 28 '21 at 02:06

6 Answers

6

You have two great answers, but you might find it interesting to know that it was once common for spacetime in SR to be described with an imaginary time axis. That allowed people to treat it as a straightforward Cartesian arrangement, where the length of a displacement was calculated by the usual Pythagorean method of taking the square root of the sum of the squares of the component displacements along the four orthogonal axes. The fact that the time axis was $iT$ meant that when you squared the displacement along the time axis you automatically got minus $T$ squared.

The idea of an imaginary time axis also made the Lorentz transformations look like straightforward rotations in a 4D space, so some people thought that describing SR in that way would make it easier to grasp. However, it turns out that an imaginary time axis only works straightforwardly for SR, and causes all kinds of complications in GR, so it dropped out of fashion.
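That "boost as rotation" picture is easy to check numerically. A minimal Python sketch (my own illustration, in units where $c=1$; the boost speed $\beta=0.6$ and the event are arbitrary choices): a boost with rapidity $\varphi = \operatorname{artanh}\beta$ acts on $(x, ict)$ exactly like a rotation by the imaginary angle $-i\varphi$.

```python
import cmath
import math

beta = 0.6                     # boost speed v/c (arbitrary choice)
phi = math.atanh(beta)         # rapidity of the boost
gamma = 1.0 / math.sqrt(1.0 - beta**2)

ct, x = 5.0, 2.0               # an event, in units where c = 1

# Ordinary Lorentz boost of (ct, x):
ct_boost = gamma * (ct - beta * x)
x_boost = gamma * (x - beta * ct)

# Same event with an imaginary time coordinate w = i*ct,
# rotated by the imaginary angle theta = -i*phi:
w = 1j * ct
theta = -1j * phi
x_rot = x * cmath.cos(theta) - w * cmath.sin(theta)
w_rot = x * cmath.sin(theta) + w * cmath.cos(theta)

print(abs(x_rot - x_boost) < 1e-12)        # True: the rotation reproduces the boost
print(abs(w_rot - 1j * ct_boost) < 1e-12)  # True
```

This works because $\cos(-i\varphi)=\cosh\varphi=\gamma$ and $\sin(-i\varphi)=-i\sinh\varphi=-i\gamma\beta$, so the "rotation" matrix is exactly the boost matrix in disguise.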

Marco Ocram
  • 26,161
4

Although it is the accepted answer, I find it difficult to agree with Dale's perspective. I agree that if $s^2=0$ then it is intuitive why it must be invariant under translations, rotations and Lorentz transformations, and why any other $s^2$ is invariant under purely spatial transformations. However, I consider the last part, "well, ok, then $s^2$ is probably invariant under Lorentz transformations", a bit of a leap of faith rather than an addition to intuition, because it doesn't really add any new perspective on why this is correct, and that leap of faith may as well be wrong. Intuition is subjective, so this paragraph is purely my opinion and everyone is free to disagree with me. Intuition is weird, and I'll try to add the mathematical-geometrical point of view of special relativity, hoping it will add to yours. (And sorry for the long post!)


First of all, we need quite a review of linear algebra (we'll assume all of the vector spaces are real; no complex numbers allowed for now, this is complicated enough :) ). In linear algebra, one learns how the inner product (or the "dot product", $\vec{u}\cdot \vec{v}$ for column vectors) can be the basis of all of the geometrical quantities: lengths, angles, distances, etc...

For example, the "length of a vector" is given by $$|v|^2 = \vec{v}\cdot \vec{v} $$ the distance between two points is given by $$|\vec{u}-\vec{v}|^2 = \left(\vec{v}-\vec{u}\right)\cdot \left(\vec{v}-\vec{u}\right)$$ and the angle between two vectors is given by: $$\vec{u}\cdot \vec{v} = |u|\cdot |v| \cdot \cos \alpha \Rightarrow \cos \alpha = \frac{\vec{u}\cdot \vec{v}}{|u|\cdot |v|}$$

Now, one usually thinks of the dot product in the standard basis as: $$\vec{u}\cdot \vec{v} = u_1 v_1 + u_2 v_2 +u_3 v_3 = \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix}\cdot \begin{pmatrix} v_1\\ v_2\\ v_3 \end{pmatrix} = u^t v$$ This is true when your vectors are represented in the standard basis, $\{\hat{e_1}= \hat{x} ,\hat{e_2}= \hat{y} \}$, where the basis vectors have length 1 and are perpendicular to each other. However, if you would like to work in the basis $\{\hat{e_1} = 3\hat{x} , \hat{e_2} = \hat{x}+2\hat{y}\}$, this won't hold any more:

$$ (2\hat{e_1}+\hat{e_2})\cdot (5\hat{e_1}+0\hat{e_2}) \neq 2\cdot 5 + 1\cdot 0 $$ This is because $\hat{e_1}\cdot \hat{e_1} \neq 1$ and $\hat{e_1}\cdot \hat{e_2} \neq 0 $. Fortunately, this can be fixed by inserting a matrix "$G$" which holds the information about the dot products (lengths/angles) between the basis vectors: $u\cdot v = u^t G v$. What I'm trying to say is that to "do geometry" in an arbitrary basis, you need the matrix $G$ which has that information encoded.
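Here is a small numerical illustration of that claim (a Python/NumPy sketch of my own, using the basis $\{\hat{e_1} = 3\hat{x}, \hat{e_2} = \hat{x}+2\hat{y}\}$ and the vectors from the example above):

```python
import numpy as np

# Basis vectors written in the standard basis: e1 = 3x, e2 = x + 2y (as columns).
E = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Gram matrix: G[i, j] = e_i . e_j holds all the lengths and angles.
G = E.T @ E
print(G)                      # [[9. 3.]
                              #  [3. 5.]]

# Coordinates in the new basis: u = 2*e1 + 1*e2, v = 5*e1 + 0*e2.
u = np.array([2.0, 1.0])
v = np.array([5.0, 0.0])

print(u @ v)                  # 10.0 -- the naive component formula, which is wrong here
print(u @ G @ v)              # 105.0 -- u^t G v
print((E @ u) @ (E @ v))      # 105.0 -- the true dot product, in standard coordinates
```

The naive formula $u^t v$ gives 10, but the true dot product (computed either as $u^t G v$ or directly in standard coordinates) is 105.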

The second thing we need to review is how linear transformations change the geometry. As a general rule of thumb, linear transformations transform the basis vectors by squeezing/stretching them and changing their angles. (3blue1brown has a GREAT video about linear transformations showing how lengths, angles and distances, i.e. the geometry, change after a linear transformation.) Mathematically, after a transformation $T$ the inner product between two vectors will be $$(Tu)\cdot (Tv) = (Tu)^t G (Tv) = u^t T^t G T v\neq u^t G v = u\cdot v$$ This means that the geometry of space will remain the same if and only if $T^t G T = G$. In the standard basis and geometry, where $G=I$, one can show that these are exactly rotations and mirrorings! This is why rotating space doesn't change geometric quantities!
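As a quick sanity check of the $T^t G T = G$ condition (a Python/NumPy sketch; the shear matrix is my own counterexample, not from the text above):

```python
import numpy as np

G = np.eye(2)   # standard Euclidean geometry

# A rotation satisfies T^t G T = G, so it preserves every length and angle.
theta = 0.7
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(T.T @ G @ T, G))   # True

# A shear does not, so it distorts the geometry.
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(np.allclose(S.T @ G @ S, G))   # False
```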


Now, why did I talk so much about general dot products and linear transformations? Because we physicists really love analyzing things by looking at what doesn't change, whether it's energy, momentum, charge, etc., or distances and geometry, when changing our point of view. Then come Lorentz transformations and ruin everything: they mix space and time, and make us work hard to figure out how every quantity changed. :(

Wait, maybe we're just looking at it wrong? When analyzing the free fall of an object in mechanics 101, you know that you can choose the basis however you like: $\hat{y}$ can face up or down, but the distances traveled will be the same, just with the coordinates flipped because of the transformation between points of view (there is no universal "up" direction; any rotation is valid and doesn't change the core physics).

Special relativity tries to say the same thing: there is no universal frame of reference, and some "geometrical" quantities will be the same no matter the point of view. Let's consider the possibility that, when working with events with coordinates $(ct,x,y,z)$, the geometry matrix for spacetime is not $I$ (why would it be? who said time needs to behave just like space from a geometric point of view?) but rather $$G = \begin{pmatrix} -1 & 0 & 0 &0 \\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix} $$ If $T$ is a rotation or mirroring matrix in regular space (meaning it has no time component), it "sees" only the identity part of $G$, so it will trivially satisfy the $T^t G T = G$ condition for preserving geometry (because we already accepted that this works for $G=I$). The neat thing is the added value when using Lorentz transformations: they also satisfy $T^t G T = G$ (you can calculate to check), meaning Lorentz transformations are just a form of "rotation", or "basis change that doesn't change geometry". This is awesome, because every geometrical calculation done in one frame of reference will be the same after transforming to another frame, just with the coordinates mixed up.
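"You can calculate to check" is also easy to do numerically. A Python/NumPy sketch (the boost speed $\beta=0.8$, rotation angle, and sample vector are arbitrary choices of mine), verifying that a boost along $x$ and a spatial rotation both satisfy $T^t G T = G$, and that the interval is therefore invariant:

```python
import numpy as np

# Minkowski "geometry matrix" for coordinates (ct, x, y, z)
G = np.diag([-1.0, 1.0, 1.0, 1.0])

beta = 0.8
gamma = 1.0 / np.sqrt(1.0 - beta**2)

# Lorentz boost along x
L = np.array([[ gamma,      -gamma*beta, 0.0, 0.0],
              [-gamma*beta,  gamma,      0.0, 0.0],
              [ 0.0,         0.0,        1.0, 0.0],
              [ 0.0,         0.0,        0.0, 1.0]])
print(np.allclose(L.T @ G @ L, G))   # True: boosts preserve the spacetime geometry

# An ordinary spatial rotation about z only "sees" the identity block of G.
phi = 1.2
R = np.eye(4)
R[1:3, 1:3] = [[np.cos(phi), -np.sin(phi)],
               [np.sin(phi),  np.cos(phi)]]
print(np.allclose(R.T @ G @ R, G))   # True

# Consequently the interval between two events is the same in both frames.
d = np.array([2.0, 1.0, 0.5, -0.3])           # (c*dt, dx, dy, dz)
s2 = d @ G @ d
print(np.isclose((L @ d) @ G @ (L @ d), s2))  # True
```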

If you're willing to accept that Lorentz transformations preserve this weird kind of geometry matrix, then you can see that the distance between two vectors, $$(\Delta s)^2 = |u-v|^2=(u-v)\cdot (u-v) = (u-v)^t G (u-v) = -(\Delta ct)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2, $$ also stays the same after transforming. This also explains why we call it a "spacetime interval" and treat it as a measurement of "distance": it really is one, just not in a Euclidean-geometry way.

I hope this helped, and hasn't confused you even more than you already are - relativity really IS a hard concept to grasp and extremely nonintuitive. Keep asking great questions! :)

Ofek Gillon
  • 3,956
3

The intuition is not too difficult to build. For convenience, I will write $\Delta x$ instead of $x_1-x_2$.

First, we know from the second postulate that the speed of light is invariant. If we write $$\Delta x^2 + \Delta y^2 + \Delta z^2 = c^2 \Delta t^2$$ this is the equation of a sphere of radius $c \, \Delta t$. In other words, this is a flash of light moving at $c$ in all directions. We can rewrite this easily as $$0 = -c^2 \Delta t^2+\Delta x^2 + \Delta y^2 + \Delta z^2 = s^2$$ where in this case $s$ is fixed to 0.

Now, written this way, $s^2=0$ is clearly invariant by the second postulate. It is invariant under spacetime translations and spacetime rotations (which include boosts). And with $\Delta t=0$ we know from our experience with Cartesian coordinates that $s^2=a$ is invariant under spatial translations and rotations. So it is not a big intuitive leap to think that $s^2=a$ is invariant under spacetime translations and rotations.

So, let’s take a step back and think about what we know and can guess. We know by the Pythagorean theorem that $\Delta x^2+\Delta y^2+\Delta z^2$ gives us an invariant in space. We want to figure out what function of time $f(\Delta t)$ we need to add so that $s^2=f(\Delta t) +\Delta x^2+\Delta y^2+\Delta z^2$ is an invariant in spacetime. We know by the second postulate that $f(\Delta t)=-c^2 \Delta t^2$ is the right form for the special case of $s^2= -c^2 \Delta t^2+\Delta x^2+\Delta y^2+\Delta z^2 =0$, so it is intuitive to think that is probably the right form for all $s^2$, not just zero.
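That guess is easy to test numerically. A minimal Python sketch (my own illustration, in units where $c=1$; the boost speed $\beta=0.6$ and the events are arbitrary choices), checking that $f(\Delta t)=-c^2\Delta t^2$ makes $s^2$ invariant under a boost both for a light flash ($s^2=0$) and for a generic separation ($s^2\neq 0$):

```python
import math

beta = 0.6                          # boost speed, units where c = 1
gamma = 1.0 / math.sqrt(1.0 - beta**2)

def s2(dt, dx, dy, dz):
    # Candidate invariant: f(dt) = -dt**2 plus the spatial Pythagorean part
    return -dt**2 + dx**2 + dy**2 + dz**2

def boost(dt, dx):
    # Lorentz boost along x
    return gamma * (dt - beta * dx), gamma * (dx - beta * dt)

# A flash of light: dx^2 + dy^2 + dz^2 = dt^2, so s^2 = 0.
dt, dx, dy, dz = 5.0, 3.0, 4.0, 0.0
dtp, dxp = boost(dt, dx)
print(s2(dt, dx, dy, dz))                  # 0.0
print(abs(s2(dtp, dxp, dy, dz)) < 1e-12)   # True: still zero after the boost

# A generic (timelike) separation: s^2 is nonzero but still frame-independent.
dt, dx, dy, dz = 5.0, 1.0, 2.0, 0.0
dtp, dxp = boost(dt, dx)
print(s2(dt, dx, dy, dz))                         # -20.0
print(math.isclose(s2(dtp, dxp, dy, dz), -20.0))  # True
```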

Dale
  • 99,825
  • I may be entirely wrong here but, in your closing statement, do you mean to say that S^2 is invariant in space for 0 and invariant in spacetime for all values, rather than vice-versa? – P0W8J6 Aug 27 '21 at 06:03
  • No, we know that it is invariant in space ($\Delta t=0$) for all values. Nothing changes that. We are just trying to figure out how to bring in time. I will re-word the last comment to make it more clear. – Dale Aug 27 '21 at 11:35
0

The general concept is that anything that starts at one time and place and ends at another time and place will have the same $s$ no matter who watches it, no matter how they are moving. Maybe think of a rocket igniting, flying, and blowing up. We might disagree about where and when it was ignited and about where and when it exploded, but not about $s$ between the two events. It may be important that we don't even need to agree about where it was ignited, just that it was; that alone means we both have some $x_1$ and some $t_1$. Or, if not a rocket, then an atom decaying, etc.

The reason it's true is that the Lorentz transformation is derived from requiring the metric to be invariant. In other words, if {$A \implies B$, and $A$} (where "$A$" means "$A$ is true"), then it is a challenge to answer a question that asks, "I know that $B$ is true, so why does that mean $A$ is true?"

I would say the best thing to do for now is to accept that it is true, because special relativity is just Newtonian mechanics (where $\Delta s = \sqrt{\Delta x^2+ \Delta y^2 + \Delta z^2}$ is invariant across observers) with a new definition of the metric, specifically $\Delta s = \sqrt{c^2 \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2}$.

And then if you want badly to have more intuition, I’d investigate why invariant metric implies Lorentz with a search. But I would put that off.

Al Brown
  • 3,370
0

I think something that can make it more intuitive is to think about spacetime as a Euclidean space with an imaginary coordinate. So if we think about a 4D space with coordinates $(x,y,z,w)$, then we can substitute in $w=ict$. So if

$$(x_1-x_2)^2+(y_1-y_2)^2+(z_1-z_2)^2+(w_1-w_2)^2=(x'_1-x'_2)^2+(y'_1-y'_2)^2+(z'_1-z'_2)^2+(w'_1-w'_2)^2$$

then

$$(x_1-x_2)^2+(y_1-y_2)^2+(z_1-z_2)^2+(ict_1-ict_2)^2=(x'_1-x'_2)^2+(y'_1-y'_2)^2+(z'_1-z'_2)^2+(ict'_1-ict'_2)^2$$

so

$$(x_1-x_2)^2+(y_1-y_2)^2+(z_1-z_2)^2+(ic)^2(t_1-t_2)^2=(x'_1-x'_2)^2+(y'_1-y'_2)^2+(z'_1-z'_2)^2+(ic)^2(t'_1-t'_2)^2$$

and if you remember $i^2=-1$ then you can see that

$$(x_1-x_2)^2+(y_1-y_2)^2+(z_1-z_2)^2-c^2(t_1-t_2)^2=(x'_1-x'_2)^2+(y'_1-y'_2)^2+(z'_1-z'_2)^2-c^2(t'_1-t'_2)^2$$

Multiplying both sides of the equation by negative 1 you get

$$-(x_1-x_2)^2-(y_1-y_2)^2-(z_1-z_2)^2+c^2(t_1-t_2)^2=-(x'_1-x'_2)^2-(y'_1-y'_2)^2-(z'_1-z'_2)^2+c^2(t'_1-t'_2)^2$$

then move some terms around to get

$$c^2(t_1-t_2)^2-(x_1-x_2)^2-(y_1-y_2)^2-(z_1-z_2)^2=c^2(t'_1-t'_2)^2-(x'_1-x'_2)^2-(y'_1-y'_2)^2-(z'_1-z'_2)^2$$
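The whole chain of substitutions can be verified numerically in a couple of lines. A Python sketch (my own example, in units where $c=1$, with arbitrary coordinates): the Euclidean squared distance with $w=ict$ comes out purely real and equal to the Minkowski form derived above.

```python
c = 1.0  # units where c = 1

t1, t2 = 7.0, 4.0
x1, x2 = 9.0, 4.0
y1 = y2 = z1 = z2 = 0.0

# Euclidean squared distance with the substitution w = i*c*t
w1, w2 = 1j*c*t1, 1j*c*t2
euclid = (x1-x2)**2 + (y1-y2)**2 + (z1-z2)**2 + (w1-w2)**2

# The real form with a -c^2*(t1-t2)^2 term, before the final sign flip
interval = (x1-x2)**2 + (y1-y2)**2 + (z1-z2)**2 - c**2*(t1-t2)**2

print(euclid)    # (16+0j): the imaginary part cancels, leaving a real number
print(interval)  # 16.0: the same number, with no complex arithmetic needed
```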

0

After a lot of searching, I've found an interpretation that makes sense to me, so I'll share it here (credit to sudgylacmoe on YouTube).

To start off, let's think about 2D rotations in a different way. It's very nice to think about rotations as keeping some vector magnitude fixed (so that I can characterise rotations as linear isometries). Let's suppose that all we know about 2D rotations is that they seem to transform vector components as follows: \begin{align} \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos \theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \end{align} for some parameter $\theta$. This means that, for small positive $\theta$ and vectors in the first quadrant, the $x$-component tends to decrease and the $y$-component tends to increase.

Now, it's difficult to find anything that is kept constant in these equations without just guessing. But let's say that we have a ruler along the positive $y$-axis. In this case, we can simply define the length of a vector to be whatever the ruler measures it as, after it's been rotated to be parallel with the $y$-axis. Since the $x$-component tends to decrease and the $y$-component tends to increase (in the first quadrant), we get the sense that the length should get larger both when $x$ and $y$ are larger.

But we don't have to settle for intuition, we can just work it out! The original vector is $(x, y)^T$, and the final vector is $(0, \ell)^T$, where the length $\ell$ is what we want to find. We also need to find the value of $\theta$ which aligns the vector with the $y$-axis. In other words, we want to solve the following system of equations: \begin{align} \begin{pmatrix} 0 \\ \ell \end{pmatrix} = \begin{pmatrix} \cos \theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \end{align} Expanding out and using some basic trig, it's not too hard to show that $\ell = \sqrt{x^2 + y^2}$. So we have recovered our formula for the length in Euclidean space!
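The same computation can be done numerically instead of with trig identities. A short Python sketch (the vector $(3, 4)^T$ is an arbitrary choice of mine): pick $\theta$ so that the rotated $x$-component vanishes, then read the length off the $y$-component.

```python
import math

x, y = 3.0, 4.0

# Zero the rotated x-component: x*cos(theta) - y*sin(theta) = 0  =>  tan(theta) = x/y.
theta = math.atan2(x, y)

x_rot = x * math.cos(theta) - y * math.sin(theta)
ell   = x * math.sin(theta) + y * math.cos(theta)

print(abs(x_rot) < 1e-12)                    # True: the vector now lies on the y-axis
print(math.isclose(ell, math.hypot(x, y)))   # True: the ruler reads sqrt(x^2 + y^2) = 5
```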

(This derivation is, of course, circular; any geometric construction of trig functions has a baked in inner product structure (otherwise angles wouldn't make sense), and hence a distance. But it does serve as a starting point for deriving the spacetime interval, since we are in a similar position to what we just described.)

Now, we move onto spacetime. For simplicity, I consider only $(1+1)$-spacetime (nothing is really lost), and timelike vectors.

In spacetime, we don't really know what magnitude is, but we'd like to have it. Moreover, all we know about Lorentz transformations is that they can be written as the following: \begin{align} \begin{pmatrix} t' \\ x' \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v \\ -\gamma v & \gamma \end{pmatrix} \begin{pmatrix} t \\ x \end{pmatrix}, \end{align} where $\gamma = 1/\sqrt{1 - v^2}$ is the Lorentz factor, and $v$ is some parameter. We see that, for a vector in the first quadrant, a small $v$ tends to decrease both $t$ and $x$. In English, the time taken for the event decreases, and so does its distance from the origin.

Now, how do we measure the length of a general vector $(t, x)^T$? Well, we could just do what we did before, right? Since the vector is timelike, there is a reference frame in which the vector is aligned with the $t$-axis, or equivalently, where the event it corresponds to happens at the origin. So, extending the analogy with rotations, we think about putting a ruler parallel to the $t$-axis, which is just another name for a stopwatch fixed in space. With this, we define the spacetime interval to be the length of this vector when Lorentz boosted to be along the $t$-axis; in other words, it is simply the proper time. How does it depend on the original components? Intuitively, it should increase if $t$ increases, because the proper time is proportional to the observed time. But if $x$ increases, you have to go faster to make the event happen in one place, which means time is dilated relative to you. As such, the proper time of the event is smaller; so the spacetime interval decreases with increasing $x$. That's why there's a plus in the temporal component, and a minus in the spatial components.

But once again, we don't need to settle for intuition, since we can just solve the equations for the Lorentz boost to find the interval $\Delta s$: \begin{align} \begin{pmatrix} \Delta s \\ 0 \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v \\ -\gamma v & \gamma \end{pmatrix} \begin{pmatrix} t \\ x \end{pmatrix}. \end{align} Once again, it is not too difficult to show that $\Delta s = \sqrt{t^2 - x^2}$. So if we think of Lorentz boosts as rotations, in that we can apply a Lorentz boost to align a vector with a convenient ruler, we find that the spacetime interval has the desired form!
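The boost version of the check works the same way. A Python sketch (my own example, in units where $c=1$; the timelike vector $(t,x)=(5,3)$ is arbitrary): the boost with $v = x/t$ brings the event to the spatial origin, and the stopwatch then reads $\sqrt{t^2-x^2}$.

```python
import math

t, x = 5.0, 3.0   # timelike: |t| > |x|, in units where c = 1

# The boost that puts the event at the spatial origin has v = x / t.
v = x / t
gamma = 1.0 / math.sqrt(1.0 - v**2)

t_new = gamma * (t - v * x)
x_new = gamma * (x - v * t)

print(abs(x_new) < 1e-9)                            # True: aligned with the t-axis
print(math.isclose(t_new, math.sqrt(t**2 - x**2)))  # True: the proper time is sqrt(t^2 - x^2) = 4
```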

Note: Spacelike intervals are very similar, they just involve the reference frame in which the event is simultaneous with an event at the origin, and the interval is instead the proper length. But this, I think, is already enough motivation to see where the spacetime interval comes from.

Baylee V
  • 357