Linearity follows from:
Translation invariance (where we mean translation in space and time): the image of a vector $\vec{AB}$ joining events $A$ and $B$ under a Lorentz transformation $\mathscr{L}:\mathbb{R}^{1+3}\to\mathbb{R}^{1+3}$ is unaffected by the addition of any offset added to both ends $A$ and $B$;
Continuity: The Lorentz transformation $\mathscr{L}:\mathbb{R}^{1+3}\to\mathbb{R}^{1+3}$ is a continuous map.
To see how this plays out, our translation invariance axiom is encoded:
$$\mathscr{L}(X+Y)-\mathscr{L}(Y) = \mathscr{L}(X)-\mathscr{L}(0);\quad\forall\,X,\,Y\in\mathbb{R}^{1+3}\tag{1}$$
If we define $h:\mathbb{R}^{1+3}\to\mathbb{R}^{1+3}$ by $h(Z)=\mathscr{L}(Z)-\mathscr{L}(0)$ then it follows from (1) alone that:
$$h(X+Y)=h(X)+h(Y);\quad\forall\,X,\,Y\in\mathbb{R}^{1+3}\tag{2}$$
This is the famous Cauchy functional equation generalized to $3+1$ dimensions. For one, real dimension, the only continuous solution is $h(X)\propto X$; there are other solutions, but they are everywhere discontinuous, as shown in Section 1.5 of the Hewitt and Stromberg reference I give at the end. It is easy to broaden the Hewitt-Stromberg argument to any number of dimensions, so that, given our continuity postulate, we must have:
$$\mathscr{L}(X) = \Lambda\,X + \Delta\tag{3}$$
where $\Lambda$ is a linear operator - a $4\times4$ matrix and $\Delta$ a spacetime offset. Given again our translation invariance postulate, we can translate the image (3) to cancel the offset, whence we see that we can, without loss of generalness, take the Lorentz transformation to be linear and homogeneous:
$$\mathscr{L}(X) = \Lambda\,X\tag{4}$$
Reference
E. Hewitt & K. R. Stromberg, "Real and Abstract Analysis" (Graduate Texts in Mathematics), Springer-Verlag, Berlin, 1965. Chapter 1, section 5 constructs all solutions to the Cauchy equation $f:\mathbb{R}\to\mathbb{R};\,f(x+y)=f(x)+f(y)$. These are worth looking at, the discontinuous ones are truly weird and wonderful.