Linearity is the property that something distributes over addition, so in general it looks like $f(x + y) = f(x) + f(y)$ (together with $f(cx) = c\,f(x)$ for a constant $c$). In the specific case of quantum mechanics we mean that quantum observables (including the total-energy observable, which generates the Schrödinger equation) distribute over sums of wavefunctions.[Note 1]
It helps to take a particular example, so let's talk about qubits, which can be in a state $|0\rangle$ or a state $|1\rangle$ or any superposition of them.
Suppose I want an observable that tells me whether the qubit is in a superposition. For instance, I might want to know whether the system is in one of the states $$|{+}\rangle = \sqrt{\frac12} |0\rangle + \sqrt{\frac12} |1\rangle,\\
|{-}\rangle = \sqrt{\frac12} |0\rangle - \sqrt{\frac12} |1\rangle.
$$
Naively, I want my observable to be 1 in the state $|+\rangle$, 1 in the state $|-\rangle$, and 0 in the state $|0\rangle$, and 0 in the state $|1\rangle$. So I might try something like, say,
$$
\hat A = |{+}\rangle\langle{+}| ~+~ |{-}\rangle\langle{-}|
$$
This observable $\hat A$ does have the required property that the observable is 1 in the state $|+\rangle$ and also 1 in the state $|-\rangle$. However, this is because the observable is 1 in every possible state including $|0\rangle$ and $|1\rangle.$ It is the identity operator. In fact if you expand it in terms of those definitions you'll find,
$$
\hat A = \frac12 \Big( |0\rangle\langle 0| + |0\rangle\langle 1| + |1\rangle\langle 0| + |1\rangle\langle 1| \Big) + \frac12 \Big( |0\rangle\langle 0| - |0\rangle\langle 1| - |1\rangle\langle 0| + |1\rangle\langle 1| \Big)\\
\hat A = |0\rangle\langle 0| + |1\rangle\langle 1|.
$$
So if you tried the next most obvious thing, subtracting off the 1's in the $|0\rangle$ and $|1\rangle$ cases, you would end up with the zero operator $\hat A' = 0,$ which is zero in every possible state including $|{+}\rangle$ and $|{-}\rangle.$
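If you want to check this numerically, here is a minimal sketch in plain Python. The helper names (`outer`, `add`, `is_close`) are ones I made up for illustration, and the amplitudes are taken to be real so that no complex conjugation is needed.

```python
import math

def outer(u, v):
    """Return the matrix |u><v| as nested lists (real amplitudes assumed)."""
    return [[a * b for b in v] for a in u]

def add(m, n, sign=1.0):
    """Return m + sign * n, entry by entry."""
    return [[x + sign * y for x, y in zip(row_m, row_n)]
            for row_m, row_n in zip(m, n)]

def is_close(m, n, tol=1e-12):
    """True if two matrices agree entry by entry within tol."""
    return all(abs(x - y) <= tol
               for row_m, row_n in zip(m, n)
               for x, y in zip(row_m, row_n))

s = math.sqrt(0.5)
ket0, ket1 = [1.0, 0.0], [0.0, 1.0]
plus, minus = [s, s], [s, -s]

# The naive "is it a superposition?" observable.
A = add(outer(plus, plus), outer(minus, minus))

# The next attempt: subtract off the |0> and |1> projectors.
A_prime = add(add(A, outer(ket0, ket0), -1.0), outer(ket1, ket1), -1.0)

identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]

print("A is the identity:", is_close(A, identity))
print("A' is the zero operator:", is_close(A_prime, zero))
```

Both comparisons use a small tolerance because $\sqrt{1/2}$ is not exactly representable in floating point.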
What you are running into is that these outcomes are not linearly independent: $|{+}\rangle$ and $|{-}\rangle$ are themselves linear combinations of $|0\rangle$ and $|1\rangle$. In fact, once you specify what the observable does on all of your basis states, in this case that it maps $|0\rangle$ to zero (because that's not a superposition) and $|1\rangle$ to zero (because that's also not a superposition), that action on the basis states suffices to specify the action of this linear observable on all superpositions of the basis states: the action is zero for all superpositions.[Note 2]
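To spell out that last step, here is a short worked version. The operator name $\hat B$ and the amplitudes $a, b$ are labels I am introducing just for illustration. If $\hat B$ is linear with $\hat B|0\rangle = 0$ and $\hat B|1\rangle = 0$, then for any superposition,
$$
\hat B\big(a|0\rangle + b|1\rangle\big) = a\,\hat B|0\rangle + b\,\hat B|1\rangle = 0.
$$
Taking $a = \sqrt{\frac12}$ and $b = \pm\sqrt{\frac12}$ gives $\hat B|{\pm}\rangle = 0$, so no linear observable can vanish on both basis states and still be 1 on $|{+}\rangle$ or $|{-}\rangle.$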
Notes
[Note 1] You might be tempted to argue that this is a sufficient condition rather than a necessary one (or vice versa, depending on how you look at it): “why does it have to be all observables, and not just the total-energy function, that works this way in quantum mechanics?” This is mistaken for the following basic reason: if you give me a physical way to observe some value, I can generally hook it up to some other system so that it affects the energy of that other system. So for example, if you want position to be a nonlinear observable, I can take two things with an attractive or repulsive potential energy between them, separated by that position, and I get a nonlinear total-energy function. The value of your observable on the qubit might become, say, the strength of some magnetic field affecting some spin-1/2 system, which will have higher energy if it opposes that magnetic field than if it aligns with it. So in QM all observables can potentially make it into a total-energy observable, and this is why the linearity of the Schrödinger equation implies the linearity of all observables.
[Note 2] This argument gets slightly more subtle if we start allowing mixed states, where the average is given by $\langle A\rangle = \operatorname{Tr}(\rho \hat A)$, which is linear in $\rho$, rather than by $\langle \psi| \hat A |\psi\rangle$, which is nonlinear in $|\psi\rangle$. Probably Scott is thinking in particular about some arguments related to this case, something like: “for me to observe something, I have to have an experiment that has entangled with it, that has put us into the entangled state $|\psi\rangle = \sqrt{\frac12} |00\rangle + \sqrt{\frac12} |11\rangle$, but from that state I trace out a diagonal $\rho$ for my own case, and this cannot see the off-diagonal terms of $\hat A$...” But mixed states aren't “true” superpositions, so while I know that the argument can get this subtle, I don't think it needs to be for a student's basic understanding.
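For what it's worth, the "diagonal $\rho$ cannot see the off-diagonal terms" step can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: I trace out the second qubit (the state is symmetric, so the choice does not matter), the amplitudes are real, and the two test matrices `A_diag` and `A_offdiag` are hypothetical observables chosen to differ only off the diagonal.

```python
import math

s = math.sqrt(0.5)
# psi[i][j] is the amplitude of |i j>; this encodes sqrt(1/2) (|00> + |11>).
psi = [[s, 0.0], [0.0, s]]

def reduced_density(psi):
    """Trace out the second qubit: rho[i][k] = sum_j psi[i][j] * psi[k][j]."""
    return [[sum(psi[i][j] * psi[k][j] for j in range(2)) for k in range(2)]
            for i in range(2)]

def expectation(rho, A):
    """Return Tr(rho A) for 2x2 real matrices."""
    return sum(rho[i][k] * A[k][i] for i in range(2) for k in range(2))

rho = reduced_density(psi)

# Hypothetical observables that differ only in their off-diagonal entries.
A_diag = [[1.0, 0.0], [0.0, -1.0]]
A_offdiag = [[1.0, 0.7], [0.7, -1.0]]

print("reduced density matrix:", rho)
print("Tr(rho A), diagonal A:    ", expectation(rho, A_diag))
print("Tr(rho A), off-diagonal A:", expectation(rho, A_offdiag))
```

The reduced $\rho$ comes out proportional to the identity, so the two averages agree no matter what is put in the off-diagonal slots.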