The search for a quantum mechanical theory could be done in a mathematical systematic fashion, and it starts from observables. So the answer to the OP last question is that the process is inverted: the relevant observables are given first (and as we will discuss below, they are justified by observations); then you find the space of states where these observables are measured as a consequence. This leads to the investigation of very general mathematical structures, i.e. noncommutative probability and especially the theory of operator algebras. These structures have a huge importance in pure mathematics, and their study led to powerful results that, when applied to quantum physics (i.e. when using quantum physical systems as a model for these structures) led to a much deeper understanding of how quantum mechanical systems behave, and to the justification/prediction of many important experimental facts. It is, in my opinion, quite unfair to diminish such contributions to knowledge (the ultimate goal of physics, rather than just the mere observation) as something that in the end just works and provide numbers that agree with experiments.$^{\dagger}$
In the following I will describe how such a search for a quantum theory works (without too much details).
It starts with observation. You have a physical quantum system, on which you are able to perform measurements that could describe with sufficient accuracy some physical quantities. Since the system is quantum mechanical, you will experience indeterminacy problems with you measurements, and in general you would be able only to determine the average value that an observable takes in a given state of the system.
Then you identify a set of observables for the system. This set $A$ should be as complete as possible, in order to cover all the effective or theoretically allowed measurements. For example,
if the system consists of a particle, you may think of measuring its (average) position, and in many cases experimentally perform such a measure. Operationally, you may think of a device that alerts you if the particle is in a given region $\Omega\subseteq \mathbb{R}^3$ of space. We may call this observable $\mathbb{1}_{\Omega}(x)$, and therefore $A\supset \{\mathbb{1}_{\Omega}(x),\Omega\subseteq \mathbb{R}^3\}$. The same thing may be done with a device that alerts you if the particle has a given momentum, again contained in $\Omega\subseteq \mathbb{R}^3$; and therefore $A\supset \{\mathbb{1}_{\Omega}(x), \mathbb{1}_{\Omega}(p),\Omega\subseteq \mathbb{R}^3\}$. So it is plausible that your set of observables has an infinite number of objects (e.g. to take into account that, in principle, you could measure if the particle is in any region of space); it is also plausible that your observables may be rescaled, summed, and multiplied (the last operation may also be non-commutative, as observations suggest). Finally, it is useful to extend the set of observables to "complex" ones (that is a mathematical convenience), and therefore allow multiplication by complex numbers and to have a "complex conjugation" of observables. Everyone of these operations has to satisfy suitable straightforward properties.
This set of observables with the properties above form a mathematical structure. This structure is called a $*$-algebra. The last notion that is useful to introduce, is that of the "magnitude" of an observable. This would be the supremum value that an observable can reach when measured on states. Mathematically, the magnitude has the form of a norm $\lVert\cdot\rVert: A\to \mathbb{R}_+$, with given properties. Now completing the $*$-algebra $A$ with respect to the norm $\lVert\cdot\rVert$ we obtain a Banach $*$-algebra $\mathfrak{A}$. This is the object we choose to be the set of observables of the system, provided it satisfies some additional assumptions on the norm (I do not want to give too much details); and it is called a $C^*$-algebra. At this stage, we have given mostly natural assumptions, that agree with experimental evidence (e.g. manipulation of observables, the concept of magnitude of an observable, the properties of magnitude itself...).
Given the set of observables of the system, i.e. a $C^*$-algebra $\mathfrak{A}$, we identify the space of states. The states should be described mathematically as objects that characterize the evaluation of observables, and therefore the behavior of the system. In other words, given any observable $a\in\mathfrak{A}$, we need a state $\omega$ that maps $a$ to a (eventually complex, but for "real" observables it would be real) number, to be interpreted as the average value of the observable in the state. This means that $\omega$ is a functional of $\mathfrak{A}$; and since it is suitable that it behaves in a continuous manner, it would be an element of the topological dual $\mathfrak{A}'$ of $\mathfrak{A}$. In addition, the state has to characterize evaluations that are correct in a probabilistic sense, therefore its norm in the dual space has to be one (i.e., absolute certainty has probability one). Finally, it has to preserve positivity: if we have an observable that has only positive admissible values, the evaluation must always be positive. Therefore we denote the space of states as
$$S_{\mathfrak{A}}=\{\omega\in\mathfrak{A}', \lVert\omega\rVert_{\mathfrak{A}'}=1, \forall (\mathfrak{A}\ni a\geq 0),\omega(a)\geq 0\}\; .$$
First mathematical result: to the algebra of observables is always associated an algebra of bounded operators in a Hilbert space. This result, called GNS construction, says that any given $C^*$-algebra $\mathfrak{A}$ is isomorphic to an algebra of bounded operators in a Hilbert space $\mathscr{H}$. In mathematical terms, we always have an isomorphic isometric map $\pi:\mathfrak{A}\to \mathscr{L}(\mathscr{H})$, such that (suitable) states are associated to "kets", i.e. to vectors $\Omega$ of the Hilbert space; and the corresponding evaluation takes the form $\omega(a)=\langle\Omega, \pi(a)\Omega\rangle$.
A second mathematical result: any irreducible representation of the algebra of the canonical commutation relations of QM$^\ddagger$ (in their exponentiated form) is unitarily equivalent to the one where the space is $L^2(\mathbb{R}^3)$, $x$ is the position operator, and $-i\hslash\nabla$ is the momentum operator. This is the so-called Stone-von Neumann theorem, and provides an answer to the OP point 2.
$\dagger$: This long preamble is partly a response to anna v and dmckee.
$\ddagger$: The algebra of the CCR of QM is based on a finite dimensional symplectic space (the simplest example being the classical phase space of a free particle, roughly speaking $\mathbb{R}^3\times\mathbb{R}^3$); if the symplectic space is infinite dimensional, the result does not hold anymore.