I only managed to sort these things out in my head after reading Halmos and Greub.
Let's talk about abstract vectors first. Let's define the contra-variant vectors as the 'usual' vectors:
$$
\mathbf{V}=V^i\,\mathbf{e}_i\in \mathcal{V}
$$
where $\mathbf{V}$ is a vector, $V^i$ are the components, and $\{\mathbf{e}_i\}_{i=1\dots N}$ is the basis for this N-dimensional vector space $\mathcal{V}$ (we are only dealing with finite-dimensional vector spaces here).
Once you have this structure, you will find that it rarely applies to the real world directly. The reason is that we can normally measure scalars, not vectors. Measuring several scalars can yield a vector, but that is not a single-step process.
So what we need are ways of reducing vectors to scalars (real-valued for now). These we call functionals:
$$
\mathbf{u}:\mathcal{V}\to\mathbb{R}
$$
A special class amongst these functionals is that of the linear homogeneous functionals. Let's call the space of such functionals $\mathcal{V}^*$. So if $\mathbf{u}\in \mathcal{V}^*$, $\mathbf{v},\,\mathbf{w}\in \mathcal{V}$, and $\alpha,\beta\in \mathbb{R}$, then:
$$
\begin{align}
\mathbf{u}:\mathcal{V}\to\mathbb{R}\\
\mathbf{u}\left(\alpha\mathbf{v}+\beta\mathbf{w}\right)=\alpha\cdot\mathbf{u}\left(\mathbf{v}\right)+\beta\cdot\mathbf{u}\left(\mathbf{w}\right) \\
\mathbf{u}\left(\mathbf{0}\right)=0
\end{align}
$$
An example of such a functional would be $\mathbf{u}$ that simply returns the 'x-component' of any vector given to it.
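As a quick sanity check, here is a numpy sketch of that 'x-component' functional (the vector's numbers are made up for illustration); the functional is represented by its components $u_i$, and evaluation is the contraction $u_i v^i$:

```python
import numpy as np

# A hypothetical vector in a 3-dimensional space, components in some basis {e_i}.
v = np.array([2.0, -1.0, 5.0])

# The functional that returns the 'x-component' has components u = (1, 0, 0).
u = np.array([1.0, 0.0, 0.0])

def evaluate(functional, vector):
    """Evaluate a linear functional on a vector: u(v) = sum_i u_i v^i."""
    return float(np.dot(functional, vector))

# Linearity is inherited from the dot product:
w = np.array([0.0, 4.0, 1.0])
assert evaluate(u, 2 * v + 3 * w) == 2 * evaluate(u, v) + 3 * evaluate(u, w)
```

The linearity assertion at the end is exactly the defining property above, specialised to $\alpha=2$, $\beta=3$.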
We can then ask how to systematically investigate the possible members of $\mathcal{V}^*$. We will find that the only thing that matters is which numbers are assigned to the basis vectors of $\mathcal{V}$. Basically, we define the following set of functionals:
$$
\mathbf{q}^j\left(\mathbf{e}_i\right)=\begin{cases}
1,\quad i=j\\
0,\quad \text{otherwise}
\end{cases}=\delta^j_i
$$
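If we represent the basis vectors $\mathbf{e}_i$ as columns of a matrix in some ambient coordinates (a made-up non-orthogonal basis below, purely for illustration), the dual basis functionals $\mathbf{q}^j$ are the rows of the inverse matrix, since $(E^{-1}E)^j{}_i = \delta^j_i$ is precisely the defining condition:

```python
import numpy as np

# A hypothetical (non-orthogonal) basis of R^3: columns of E are e_1, e_2, e_3.
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# Rows of E^{-1} represent the dual basis functionals q^1, q^2, q^3,
# because (E^{-1} E)[j, i] = q^j(e_i) = delta^j_i.
Q = np.linalg.inv(E)

assert np.allclose(Q @ E, np.eye(3))  # q^j(e_i) = delta^j_i
```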
And then any $\mathbf{u}\in\mathcal{V}^*$ can be expressed as:
$$
\mathbf{u}=u_i\mathbf{q}^i
$$
So that for any $\mathbf{v}=v^i\mathbf{e}_i$ in $\mathcal{V}$:
$$
\mathbf{u}\left(\mathbf{v}\right)=u_j \mathbf{q}^j\left(v^i\mathbf{e}_i\right)=u_jv^i \mathbf{q}^j\left(\mathbf{e}_i\right)=u_iv^i
$$
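Continuing the matrix picture (again with made-up numbers): the dual basis also reads off a vector's components, $v^i=\mathbf{q}^i(\mathbf{v})$, and the pairing reduces to the contraction $u_iv^i$:

```python
import numpy as np

# A hypothetical non-orthogonal basis of R^2: columns of E are e_1, e_2.
E = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Q = np.linalg.inv(E)               # rows of Q are the dual basis q^1, q^2

w = np.array([3.0, 2.0])           # some vector, in ambient coordinates
components = Q @ w                 # v^i = q^i(w): dual basis reads off components
assert np.allclose(E @ components, w)   # reassemble w = v^i e_i

# A covector with components u_i in the dual basis acts via u(w) = u_i v^i:
u = np.array([2.0, -1.0])
assert np.isclose(u @ components, (u @ Q) @ w)
```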
Essentially $\mathcal{V}^*$ is itself a vector space, with basis $\{\mathbf{q}^j\}_{j=1\dots N}$, which is induced by $\mathcal{V}$. This we call the dual space. That's your co-variant vectors.
So that's why co-variant and contra-variant vectors are different: the former are linear functionals of the latter (and vice versa).
A non-GR example where the distinction between co- and contra-variant vectors becomes important is crystallography. One usually aligns the basis vectors with the crystallographic axes, and the co-variant vectors then live in the reciprocal space.
We can often pretend that the dual space is the same as the original vector space because the two are isomorphic (for finite-dimensional vector spaces); that's where the confusion comes from.
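To make the crystallography remark concrete (a sketch with a made-up direct lattice; note crystallographers often include a factor of $2\pi$ in the reciprocal vectors, which I omit here): the reciprocal basis is exactly the dual basis to the direct lattice vectors.

```python
import numpy as np

# Hypothetical direct lattice: columns of A are the lattice vectors a_1, a_2, a_3.
A = np.array([[3.0, 0.0, 0.5],
              [0.0, 4.0, 0.0],
              [0.0, 0.0, 5.0]])

# Reciprocal basis (convention without the 2*pi factor): rows of inv(A)
# satisfy b^i . a_j = delta^i_j, i.e. they are the dual basis of {a_j}.
B = np.linalg.inv(A)

assert np.allclose(B @ A, np.eye(3))
```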
Question 1: No. Vectors can belong to the contra-variant vector space; if something belongs to the co-variant vector space, it is a linear functional. Having said that, constructions like direct sums and tensor products can be used to build new vector spaces, $\mathcal{V}\oplus\mathcal{V}^*$ and $\mathcal{V}\otimes\mathcal{V}^*$, and there things get more complicated.
Question 3: Yes. This is in the definition of a vector space: any linear combination of vectors in the space belongs to the space.
Finally, you talk about an object $g_{ij}$. Given suitable properties, it establishes a map $\mathcal{V}\to\mathcal{V}^*$. Whilst I have never worked with vector spaces where such a map cannot be defined, I see no reason for it always to be present, and no reason for it to be unique. So treat $g_{ij}$ as an add-on; hence no contradiction. $\mathcal{V}$ and $\mathcal{V}^*$ are isomorphic but distinct for finite-dimensional vector spaces, hence you can define an isomorphism $g:\mathcal{V}\to\mathcal{V}^*$ between them.
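In matrix terms, the map $\mathcal{V}\to\mathcal{V}^*$ given by $g_{ij}$ is just index lowering, $v_i = g_{ij}v^j$, and non-degeneracy of $g$ makes it invertible (the metric values below are made up for illustration):

```python
import numpy as np

# A hypothetical symmetric positive-definite metric g_{ij} on a 3-dim space.
g = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 3.0]])

v = np.array([1.0, 2.0, -1.0])   # contravariant components v^i
v_cov = g @ v                    # covariant components v_i = g_{ij} v^j

# g is non-degenerate, so the map is invertible: an isomorphism V -> V*.
g_inv = np.linalg.inv(g)
assert np.allclose(g_inv @ v_cov, v)
```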
PS: When it comes to manifolds, one constructs a vector space at each point of the manifold out of partial derivatives, so vectors are defined as $V=v^i\partial_i$. This is the tangent space at that point. Its dual is the space of differential forms, again defined at each point of the manifold.
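To illustrate that last point numerically (a sketch with a made-up function $f$ on $\mathbb{R}^2$, using finite differences): a tangent vector $V=v^i\partial_i$ is an operator that eats a scalar function and returns a number, the directional derivative.

```python
import numpy as np

def f(x, y):
    return x**2 * y          # a sample scalar function on the 'manifold' R^2

def apply_tangent(v, p, h=1e-6):
    """Apply V = v^1 d/dx + v^2 d/dy to f at point p, via finite differences."""
    x, y = p
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return v[0] * dfdx + v[1] * dfdy

# At p = (1, 2): df/dx = 2xy = 4, df/dy = x^2 = 1, so with v = (3, 1)
# we expect V(f) = 3*4 + 1*1 = 13 (up to finite-difference error).
result = apply_tangent(np.array([3.0, 1.0]), (1.0, 2.0))
assert abs(result - 13.0) < 1e-3
```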