Let's start at the beginning:
The setting for relativity, be it special or general, is that spacetime is a manifold $\mathcal{M}$, i.e. something that is locally homeomorphic to Cartesian space $\mathbb{R}^n$ ($n = 4$ in the case of relativity), but not necessarily globally.
Such manifolds possess a tangent space $T_p\mathcal{M}$ at every point, which is where the vectors one usually talks about live. If you choose coordinates $x^i$ on the manifold, then the space of tangent vectors is
$$T_p\mathcal{M} := \{\sum_{i=0}^3 c^i \frac{\partial}{\partial x^i} \lvert c^i \in \mathbb{R} \}$$
When we say that a tuple $(c^0,c^1,c^2,c^3)$ is a vector, we mean that it corresponds to the object $c^i\partial_i \in T_p\mathcal{M}$ at some point $p \in \mathcal{M}$.
A metric on $\mathcal{M}$ can be given by specifying a symmetric, non-degenerate bilinear form at each point
$$g_p : T_p\mathcal{M} \times T_p\mathcal{M} \rightarrow \mathbb{R}$$
What you learned "in general" is that, for chosen basis vectors $\partial_i$ of $T_p\mathcal{M}$, the components of the metric are defined by $g_{ij} = g(\partial_i,\partial_j)$. You can now indeed view the metric as a kind of scalar product, setting $X \cdot Y := g(X,Y)$ for two vectors $X,Y$. (This contains the answer to your second problem.) But for non-Riemannian manifolds, i.e. manifolds whose metric is not positive definite (not all eigenvalues of the matrix $g_{ij}$ are positive), this is not a scalar product in the sense you may be used to. In particular, $g(X,X)$ can be zero, or even negative, for a nonzero vector $X$. Nonzero vectors with $g(X,X) = 0$ are usually called lightlike or null.
The important thing to take away is that manifolds do not always behave like Cartesian space.
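To make the null case concrete, here is a small Python sketch (an illustration, not part of the original argument) that evaluates $g(X,Y) = g_{ij}X^iY^j$ for the Minkowski metric of special relativity. The signature $(-,+,+,+)$ and the helper names `metric_product` and `classify` are assumptions chosen for the example; with the opposite sign convention the labels for negative and positive $g(X,X)$ swap.

```python
# Minkowski metric components g_ij in the signature (-, +, +, +).
# This sign convention is an assumption; (+, -, -, -) is equally common.
ETA = [
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def metric_product(g, X, Y):
    """Return g(X, Y) = sum_ij g_ij X^i Y^j for component lists X and Y."""
    n = len(X)
    return sum(g[i][j] * X[i] * Y[j] for i in range(n) for j in range(n))


def classify(g, X):
    """Classify a nonzero vector by the sign of g(X, X) (for the signature above)."""
    q = metric_product(g, X, X)
    if q < 0:
        return "timelike"
    if q > 0:
        return "spacelike"
    return "null"


print(classify(ETA, [1.0, 0.0, 0.0, 0.0]))  # timelike
print(classify(ETA, [0.0, 1.0, 0.0, 0.0]))  # spacelike
print(classify(ETA, [1.0, 1.0, 0.0, 0.0]))  # null: nonzero, yet g(X, X) = 0
```

The last vector is the point of the example: it is clearly not the zero vector, but its "length squared" vanishes, which never happens for the ordinary dot product.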
Now, for your third problem, we need the concept of the cotangent space $T_p^*\mathcal{M}$. It is the dual vector space to the tangent space; for a chosen coordinate system it is spanned by the differentials $\mathrm{d}x^i : T_p\mathcal{M} \rightarrow \mathbb{R}$, which are defined by
$$\mathrm{d}x^i(\partial_j) = \delta^i_j$$
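Spelled out once (with summation over repeated indices), this definition means that a covector $\omega = \omega_i\,\mathrm{d}x^i$ evaluated on a vector $X = X^j\partial_j$ simply pairs up components:
$$\omega(X) = \omega_i X^j\,\mathrm{d}x^i(\partial_j) = \omega_i X^j\delta^i_j = \omega_i X^i$$
In particular $\mathrm{d}x^i(X) = X^i$: the differential $\mathrm{d}x^i$ just reads off the $i$-th component of a vector.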
Now, recall that the metric was a bilinear map from $T_p\mathcal{M} \times T_p\mathcal{M}$ to $\mathbb{R}$. As such, we can see it as an element of the tensor product $T_p^*\mathcal{M} \otimes T_p^*\mathcal{M}$, which is the space spanned by elements of the form $\mathrm{d}x^i \otimes \mathrm{d}x^j$. As the metric is an element of this space, it can be expanded in this basis:
$$ g = g_{ij}\mathrm{d}x^i\mathrm{d}x^j$$
where physicists just drop the bothersome $\otimes$ sign (and sum over repeated indices). Now, what does this have to do with infinitesimal distance? We simply define the length of a path $\gamma : [a,b] \rightarrow \mathcal{M}$ to be (with $\gamma'(t)$ denoting the tangent vector to the path)$[1]$
$$ L[\gamma] := \int_a^b \sqrt{\lvert g(\gamma'(t),\gamma'(t))\rvert}\mathrm{d}t$$
And, by using physicists' sloppy notation, $g(\gamma'(t),\gamma'(t)) = g_{ij} \frac{\mathrm{d}x^i}{\mathrm{d}t}\frac{\mathrm{d}x^j}{\mathrm{d}t}$, if we understand $x^i(t)$ as the $i$-th coordinate of the point $\gamma(t)$, and so:
$$ L[\gamma] = \int_a^b \sqrt{\left\lvert g_{ij} \frac{\mathrm{d}x^i}{\mathrm{d}t}\frac{\mathrm{d}x^j}{\mathrm{d}t}\right\rvert}\,\mathrm{d}t = \int_a^b \sqrt{\lvert g_{ij}\mathrm{d}x^i\mathrm{d}x^j\rvert}\,\frac{\mathrm{d}t}{\mathrm{d}t} = \int_a^b \sqrt{\lvert g_{ij}\mathrm{d}x^i\mathrm{d}x^j\rvert}$$
Since we call $\mathrm{d}s$ the infinitesimal line element that fulfills $L = \int \mathrm{d}s$, this is suggestive of the notation
$$ \mathrm{d}s^2 = g_{ij}\mathrm{d}x^i\mathrm{d}x^j$$
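As a numerical sanity check of the length definition, the sketch below approximates $L[\gamma]$ for a path given in coordinates. It assumes the flat plane in polar coordinates $(r,\theta)$, where $g_{ij} = \mathrm{diag}(1, r^2)$, i.e. $\mathrm{d}s^2 = \mathrm{d}r^2 + r^2\mathrm{d}\theta^2$; the function names, the central finite-difference derivative and the midpoint rule are illustrative choices, not anything prescribed by the text.

```python
import math


def polar_metric(r, theta):
    """Components g_ij of the Euclidean plane metric in polar coordinates (r, theta)."""
    return [[1.0, 0.0], [0.0, r * r]]


def path_length(metric, path, a, b, steps=10_000, h=1e-6):
    """Approximate L[gamma] = int_a^b sqrt(|g(gamma'(t), gamma'(t))|) dt.

    `path(t)` returns the coordinates x^i(t). The components dx^i/dt of the
    tangent vector are taken by central finite differences, and the integral
    is evaluated with the midpoint rule.
    """
    dt = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * dt
        x = path(t)
        v = [(p - m) / (2 * h) for p, m in zip(path(t + h), path(t - h))]
        g = metric(*x)
        n = len(v)
        q = sum(g[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
        total += math.sqrt(abs(q)) * dt
    return total


def circle(t):
    """Unit circle in polar coordinates: r = 1, theta = t."""
    return (1.0, t)


print(path_length(polar_metric, circle, 0.0, 2 * math.pi))  # close to 2*pi
```

The unit circle comes out with length $2\pi$ even though its coordinate velocity has only a $\theta$-component; the factor $r^2$ in the metric is what turns "change in $\theta$" into actual distance.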
If we notice that, by the definition of tangent vectors as derivatives and cotangent vectors as differentials above, quantities with upper indices transform exactly opposite to those with lower indices (see also my answer here), we see that this expression is indeed invariant under arbitrary coordinate transformations: the transformation of the two lower indices of $g_{ij}$ is exactly compensated by that of the two upper indices of $\mathrm{d}x^i\mathrm{d}x^j$.
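The invariance can be checked numerically in a familiar case. The sketch below (with arbitrary example values for the point and the displacement) takes the Euclidean plane, maps the upper-index polar components $(\mathrm{d}r, \mathrm{d}\theta)$ to Cartesian ones with the Jacobian $\partial x^k/\partial x'^i$, transforms the lower-index metric components with two factors of the same Jacobian, and confirms that $g_{ij}\mathrm{d}x^i\mathrm{d}x^j$ has the same value in both coordinate systems.

```python
import math


def polar_to_cartesian_jacobian(r, theta):
    """J[k][i] = d x_cart^k / d x_polar^i for x = r cos(theta), y = r sin(theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def ds2(g, v):
    """g_ij v^i v^j for a metric matrix g and components v."""
    n = len(v)
    return sum(g[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


r, theta = 2.0, 0.7    # example point (polar coordinates)
v_polar = [0.3, -1.1]  # example displacement components (dr, dtheta)
J = polar_to_cartesian_jacobian(r, theta)

# Upper-index components transform with one factor of the Jacobian ...
v_cart = [sum(J[k][i] * v_polar[i] for i in range(2)) for k in range(2)]

# ... lower-index metric components with two factors (g_cart is the identity).
g_cart = [[1.0, 0.0], [0.0, 1.0]]
g_polar = [[sum(J[k][i] * J[l][j] * g_cart[k][l] for k in range(2) for l in range(2))
            for j in range(2)]
           for i in range(2)]

print(ds2(g_cart, v_cart), ds2(g_polar, v_polar))  # equal up to rounding
print(g_polar)  # approximately [[1, 0], [0, r**2]]
```

As a by-product, the transformation rule recovers $g_{ij} = \mathrm{diag}(1, r^2)$ for polar coordinates from the identity matrix in Cartesian coordinates.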
$[1]$ $\gamma'(t)$ is really a tangent vector in the following sense:
Let $x : \mathcal{M} \rightarrow \mathbb{R}^n$ be a coordinate chart. Consider then $x \circ \gamma : [a,b] \rightarrow \mathbb{R}^n$. Since it is an ordinary function between (subsets of) Cartesian spaces, it has a derivative
$$(x \circ \gamma)' : [a,b] \rightarrow \mathbb{R}^n$$
Now, the $(x \circ \gamma)'^i(t)$ can be thought of as the components of the tangent vector $\gamma'(t) := (x \circ \gamma)'^i(t)\partial_i \in T_{\gamma(t)}\mathcal{M}$. It is a somewhat tedious, but worthwhile exercise to show that this definition of $\gamma'(t)$ is independent of the choice of coordinates $x$.
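A numerical illustration of that exercise (not a proof): the sketch below computes the components $(x \circ \gamma)'(t)$ of the same tangent vector in two charts of the plane, Cartesian $(x, y)$ and polar $(r, \theta)$, by central finite differences, and checks that the polar components, mapped through the Jacobian $\partial(x,y)/\partial(r,\theta)$, reproduce the Cartesian ones. The curve `gamma_cart` and the parameter value `t0` are arbitrary examples.

```python
import math


def gamma_cart(t):
    """An arbitrary example path, written in the Cartesian chart (x, y)."""
    return (1.0 + t, 0.5 * t * t)


def to_polar(p):
    """Change of chart (x, y) -> (r, theta), valid away from the negative x-axis."""
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))


def derivative(f, t, h=1e-6):
    """Central-difference derivative of a tuple-valued function of one variable."""
    return [(a - b) / (2 * h) for a, b in zip(f(t + h), f(t - h))]


t0 = 0.8
# Components of gamma'(t0) w.r.t. d/dx, d/dy and w.r.t. d/dr, d/dtheta.
v_cart = derivative(gamma_cart, t0)
v_polar = derivative(lambda t: to_polar(gamma_cart(t)), t0)

# Map the polar components to Cartesian ones with d(x, y)/d(r, theta).
r, theta = to_polar(gamma_cart(t0))
v_from_polar = [math.cos(theta) * v_polar[0] - r * math.sin(theta) * v_polar[1],
                math.sin(theta) * v_polar[0] + r * math.cos(theta) * v_polar[1]]

print(v_cart, v_from_polar)  # agree up to finite-difference error
```

Both charts therefore describe one and the same element of $T_{\gamma(t_0)}\mathcal{M}$; only its components differ.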
Your exam question with the surfaces is asking about something different. You are given an embedding of a lower-dimensional submanifold $\mathcal{N}$ into Cartesian space
$$ \sigma: \mathcal{N} \hookrightarrow \mathbb{R}^n $$
and asked to calculate the induced metric on the submanifold from the Cartesian metric
$$\mathrm{d}s^2 = \sum_{i = 1}^n (\mathrm{d}x^i)^2$$
(which, in component form with respect to Cartesian coordinates on $\mathbb{R}^n$, is just the identity matrix, i.e. the ordinary dot product)
Now, how is a metric induced? Let $y : \mathbb{R}^m \rightarrow \mathcal{N}$ be a parametrization of the submanifold, i.e. the inverse of a coordinate chart, so that the $y^i$ serve as coordinates on $\mathcal{N}$ (you are actually given $\sigma \circ y$ in the question), and let $x$ be the Cartesian coordinates on $\mathbb{R}^n$. Observe that any smooth map of manifolds, such as $\sigma$, induces a linear map of tangent spaces
$$ \mathrm{d}\sigma_p : T_p\mathcal{N} \rightarrow T_{\sigma(p)}\mathbb{R}^n, \frac{\partial}{\partial y^i} \mapsto \sum_j \frac{\partial(\sigma \circ y)^j}{\partial y^i}\frac{\partial}{\partial x^j} $$
called the differential of $\sigma$. Being a linear map of vector spaces, it is given, as a matrix, by the Jacobian $\mathrm{d}\sigma^{ij} := \frac{\partial(\sigma \circ y)^j}{\partial y^i}$ of $\sigma$ in these coordinates. Now, inducing a metric means setting
$$ g_\mathcal{N}(\frac{\partial}{\partial y^i},\frac{\partial}{\partial y^j}) := g_\mathrm{Euclidean}(\mathrm{d}\sigma(\frac{\partial}{\partial y^i}),\mathrm{d}\sigma(\frac{\partial}{\partial y^j}))$$
On the right-hand side is now the dot product of two ordinary vectors in $\mathbb{R}^n$, and what your exam calls $\vec e_{y^i}$ is my $\mathrm{d}\sigma(\frac{\partial}{\partial y^i})$. Since you are given $\sigma \circ y$, all you need to do is calculate the metric components by evaluating $g_\mathcal{N}$ as above for every combination of $y^i,y^j$ (in 2D there are fortunately only four, and since $g_\mathcal{N}$ is symmetric only three of them are independent).
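In components, since $g_\mathrm{Euclidean}$ is the ordinary dot product, this recipe amounts to (with $\mathrm{d}\sigma^{ij}$ the Jacobian defined above)
$$(g_\mathcal{N})_{ij} = \sum_{k=1}^n \frac{\partial(\sigma \circ y)^k}{\partial y^i}\frac{\partial(\sigma \circ y)^k}{\partial y^j} = \left(\mathrm{d}\sigma\,\mathrm{d}\sigma^T\right)_{ij}$$
The Python sketch below carries this out numerically for a hypothetical example, the sphere of radius $R$ parametrized by $(\theta, \phi)$, since the exam's actual surface is not reproduced here. The radius, the evaluation point and the helper names are assumptions, and the partial derivatives are taken by central finite differences rather than by hand.

```python
import math

R = 2.0  # radius of the example sphere (an assumption, not from the exam)


def embedding(theta, phi):
    """Example sigma o y: spherical coordinates (theta, phi) -> point in R^3."""
    return (R * math.sin(theta) * math.cos(phi),
            R * math.sin(theta) * math.sin(phi),
            R * math.cos(theta))


def tangent_vectors(f, y, h=1e-6):
    """Rows e_{y^i} = d(sigma o y)/dy^i (the Jacobian rows), by central differences."""
    rows = []
    for i in range(len(y)):
        y_plus = list(y)
        y_minus = list(y)
        y_plus[i] += h
        y_minus[i] -= h
        rows.append([(a - b) / (2 * h) for a, b in zip(f(*y_plus), f(*y_minus))])
    return rows


def induced_metric(f, y):
    """g_N(d/dy^i, d/dy^j) = e_{y^i} . e_{y^j}, the Euclidean dot product."""
    e = tangent_vectors(f, y)
    return [[sum(a * b for a, b in zip(e[i], e[j])) for j in range(len(y))]
            for i in range(len(y))]


g = induced_metric(embedding, (0.9, 0.4))
print(g)  # approximately [[R**2, 0], [0, R**2 * sin(0.9)**2]]
```

For this embedding the exact result is $\mathrm{d}s^2 = R^2\,\mathrm{d}\theta^2 + R^2\sin^2\theta\,\mathrm{d}\phi^2$, which the printed matrix should approximate; for an exam problem you would of course do the same differentiation by hand.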