1) A matrix is literally nothing other than a map $\mathbb{N}_m\times\mathbb{N}_n\rightarrow R$, where $\mathbb{N}_m$ is a subset of $\mathbb{N}$ that contains $m$ distinct elements (and likewise $\mathbb{N}_n$ with $n$ elements), and $R$ is a commutative ring (you can take $R$ to be noncommutative as well tbh, as is the case with differential-form-valued matrices, for example).
If you are being more general about it, the map doesn't have to be binary; it then produces "arrays" that are indexed by more than two natural numbers. An example of such a generalized matrix that isn't a tensor is the set of connection coefficients/Christoffel symbols $\Gamma^\sigma_{\mu\nu}$. You can view this as a more general matrix, or as a collection of usual 'binary' matrices, one for every value of $\mu$, but these components do not transform tensorially.
A tensor, depending on where you are coming from, has inherently more geometric or algebraic content than a matrix. From the geometric point of view, tensors may be identified with matrices, but only once you gauge-fix a frame, and the components must satisfy certain transformation rules, one of which you have stated in your post. From the algebraic point of view, the space of tensors has to satisfy what is called the universal factorization property, stated as follows.
Universal factorization property: Let $(V\otimes W,p)$ be the tensor product of the vector spaces $V$ and $W$, with $p:V\times W\rightarrow V\otimes W$ the canonical bilinear map. Then for any bilinear map $A:V\times W\rightarrow X$, there exists a unique linear map $A^\otimes:V\otimes W\rightarrow X$ such that $A=A^\otimes\circ p$.
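To make the factorization concrete, here is a minimal sketch in plain Python. It assumes $V=\mathbb{R}^2$, $W=\mathbb{R}^3$ and $X=\mathbb{R}$, with the bilinear map given by a coefficient table `M`; the function names `outer`, `bilinear` and `induced_linear` are made up for this illustration. It only checks $A=A^\otimes\circ p$ on one pair of vectors, it does not prove uniqueness.

```python
def outer(v, w):
    """Canonical bilinear map p: (v, w) -> v (x) w, stored as the table v_i * w_j."""
    return [[vi * wj for wj in w] for vi in v]


def bilinear(M, v, w):
    """Bilinear map A(v, w) = sum_ij M_ij v_i w_j."""
    return sum(M[i][j] * v[i] * w[j]
               for i in range(len(v)) for j in range(len(w)))


def induced_linear(M, t):
    """Linear map on the tensor space: A_tensor(t) = sum_ij M_ij t_ij."""
    return sum(M[i][j] * t[i][j]
               for i in range(len(t)) for j in range(len(t[0])))


# Illustrative data (assumed, not from the post).
M = [[1, 2, 0],
     [0, -1, 3]]
v = [2, 5]
w = [1, -1, 4]

lhs = bilinear(M, v, w)                # A(v, w)
rhs = induced_linear(M, outer(v, w))   # (A_tensor o p)(v, w)
print(lhs, rhs)
```

Note that `induced_linear` is linear in the table `t`, whereas `bilinear` is linear in `v` and `w` separately; the two agree on tables of the form `outer(v, w)`.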
2) I find it impossibly hard to explain this without the use of principal fiber bundles, but the concept of "covariance" is not that simple. Here are several statements:
An equation between two objects given by components is always frame-independent if the two objects transform in the same way under a change of frame. If two spinors are equal in some frame, they are equal in other frames as well. If two densities are equal in some frame, they are equal in other frames as well. If two connections are equal in some frame, they are equal in other frames as well.
Because tensors transform homogeneously, if a tensor is zero in some frame, then it is zero in other frames as well. But this is not unique to tensors: for example, densities also transform homogeneously, so this is true for them as well.
The previous statement implies that a tensor equation reduced to zero ($S=T\Leftrightarrow S-T=0$) stays that way under a change of frame. Note that this is once again not unique to tensors. Connections do not transform homogeneously, so $\Gamma^\sigma_{\ \mu\nu}=0$ is not a frame-independent equation, but if $\Gamma$ and $\omega$ are connections, then $\Gamma^\sigma_{\ \mu\nu}-\omega^\sigma_{\ \mu\nu}=0$ is frame-independent, because the difference between two connections transforms homogeneously (well, tensorially, in fact, but homogeneously would be enough).
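As a worked check of the last claim, here is a short derivation. It assumes coordinate frames and a change of coordinates $x\rightarrow x'$ (a general frame change works the same way, with the Jacobians replaced by the frame-change matrix). The connection coefficients transform as

$$\Gamma'^{\sigma}_{\ \mu\nu}=\frac{\partial x'^\sigma}{\partial x^\alpha}\frac{\partial x^\beta}{\partial x'^\mu}\frac{\partial x^\gamma}{\partial x'^\nu}\,\Gamma^{\alpha}_{\ \beta\gamma}+\frac{\partial x'^\sigma}{\partial x^\alpha}\frac{\partial^2 x^\alpha}{\partial x'^\mu\,\partial x'^\nu},$$

and $\omega$ obeys the same law. The second, inhomogeneous term does not depend on the connection at all, so it cancels in the difference:

$$\Gamma'^{\sigma}_{\ \mu\nu}-\omega'^{\sigma}_{\ \mu\nu}=\frac{\partial x'^\sigma}{\partial x^\alpha}\frac{\partial x^\beta}{\partial x'^\mu}\frac{\partial x^\gamma}{\partial x'^\nu}\left(\Gamma^{\alpha}_{\ \beta\gamma}-\omega^{\alpha}_{\ \beta\gamma}\right).$$

This is exactly the transformation law of a $(1,2)$ tensor, so if the difference vanishes in one frame, it vanishes in every frame.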
The above shows that the usual shizzle about "general covariance" is not enough to fix tensors as objects of interest. The thing about tensors is that their components have (multi)linear dependence on directions. This is something neither densities nor connections possess (spinors kind of do, but that is a very different matter).

For this reason tensors are used to represent physical quantities that depend linearly on one or more directions. These are the quantities that are 1) frame-independent and 2) measurable pointwise. By contrast, densities are used to represent physical quantities for which only integrals are frame-independent, and connections are used to represent physical quantities that are inherently frame-dependent (the gravitational field, for example).
Now, using more advanced terminology, I would say that a class of fields is covariant with respect to a Lie group $G$ if there exists a principal fiber bundle $(P,\pi,M,G)$ such that $G$ admits a representation $\rho:G\rightarrow GL(k)$ (where $GL(k)=GL(k,\mathbb{F})$ may be the real or complex general linear group, $\mathbb{F}=\mathbb{R}$ or $\mathbb{C}$) and such that the fields in question are sections of an associated vector bundle $(P\times_\rho \mathbb{F}^k,\pi,M,\mathbb{F}^k,\rho(G))$.
For $P=F(M)$ the frame bundle, $G=GL(n,\mathbb{R})$, and $\rho$ built from contragredient and tensor product representations of the fundamental representation, this produces tensors; for $\rho:G\rightarrow GL(1,\mathbb{R}),\ \rho(A)=|\det(A)|\cdot$ (multiplication by $|\det(A)|$), this produces scalar densities of weight 1. So these objects are all "covariant" with respect to $GL(n,\mathbb{R})$, but connections, for example, sit outside this framework.