For context, any general quantum operation $\Phi$ on a bipartite system $AB$ with finite dimensional Hilbert space $H_{A} \otimes H_{B}$ has a Kraus representation
$\Phi(\rho) = \sum_j K_j \rho K_j^{\dagger}$
where the "Kraus operators" $K_j$ are linear operators on $H_{A} \otimes H_{B}$ such that
$\sum_j K_j^{\dagger} K_j = I_{AB}$.
The Kraus representation is not unique, but if $\Phi$ has at least one Kraus representation in which all of the Kraus operators can be written as tensor products $K_j = L_j \otimes M_j$, then we say that the operation $\Phi$ is separable (here the $L_j$ and $M_j$ are linear operators on $H_{A}$ and $H_{B}$ respectively).
Any LOCC operation is a separable operation (so the answer to your second question is "yes") but there are separable operations which are not LOCC operations.
To see why LOCC operations are separable, we need to define exactly what we mean by LOCC. One way is to use the idea of generalised measurements (see Nielsen and Chuang, and the question "What is the difference between general measurement and projective measurement?"). A generalised measurement is represented by a particular Kraus decomposition of some operation. A generalised measurement on system $A$ with $k$ possible outcomes, which w.l.o.g. we label by $\{1, \ldots, k\}$, would be represented by linear operators $\{ L_1, \ldots, L_k \}$ on $H_{A}$ such that $\sum_{i=1}^k L_i^{\dagger} L_i = I_{A}$. If the density operator for $A$ is $\rho_{A}$ prior to the measurement, then the probability of outcome $i$ is $p(i) = \mathrm{Tr} (L_i \rho_{A} L_i^{\dagger})$ and after the measurement, given outcome $i$, the density operator is
$L_i \rho_{A} L_i^{\dagger} / p(i).$ If the measurement is performed and we do not condition on the outcome, then the time evolution of the state is simply given by the operation with Kraus operators $\{ L_1, \ldots, L_k \}$.
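As a concrete illustration of these formulas, here is a minimal NumPy sketch of a two-outcome generalised measurement on a single qubit. The particular operators, the parameter `eps`, and the helper name `measure` are illustrative assumptions, not taken from any reference; any set $\{L_i\}$ satisfying the completeness relation would do.

```python
import numpy as np

# Hypothetical example: a "weak" two-outcome measurement on one qubit.
# eps is an arbitrary illustrative parameter (eps = 0 would be projective).
eps = 0.3
L = [
    np.diag([np.sqrt(1 - eps), np.sqrt(eps)]),  # L_1
    np.diag([np.sqrt(eps), np.sqrt(1 - eps)]),  # L_2
]

# Completeness relation: sum_i L_i^dagger L_i = I_A
assert np.allclose(sum(Li.conj().T @ Li for Li in L), np.eye(2))

# Input state rho_A = |+><+|
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho_A = np.outer(plus, plus.conj())


def measure(kraus_ops, rho):
    """Return outcome probabilities p(i) and normalised post-measurement states."""
    probs, posts = [], []
    for K in kraus_ops:
        unnormalised = K @ rho @ K.conj().T
        p = np.trace(unnormalised).real          # p(i) = Tr(L_i rho L_i^dagger)
        probs.append(p)
        posts.append(unnormalised / p if p > 1e-12 else None)
    return probs, posts


probs, posts = measure(L, rho_A)

# Not conditioning on the outcome gives the operation with Kraus operators {L_i}
rho_out = sum(Li @ rho_A @ Li.conj().T for Li in L)
```

Averaging the conditional states with weights $p(i)$ reproduces `rho_out`, which is the statement that the unconditioned evolution is the operation with Kraus operators $\{L_i\}$.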
Consider an LOCC protocol with a single round of classical communication, in which Alice performs the generalised measurement just described on system $A$ and sends the outcome $i$ to Bob who, based on this, selects a generalised measurement with elements $\{ M^{(i)}_j \}$ to perform on system $B$, which produces outcome $j$ (we can assume w.l.o.g. that all his measurements have the same number $m$ of possible outcomes).
Considered as a generalised measurement on the joint system $AB$, Alice's local measurement has operators $L_i \otimes I_B$, and a similar statement is true for Bob's local measurements. So, the probability of Bob's outcome being $j$ given that Alice's outcome was $i$ is
$$\mathrm{Tr} (I_{A} \otimes M^{(i)}_j) (L_i \otimes I_B) \rho_{AB} (L_i \otimes I_B)^{\dagger} (I_{A} \otimes M^{(i)}_j)^{\dagger} / p(i),$$ so the joint probability of outcome $(i,j)$ is
$$p(i,j) = \mathrm{Tr} (I_{A} \otimes M^{(i)}_j) (L_i \otimes I_B) \rho_{AB} (L_i \otimes I_B)^{\dagger} (I_{A} \otimes M^{(i)}_j)^{\dagger},$$
and the post-measurement state conditioned on this outcome is
$$(I_{A} \otimes M^{(i)}_j) (L_i \otimes I_B) \rho_{AB} (L_i \otimes I_B)^{\dagger} (I_{A} \otimes M^{(i)}_j)^{\dagger} / p(i,j).$$
Therefore, we can regard this whole LOCC protocol as a generalised measurement on $AB$ with Kraus operators $(I_{A} \otimes M^{(i)}_j) (L_i \otimes I_B) = L_i \otimes M^{(i)}_j$. Evidently, the time evolution of $AB$ when this protocol is performed and we don't condition on the measurement outcomes is given by the operation $\Phi$ such that
$$\Phi(\rho_{AB}) = \sum_{j=1}^m \sum_{i=1}^k (L_i \otimes M^{(i)}_j) \rho_{AB} (L_i \otimes M^{(i)}_j)^{\dagger},$$
which is separable.
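The one-round protocol above can be checked numerically. The following is a rough sketch, assuming randomly generated local measurements; the helper `random_measurement` and the dimensions chosen are hypothetical and only serve to produce operators that satisfy the completeness relation. It builds the product Kraus operators $L_i \otimes M^{(i)}_j$ and confirms that they define a trace-preserving operation.

```python
import numpy as np


def random_measurement(dim, n_outcomes, rng):
    """Hypothetical helper: a random generalised measurement on C^dim.

    A random isometry V (V^dagger V = I) is cut into n_outcomes square
    blocks K_i, so that sum_i K_i^dagger K_i = V^dagger V = I.
    """
    shape = (dim * n_outcomes, dim)
    G = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    V, _ = np.linalg.qr(G)  # reduced QR: V has orthonormal columns
    return [V[i * dim:(i + 1) * dim, :] for i in range(n_outcomes)]


rng = np.random.default_rng(0)
dA, dB, k, m = 2, 2, 3, 2  # illustrative dimensions and outcome counts

L = random_measurement(dA, k, rng)                      # Alice's L_i
M = [random_measurement(dB, m, rng) for _ in range(k)]  # Bob's M^{(i)}_j, one set per i

# Kraus operators of the whole protocol: L_i tensor M^{(i)}_j
kraus = {(i, j): np.kron(L[i], M[i][j]) for i in range(k) for j in range(m)}

# Each one equals Bob's local operator applied after Alice's local operator
I_A, I_B = np.eye(dA), np.eye(dB)
for (i, j), K in kraus.items():
    assert np.allclose(K, np.kron(I_A, M[i][j]) @ np.kron(L[i], I_B))

# Completeness on AB: sum_{i,j} K_{ij}^dagger K_{ij} = I_AB
total = sum(K.conj().T @ K for K in kraus.values())
assert np.allclose(total, np.eye(dA * dB))

# Random (generally entangled) input state rho_AB
X = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
rho = X @ X.conj().T
rho = rho / np.trace(rho).real

# Joint outcome probabilities p(i,j) and the unconditioned output Phi(rho_AB)
p = {ij: np.trace(K @ rho @ K.conj().T).real for ij, K in kraus.items()}
phi_rho = sum(K @ rho @ K.conj().T for K in kraus.values())
```

Note that Bob's measurement is allowed to depend on $i$, yet every Kraus operator is still a tensor product; this check only confirms separability of the LOCC operation and says nothing about the converse.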
If we add a further round of classical communication to the protocol, in which Bob sends his measurement outcome to Alice, who uses it together with her earlier measurement outcome to pick another generalised measurement to perform on her system, yielding outcome $l$, the Kraus operators for the overall generalised measurement will be of the form
$(N^{(i,j)}_l \otimes I_B) (I_A \otimes M^{(i)}_j) (L_i \otimes I_{B}) = N^{(i,j)}_l L_i \otimes M^{(i)}_j$. Evidently, no matter how many rounds of classical communication and local generalised measurements we add, the resulting Kraus operators are tensor products of local linear operators, and the resulting time evolution of $AB$ is described by a separable operation.
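As a quick consistency check (a sketch, writing $l$ for the outcome of Alice's second measurement to keep it distinct from the number $k$ of outcomes of her first, and assuming only that each local measurement satisfies its own completeness relation), these product operators do form a valid Kraus representation. Summing over the last outcome first and working backwards through the rounds,
$$\sum_{i,j,l} (N^{(i,j)}_l L_i \otimes M^{(i)}_j)^{\dagger} (N^{(i,j)}_l L_i \otimes M^{(i)}_j) = \sum_{i,j} L_i^{\dagger} \Big( \sum_{l} N^{(i,j)\dagger}_l N^{(i,j)}_l \Big) L_i \otimes M^{(i)\dagger}_j M^{(i)}_j = \sum_{i} L_i^{\dagger} L_i \otimes \Big( \sum_{j} M^{(i)\dagger}_j M^{(i)}_j \Big) = \sum_i L_i^{\dagger} L_i \otimes I_B = I_{AB}.$$
The same telescoping argument applies to any finite number of rounds.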
The fact that there are separable operations which cannot be realised by an LOCC protocol is harder to show, but is a corollary of a number of results in the literature which show that separable operations can outperform LOCC operations in various tasks, for example state discrimination: see http://arxiv.org/abs/quant-ph/9804053 and http://arxiv.org/abs/0705.0795.