The use of the Lagrangian density is a convenience; it is not directly related to causality or relativity, nor strictly to quantum theories. What I mean is that it is possible to formulate non-relativistic (quantum or classical) field theories using exactly the same language.
The difference between "mechanics" and "field theory" is that, instead of using particles as the fundamental objects of the theory, we use fields, i.e. functions $\phi:X\to \mathbb{C}$. So, while in mechanics the phase space is a finite-dimensional structure (e.g. a manifold), in field theory it is an infinite-dimensional one (e.g. a space of measurable functions).
The physical motivation is that there are systems that are impossible to describe with a finite number of degrees of freedom (e.g. the electromagnetic field). Special relativity adds an important motivation on the quantum side: particles can be created and destroyed, so your relativistic quantum space must contain all the possible configurations with an arbitrary number $n$ of particles. This is conveniently described mathematically by taking the one-particle QM space as the classical phase space and building the so-called Fock-Cook (second) quantization upon it. Again, an infinite-dimensional phase space ($L^2(\Omega)$, for some suitable $\Omega$) has to be considered.
Once you are in an infinite-dimensional phase space setting, the introduction of the Lagrangian density is quite natural, and it is a matter of convenience. Let $\mathbb{C}^X$ be the set of functions from $X$ to $\mathbb{C}$, which plays the role of your infinite-dimensional phase space (or position-velocity space, in the Lagrangian formulation). Mimicking the finite-dimensional situation, you want to build a function(al) of the variables of the system $\phi\in \mathbb{C}^X$ that fully encodes the dynamical information. We call it the Lagrangian function $L:\mathbb{C}^X\to \mathbb{R}$ (usually taken to be real-valued). For the moment it is not important to worry about the distinction between variables (fields $\phi$) and their derivatives ($\dot{\phi}$); think of them as "independent variables" inside $\mathbb{C}^X$ (as position and velocity are in finite dimensions).
Now we have an object $L(\cdot)$ that encodes the dynamical information and has to be evaluated on functions $\phi\in \mathbb{C}^X$. What happens in most practical situations is that the Lagrangian consists of two formal operations: one that provides pointwise information for each $x\in X$ (depending on $\phi$), and another that puts all this information together, providing the complete evaluation $L(\phi)$. The first is the Lagrangian density $\mathscr{L}:\mathbb{C}^X\to \mathbb{R}^X$ (usually real-valued); the second is the "integration", or more generally what we may call the global evaluation $\mathbb{E}_{gl}: \mathbb{R}^X\to \mathbb{R}$. This splitting results in $L=\mathbb{E}_{gl}\circ \mathscr{L}$, and it is convenient if you want to separate the "functional pointwise manipulations", done in the Lagrangian density, from the global evaluation of these manipulations w.r.t. the totality of points.
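To make the splitting $L=\mathbb{E}_{gl}\circ \mathscr{L}$ concrete, here is a minimal numerical sketch. Everything specific in it is my own assumption for illustration: $X=[0,1]$ sampled on a uniform grid, a vibrating-string density $\mathscr{L}=\tfrac12|\dot\phi|^2-\tfrac12 c^2|\partial_x\phi|^2$ (classical and non-relativistic), a trapezoidal sum standing in for $\mathbb{E}_{gl}$, and the function names `lagrangian_density` and `global_evaluation`.

```python
import numpy as np


def lagrangian_density(phi, phi_dot, dx, c=1.0):
    """Pointwise step: one real number per grid point x.

    Assumed (illustrative) density of a vibrating string:
    0.5*|phi_dot|^2 - 0.5*c^2*|d(phi)/dx|^2.
    """
    dphi_dx = np.gradient(phi, dx)  # finite-difference spatial derivative
    return 0.5 * np.abs(phi_dot) ** 2 - 0.5 * c**2 * np.abs(dphi_dx) ** 2


def global_evaluation(density, dx):
    """Global step: trapezoidal approximation of the integral over X."""
    return float(np.sum(0.5 * (density[1:] + density[:-1])) * dx)


def lagrangian(phi, phi_dot, dx, c=1.0):
    """The composition L = E_gl o (Lagrangian density)."""
    return global_evaluation(lagrangian_density(phi, phi_dot, dx, c), dx)


# Sample configuration: phi(x) = sin(pi x) at rest (phi_dot = 0).
x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
phi = np.sin(np.pi * x).astype(complex)
phi_dot = np.zeros_like(phi)

density = lagrangian_density(phi, phi_dot, dx)  # an element of R^X (sampled)
L_value = lagrangian(phi, phi_dot, dx)          # a single real number
```

For this configuration the integral can be done by hand and gives $-\pi^2/4$, so `L_value` should come out close to that. Note that changing the density (the pointwise rule) or the quadrature (the global rule) touches only one of the two functions, which is exactly the convenience of the splitting.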
However, apart from convenience, this is just the standard setting of infinite-dimensional phase spaces; no relativistic or quantum considerations are involved a priori.