Composition of operators is associative, as seen here.
But when we calculate $[\hat{x}, \hat{p}]$, for example, we use a test function $f$ and let $\hat{p}$ act on both $\hat{x}$ and the function, instead of first letting $\hat{p}$ act on $\hat{x}$ alone and being done with it (which would obviously give an incorrect result).
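For concreteness, here is a sketch of the test-function calculation I mean. It assumes the position representation, with $\hat{x}$ acting as multiplication by $x$ and $\hat{p} = \frac{\hbar}{i}\frac{\partial}{\partial x}$, and an arbitrary differentiable test function $f(x)$:

$$
\begin{aligned}
[\hat{x}, \hat{p}] f &= \hat{x}(\hat{p} f) - \hat{p}(\hat{x} f) \\
&= \frac{\hbar}{i}\left( x \frac{\partial f}{\partial x} - \frac{\partial (x f)}{\partial x} \right) \\
&= \frac{\hbar}{i}\left( x \frac{\partial f}{\partial x} - f - x \frac{\partial f}{\partial x} \right) \\
&= -\frac{\hbar}{i} f = i\hbar f
\end{aligned}
$$

Here the derivative in the second term acts on the whole product $x f$, which is where the product rule comes in.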
Edit to elaborate on my question:
$[\hat{x}, \hat{p}] f = \hat{x}\hat{p}f - (\hat{p} \hat{x}) f = \frac{\hbar}{i} \bigg(\hat{x} \frac{\partial f}{\partial x} - \frac{\partial x}{\partial x} f\bigg) = \frac{\hbar}{i} \bigg(\hat{x} \frac{\partial}{\partial x} - 1\bigg) f$
Why can't we do what I did with the second term, that is, associate the operators with each other first and only then apply the result to the test function?