In caam_jr_enqueue, under heavy DDR load, smp_wmb() or dma_wmb() fail to make the input ring be updated before the CAAM starts reading it. So, CAAM will process, again, an old descriptor address and will put it in the output ring. This will make caam_jr_dequeue() to fail, since this old descriptor is not in the software ring. To fix this, use wmb() which works on the full system instead of inner/outer shareable domains.
this is an attempt to fix #467 please test.