    Even with a number of waitqueues, we can get into a situation where we
    are heavily contended on the waitqueue lock. I got a report on spc1
    where we're spending seconds contending on it. Arguably the use case is
    nasty: I reproduce it with one device and 1000 threads banging on the
    device. But that doesn't mean we shouldn't be handling it better.

    What ends up happening is that a thread will fail to get a tag, add
    itself to the waitqueue, and subsequently get woken up when a tag is
    freed, only to lose the race for that tag and go back to sleep on the
    waitqueue.

    Instead of waking all threads, use an exclusive wait and wake up only
    as many waiters as our sbitmap batch count. This seems to work well for
    me (massive improvement for this use case), and it survives basic
    testing. But I haven't fully verified it yet.
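
    To illustrate why the wake-all behaviour hurts, here is a rough
    userspace model, not the kernel code. It only counts wakeups under the
    two policies. The names wake_stats, simulate_wake_all and
    simulate_exclusive_batch are made up for this sketch, and it assumes
    no new threads arrive while the waiters drain.

```c
/* Hypothetical counting model; none of these names exist in the kernel. */
struct wake_stats {
    unsigned long wakeups; /* threads woken in total */
    unsigned long futile;  /* woken threads that found no tag and slept again */
};

/*
 * Wake-all: every freed tag wakes every sleeper. One of them takes the
 * tag, the rest go back to sleep on the waitqueue.
 */
static struct wake_stats simulate_wake_all(unsigned int waiters)
{
    struct wake_stats s = { 0, 0 };

    while (waiters > 0) {
        s.wakeups += waiters;
        s.futile += waiters - 1;
        waiters--; /* exactly one waiter got the freed tag */
    }
    return s;
}

/*
 * Exclusive wait with batched wakeups: once `batch` tags have been freed,
 * wake at most `batch` sleepers, so each woken thread finds a tag.
 */
static struct wake_stats simulate_exclusive_batch(unsigned int waiters,
                                                  unsigned int batch)
{
    struct wake_stats s = { 0, 0 };

    if (batch == 0)
        batch = 1; /* guard against a loop that never makes progress */

    while (waiters > 0) {
        unsigned int woken = waiters < batch ? waiters : batch;

        s.wakeups += woken;
        waiters -= woken;
    }
    return s;
}
```

    With 1000 waiters, simulate_wake_all(1000) counts 500500 wakeups, of
    which 499500 are futile, while simulate_exclusive_batch(1000, 8) counts
    1000 wakeups and none futile. The batch size of 8 is an arbitrary
    value picked for the example.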

    An additional improvement is running the queue and checking for a new
    tag BEFORE we add ourselves to the waitqueue.
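
    The ordering can be sketched as below. This is a simplified userspace
    model, not the patch itself: tag_pool, try_get_tag, run_queue and
    get_tag_fast are hypothetical names, and "running the queue" is
    modelled as simply making some tags available again, which glosses
    over what the real dispatch path does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a tag pool; not the blk-mq data structures. */
struct tag_pool {
    unsigned int free_tags;      /* tags available right now */
    unsigned int reclaimable;    /* tags that running the queue would free up */
    unsigned int waitqueue_adds; /* times a caller had to queue itself */
};

static bool try_get_tag(struct tag_pool *p)
{
    if (p->free_tags == 0)
        return false;
    p->free_tags--;
    return true;
}

/* Simplified model of running the queue: reclaimable tags become free. */
static void run_queue(struct tag_pool *p)
{
    p->free_tags += p->reclaimable;
    p->reclaimable = 0;
}

/* Returns true if a tag was obtained without touching the waitqueue. */
static bool get_tag_fast(struct tag_pool *p)
{
    if (try_get_tag(p))
        return true;

    /* Out of tags: kick the queue and retry before going to sleep. */
    run_queue(p);
    if (try_get_tag(p))
        return true;

    /* Only now would the caller add itself to the waitqueue. */
    p->waitqueue_adds++;
    return false;
}
```

    Starting from a pool with no free tags and two reclaimable ones, the
    first two calls to get_tag_fast() succeed without any waitqueue
    activity, and only the third one counts a waitqueue add.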
