>The appropriate uses of mfence are actually much more limited
This is true for regular race-free multithreaded programs. However, mfences are used fairly commonly in lock-free, concurrent data structures (http://concurrencykit.org/)
If you see code compile to a full mfence in CK, please tell us. In my experience, there is no concurrency-related reason to use mfence on x86. In fact, you'll see that we have code that checks for x86oids and uses an atomic RMW instead of a non-atomic store/RMW + fence.
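To make the store + fence vs. atomic RMW distinction concrete, here is a rough sketch in portable C11 atomics. It is not CK's actual code: the function names `publish_with_fence` and `publish_with_rmw` are made up for illustration, and what each variant compiles to depends on the compiler and flags.

```c
#include <stdatomic.h>

/* Hypothetical helper (not a CK API): plain store followed by a full fence.
 * On x86 a compiler may lower the seq_cst fence to mfence. */
static inline void publish_with_fence(atomic_int *p, int v)
{
    atomic_store_explicit(p, v, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical helper (not a CK API): one atomic read-modify-write.
 * On x86 an exchange is typically emitted as xchg with a memory operand,
 * which is implicitly locked and already orders earlier stores before
 * later loads, so no separate fence is needed. Returns the old value. */
static inline int publish_with_rmw(atomic_int *p, int v)
{
    return atomic_exchange_explicit(p, v, memory_order_seq_cst);
}
```

Both variants leave `v` in `*p` and give store-to-load ordering on x86. If you want to check what your toolchain does, compile with `-O2 -S` and look for `mfence` vs. `xchg` or a `lock`-prefixed instruction in the output.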