Bottom line (TL;DR):
LFENCE alone indeed seems useless for memory ordering; however, that does not make SFENCE a substitute for MFENCE. The “arithmetic” logic in the question is not applicable.
Here is an excerpt from Intel’s Software Developer’s Manual, volume 3, section 8.2.2 (edition 325384-052US of September 2014), the same one that I used in another answer:
- Reads are not reordered with other reads.
- Writes are not reordered with older reads.
- Writes to memory are not reordered with other writes, with the following exceptions:
  - writes executed with the CLFLUSH instruction;
  - streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
  - string operations (see Section 8.2.4.1).
- Reads may be reordered with older writes to different locations but not with older writes to the same location.
- Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions.
- Reads cannot pass earlier LFENCE and MFENCE instructions.
- Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
- LFENCE instructions cannot pass earlier reads.
- SFENCE instructions cannot pass earlier writes.
- MFENCE instructions cannot pass earlier reads or writes.
From here, it follows that:
- MFENCE is a full memory fence for all operations on all memory types, whether non-temporal or not.
- SFENCE only prevents reordering of writes (in other terminology, it’s a StoreStore barrier), and is only useful together with non-temporal stores and other instructions listed as exceptions.
- LFENCE prevents reordering of reads with subsequent reads and writes (i.e. it combines LoadLoad and LoadStore barriers). However, the first two bullets of the excerpt say that LoadLoad and LoadStore barriers are always in place, no exceptions. Therefore LFENCE alone is useless for memory ordering.
To support the last claim, I looked at all places where LFENCE is mentioned in all 3 volumes of Intel’s manual, and found none which would say that LFENCE is required for memory consistency. Even MOVNTDQA (the only non-temporal load instruction so far) mentions MFENCE but not LFENCE.
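The conclusions drawn from the excerpt can be checked mechanically with a small table. The following Python sketch is only a toy model of the quoted rules as this answer reads them, not a description of real hardware; every name in it (`FENCE_BLOCKS`, `allowed_without_fence`, `fence_adds`, the `weak_stores` flag) is made up for illustration, and lumping all the listed exceptions into one flag is a simplifying assumption.

```python
# The four classic reorderings, named as (earlier op, later op).
LOAD_LOAD = ("load", "load")
LOAD_STORE = ("load", "store")
STORE_STORE = ("store", "store")
STORE_LOAD = ("store", "load")
ALL_REORDERINGS = {LOAD_LOAD, LOAD_STORE, STORE_STORE, STORE_LOAD}

# What each fence is described as blocking in the text above (illustrative table).
FENCE_BLOCKS = {
    "LFENCE": {LOAD_LOAD, LOAD_STORE},
    "SFENCE": {STORE_STORE},
    "MFENCE": set(ALL_REORDERINGS),
}


def allowed_without_fence(weak_stores=False):
    """Reorderings the quoted rules permit when no fence is present."""
    # Reads may be reordered with older writes to different locations.
    allowed = {STORE_LOAD}
    if weak_stores:
        # Assumption: CLFLUSH, non-temporal stores and string operations are
        # modelled together as "stores that may be reordered with other stores".
        allowed.add(STORE_STORE)
    return allowed


def fence_adds(fence, weak_stores=False):
    """Reorderings that `fence` blocks and that were not already blocked."""
    return FENCE_BLOCKS[fence] & allowed_without_fence(weak_stores)


for name in ("LFENCE", "SFENCE", "MFENCE"):
    print(name, sorted(fence_adds(name)), sorted(fence_adds(name, weak_stores=True)))
```

In this model `fence_adds("LFENCE")` is empty in both cases, `fence_adds("SFENCE")` is non-empty only with `weak_stores=True`, and only MFENCE contributes the StoreLoad entry, which mirrors the three points above.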
Update: see the answers to “Why is (or isn’t?) SFENCE + LFENCE equivalent to MFENCE?” for correct answers to the guesswork below.
Whether MFENCE is equivalent to a “sum” of the other two fences or not is a tricky question. At first glance, among the three fence instructions only MFENCE provides a StoreLoad barrier, i.e. prevents reordering of reads with earlier writes. However, the correct answer requires knowing more than the above rules; namely, it’s important that all fence instructions are ordered with respect to each other. This makes the SFENCE LFENCE sequence more powerful than a mere union of individual effects: this sequence also prevents StoreLoad reordering (because loads cannot pass LFENCE, which cannot pass SFENCE, which cannot pass stores), and thus constitutes a full memory fence (but also see the note (*) below). Note however that order matters here, and the LFENCE SFENCE sequence does not have the same synergy effect.
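The chain argument in this paragraph can be written down as a reachability check. The sketch below is a toy model in Python: it encodes the “cannot pass” bullets from the excerpt plus the assumption made here that fences are ordered with respect to each other. It says nothing about what real CPUs do, and the names `CANNOT_PASS`, `is_ordered` and `blocks_store_load` are invented for this illustration.

```python
FENCES = {"LFENCE", "SFENCE", "MFENCE"}

# (earlier, later) pairs where the later operation cannot pass the earlier one,
# transcribed from the excerpt for ordinary loads and stores.
CANNOT_PASS = {
    ("load", "load"),      # reads are not reordered with other reads
    ("load", "store"),     # writes are not reordered with older reads
    ("store", "store"),    # writes are not reordered with other writes
    ("LFENCE", "load"), ("MFENCE", "load"),
    ("LFENCE", "store"), ("SFENCE", "store"), ("MFENCE", "store"),
    ("load", "LFENCE"),    # LFENCE cannot pass earlier reads
    ("store", "SFENCE"),   # SFENCE cannot pass earlier writes
    ("load", "MFENCE"), ("store", "MFENCE"),
}
# Assumption from the paragraph above: fences are ordered with each other.
CANNOT_PASS |= {(a, b) for a in FENCES for b in FENCES}


def is_ordered(program, first, last):
    """True if a chain of 'cannot pass' links connects program[first] to program[last]."""
    reachable = {first}
    for j in range(first + 1, last + 1):
        # j joins the chain if it cannot pass some operation already on it.
        if any((program[i], program[j]) in CANNOT_PASS for i in reachable):
            reachable.add(j)
    return last in reachable


def blocks_store_load(fences):
    """Does `store; <fences>; load` keep the load after the store in this model?"""
    program = ["store", *fences, "load"]
    return is_ordered(program, 0, len(program) - 1)


for seq in ([], ["SFENCE"], ["LFENCE"], ["MFENCE"],
            ["SFENCE", "LFENCE"], ["LFENCE", "SFENCE"]):
    print(seq, blocks_store_load(seq))
```

Under these assumptions only `["MFENCE"]` and `["SFENCE", "LFENCE"]` produce a chain from the store to the load; the reversed pair does not, because nothing in the table orders LFENCE after an earlier store or a load after an earlier SFENCE.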
However, while one can say that MFENCE ~ SFENCE LFENCE and LFENCE ~ NOP, that does not mean MFENCE ~ SFENCE. I deliberately use equivalence (~) and not equality (=) to stress that arithmetic rules do not apply here. The mutual effect of SFENCE followed by LFENCE makes the difference; even though loads are not reordered with each other, LFENCE is required to prevent reordering of loads with SFENCE.
(*) It still might be correct to say that MFENCE is stronger than the combination of the other two fences. In particular, a note to the CLFLUSH instruction in volume 2 of Intel’s manual says that “CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to be ordered by any other fencing or serializing instructions or by another CLFLUSH instruction.”
Update: clflush is now defined as strongly ordered (like a normal store, so you only need mfence if you want to block later loads), whereas clflushopt is weakly ordered but can be fenced by sfence.