MOVNTDQA

Load Double Quadword Non-Temporal Aligned Hint

Opcodes

Hex Mnemonic Encoding Long Mode Legacy Mode Description
66 0F 38 2A /r MOVNTDQA xmm1, m128 A Valid Valid Move double quadword from m128 to xmm using non-temporal hint if WC memory type.

Instruction Operand Encoding

Op/En Operand 0 Operand 1 Operand 2 Operand 3
A NA NA ModRM:r/m (r) ModRM:reg (w)

Description

MOVNTDQA loads a double quadword from the source operand (second operand) to the destination operand (first operand) using a non-temporal hint. A processor implementation may make use of the non-temporal hint associated with this instruction if the memory source is WC (write combining) memory type. An implementation may also make use of the non-temporal hint associated with this instruction if the memory source is WB (write back) memory type.

A processor's implementation of the non-temporal hint does not override the effective memory type semantics, but the implementation of the hint is processor dependent. For example, a processor implementation may choose to ignore the hint and process the instruction as a normal MOVDQA for any memory type. Another implementation of the hint for WC memory type may optimize data transfer throughput of WC reads. A third implementation may optimize cache reads generated by MOVNTDQA on WB memory type to reduce cache evictions.

WC Streaming Load Hint

For WC memory type in particular, the processor never appears to read the data into the cache hierarchy. Instead, the non-temporal hint may be implemented by loading a temporary internal buffer with the equivalent of an aligned cache line without filling this data to the cache. Any memory-type aliased lines in the cache will be snooped and flushed. Subsequent MOVNTDQA reads to unread portions of the WC cache line will receive data from the temporary internal buffer if data is available. The temporary internal buffer may be flushed by the processor at any time for any reason, for example:

  • A load operation other than a MOVNTDQA which references memory already resident in a temporary internal buffer.
  • A non-WC reference to memory already resident in a temporary internal buffer.
  • Interleaving of reads and writes to memory currently residing in a single temporary internal buffer.
  • Repeated MOVNTDQA loads of a particular 16-byte item in a streaming line.
  • Certain micro-architectural conditions including resource shortages, detection of a mis-speculation condition, and various fault conditions

The memory type of the region being read can override the non-temporal hint, if the memory address specified for the non-temporal read is not a WC memory region. Information on non-temporal reads and writes can be found in Chapter 11, "MemoryCache Control" ofIntelĀ® 64 and IA-32 Architectures Software Developer's Manual,Volume 3A.

Because the WC protocol uses a weakly-ordered memory consistency model, an MFENCE or locked instruction should be used in conjunction with MOVNTDQA instructions if multiple processors might reference the same WC memory locations or in order to synchronize reads of a processor with writes by other agents in the system. Because of the speculative nature of fetching due to MOVNTDQA, Streaming loads must not be used to reference memory addresses that are mapped to I/O devices having side effects or when reads to these devices are destructive. For additional information on MOVNTDQA usages, see Section 12.10.3 in Chapter 12, "Programming with SSE3, SSSE3 and SSE4" ofIntelĀ®64 and IA-32 Architectures SoftwareDeveloper's Manual, Volume 1.

Pseudo Code

DST = SRC;

Flags Affected

None

Exceptions

64-Bit Mode Exceptions

Exception Description
#UD If EM in CR0 is set. If OSFXSR in CR4 is 0. If CPUID feature flag ECX.SSE4_1 is 0. If LOCK prefix is used. Either the prefix REP (F3h) or REPN (F2H) is used.
#NM If TS in CR0 is set.
#PF(fault-code) For a page fault.
#SS(0) If a memory address referencing the SS segment is in a non- canonical form.
#GP(0) If the memory address is in a non-canonical form. If a memory operand is not aligned on a 16-byte boundary, regardless of segment.

Compatibility Mode Exceptions

Same exceptions as in Protected Mode.

Virtual-8086 Mode Exceptions

Exception Description
#PF(fault-code) For a page fault.
Same exceptions as in Real Address Mode.

Real-Address Mode Exceptions

Exception Description
#UD If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0. If CPUID.01H:ECX.SSE4_1[bit 19] = 0. If LOCK prefix is used. Either the prefix REP (F3h) or REPN (F2H) is used.
#NM If CR0.TS[bit 3] = 1.
#GP(0) if any part of the operand lies outside of the effective address space from 0 to 0FFFFH. If a memory operand is not aligned on a 16-byte boundary, regardless of segment.

Protected Mode Exceptions

Exception Description
#UD If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0. If CPUID.01H:ECX.SSE4_1[bit 19] = 0. If LOCK prefix is used. Either the prefix REP (F3h) or REPN (F2H) is used.
#NM If CR0.TS[bit 3] = 1.
#PF(fault-code) For a page fault.
#SS(0) For an illegal address in the SS segment.
#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments. If a memory operand is not aligned on a 16-byte boundary, regardless of segment.