MASKMOVDQU

Store Selected Bytes of Double Quadword

Opcodes

Hex Mnemonic Encoding Long Mode Legacy Mode Description
66 0F F7 /r MASKMOVDQU xmm1, xmm2 A Valid Valid Selectively write bytes from xmm1 to memory location using the byte mask in xmm2. The default memory location is specified by DS:EDI/RDI.

Instruction Operand Encoding

Op/En Operand 0 Operand 1 Operand 2 Operand 3
A NA NA ModRM:r/m (r) ModRM:reg (r)

Description

Stores selected bytes from the source operand (first operand) into an 128-bit memory location. The mask operand (second operand) selects which bytes from the source operand are written to memory. The source and mask operands are XMM registers. The location of the first byte of the memory location is specified by DI/EDI and DS registers. The memory location does not need to be aligned on a natural boundary. (The size of the store address depends on the address-size attribute.)

The most significant bit in each byte of the mask operand determines whether the corresponding byte in the source operand is written to the corresponding byte location in memory: 0 indicates no write and 1 indicates write.

The MASKMOVDQU instruction generates a non-temporal hint to the processor to minimize cache pollution. The non-temporal hint is implemented by using a write combining (WC) memory type protocol (see "Caching of Temporal vs. Non-Temporal Data" in Chapter 10, of theIntel®64 and IA-32 Architectures Software Developer'sManual, Volume 1). Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with the SFENCE or MFENCE instruction should be used in conjunction with MASKMOVDQU instructions if multiple processors might use different memory types to read/write the destination memory locations.

Behavior with a mask of all 0s is as follows:

The MASKMOVDQU instruction can be used to improve performance of algorithms that need to merge data on a byte-by-byte basis. MASKMOVDQU should not cause a read for ownership; doing so generates unnecessary bandwidth since data is to be written directly using the byte-mask without allocating old data prior to the store.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).

Pseudo Code

IF (MASK[7] = 1)
	DEST[DI/EDI] = SRC[7:0]
ELSE
	(* Memory location unchanged *);
FI;
IF (MASK[15] = 1)
	DEST[DI/EDI +1] = SRC[15:8]
ELSE
	(* Memory location unchanged *);
FI; (* Repeat operation for 3rd through 14th bytes in source operand *)
IF (MASK[127] = 1)
	DEST[DI/EDI +15] = SRC[127:120]
ELSE
	(* Memory location unchanged *);
FI;

Exceptions

64-Bit Mode Exceptions

Exception Description
#UD If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0. If CPUID.01H:EDX.SSE2[bit 26] = 0. If the LOCK prefix is used.
#NM If CR0.TS[bit 3] = 1.
#PF(fault-code) For a page fault (implementation specific).
#SS(0) If a memory address referencing the SS segment is in a non- canonical form.
#GP(0) If the memory address is in a non-canonical form.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

Virtual-8086 Mode Exceptions

Exception Description
#UD If the LOCK prefix is used.
#PF(fault-code) For a page fault (implementation specific).
Same exceptions as in real address mode.

Real-Address Mode Exceptions

Exception Description
#UD If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0. If CPUID.01H:EDX.SSE2[bit 26] = 0. If the LOCK prefix is used.
#NM If CR0.TS[bit 3] = 1.
#GP If any part of the operand lies outside the effective address space from 0 to FFFFH. (even if mask is all 0s).

Protected Mode Exceptions

Exception Description
#UD If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0. If CPUID.01H:EDX.SSE2[bit 26] = 0. If the LOCK prefix is used.
#NM If CR0.TS[bit 3] = 1.
#PF(fault-code) For a page fault (implementation specific).
#SS(0) For an illegal address in the SS segment (even if mask is all 0s).
#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. (even if mask is all 0s). If the destination operand is in a nonwritable segment. If the DS, ES, FS, or GS register contains a NULL segment selector.