MPSADBW

Compute Multiple Packed Sums of Absolute Difference

Opcodes

Hex Mnemonic Encoding Long Mode Legacy Mode Description
66 0F 3A 42 /r ib MPSADBW xmm1, xmm2/m128, imm8 A Valid Valid Sums absolute 8-bit integer difference of adjacent groups of 4 byte integers in xmm1 and xmm2/m128 and writes the results in xmm1. Starting offsets within xmm1 and xmm2/m128 are determined by imm8.

Instruction Operand Encoding

Op/En Operand 1 Operand 2 Operand 3 Operand 4
A ModRM:reg (r, w) ModRM:r/m (r) imm8 NA

Description

MPSADBW computes the sum of absolute differences (SAD) of unsigned bytes over groups of 4 byte pairs, and produces 8 SAD results (one for each group of 4 byte pairs) stored as 8 word integers in the destination operand (first operand). Each group of 4 byte pairs is selected from the source operand (second operand) and the destination according to the bit fields specified in the immediate byte (third operand).

The immediate byte provides two bit fields:

SRC_OFFSET: the value of Imm8[1:0]*32 specifies the bit offset of the 4 sequential source bytes in the source operand.

DEST_OFFSET: the value of Imm8[2]*32 specifies the bit offset of the first of 8 groups of 4 sequential destination bytes in the destination operand. The next group of 4 destination bytes starts at DEST_OFFSET + 8 (one byte higher), and so on.
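The two bit fields can be extracted with a mask and a shift. The following is a minimal sketch in C; the struct and function names (mpsadbw_offsets, mpsadbw_decode_imm8) are hypothetical and chosen only for illustration. Bits 7:3 of imm8 do not appear in either formula above, so the sketch ignores them.

```c
#include <stdint.h>

/* Bit offsets of the two operand windows selected by imm8.
 * Names are illustrative only; they are not part of any real API. */
typedef struct {
    unsigned src_bit_offset;   /* SRC_OFFSET:  0, 32, 64 or 96 */
    unsigned dest_bit_offset;  /* DEST_OFFSET: 0 or 32         */
} mpsadbw_offsets;

static mpsadbw_offsets mpsadbw_decode_imm8(uint8_t imm8)
{
    mpsadbw_offsets o;
    o.src_bit_offset  = (imm8 & 0x3u) * 32u;         /* imm8[1:0] * 32 */
    o.dest_bit_offset = ((imm8 >> 2) & 0x1u) * 32u;  /* imm8[2]   * 32 */
    return o;
}
```

For example, imm8 = 5 (binary 101) gives SRC_OFFSET = 32 and DEST_OFFSET = 32, so the source window starts at byte 4 of the source operand and the first destination group starts at byte 4 of the destination.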

The SAD operation is repeated 8 times, each time using the same 4 source bytes but selecting the next group of 4 destination bytes, which starts one byte higher in the destination than the previous group. The 8 resulting 16-bit sums are written to consecutive words of the destination, lowest word first.

Pseudo Code

SRC_OFFSET = imm8[1:0]*32
DEST_OFFSET = imm8[2]*32
DEST_BYTE0 = DEST[DEST_OFFSET+7:DEST_OFFSET]
DEST_BYTE1 = DEST[DEST_OFFSET+15:DEST_OFFSET+8]
DEST_BYTE2 = DEST[DEST_OFFSET+23:DEST_OFFSET+16]
DEST_BYTE3 = DEST[DEST_OFFSET+31:DEST_OFFSET+24]
DEST_BYTE4 = DEST[DEST_OFFSET+39:DEST_OFFSET+32]
DEST_BYTE5 = DEST[DEST_OFFSET+47:DEST_OFFSET+40]
DEST_BYTE6 = DEST[DEST_OFFSET+55:DEST_OFFSET+48]
DEST_BYTE7 = DEST[DEST_OFFSET+63:DEST_OFFSET+56]
DEST_BYTE8 = DEST[DEST_OFFSET+71:DEST_OFFSET+64]
DEST_BYTE9 = DEST[DEST_OFFSET+79:DEST_OFFSET+72]
DEST_BYTE10 = DEST[DEST_OFFSET+87:DEST_OFFSET+80]
SRC_BYTE0 = SRC[SRC_OFFSET+7:SRC_OFFSET]
SRC_BYTE1 = SRC[SRC_OFFSET+15:SRC_OFFSET+8]
SRC_BYTE2 = SRC[SRC_OFFSET+23:SRC_OFFSET+16]
SRC_BYTE3 = SRC[SRC_OFFSET+31:SRC_OFFSET+24]
TEMP0 = ABS(DEST_BYTE0 - SRC_BYTE0)
TEMP1 = ABS(DEST_BYTE1 - SRC_BYTE1)
TEMP2 = ABS(DEST_BYTE2 - SRC_BYTE2)
TEMP3 = ABS(DEST_BYTE3 - SRC_BYTE3)
DEST[15:0] = TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 = ABS(DEST_BYTE1 - SRC_BYTE0)
TEMP1 = ABS(DEST_BYTE2 - SRC_BYTE1)
TEMP2 = ABS(DEST_BYTE3 - SRC_BYTE2)
TEMP3 = ABS(DEST_BYTE4 - SRC_BYTE3)
DEST[31:16] = TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 = ABS(DEST_BYTE2 - SRC_BYTE0)
TEMP1 = ABS(DEST_BYTE3 - SRC_BYTE1)
TEMP2 = ABS(DEST_BYTE4 - SRC_BYTE2)
TEMP3 = ABS(DEST_BYTE5 - SRC_BYTE3)
DEST[47:32] = TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 = ABS(DEST_BYTE3 - SRC_BYTE0)
TEMP1 = ABS(DEST_BYTE4 - SRC_BYTE1)
TEMP2 = ABS(DEST_BYTE5 - SRC_BYTE2)
TEMP3 = ABS(DEST_BYTE6 - SRC_BYTE3)
DEST[63:48] = TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 = ABS(DEST_BYTE4 - SRC_BYTE0)
TEMP1 = ABS(DEST_BYTE5 - SRC_BYTE1)
TEMP2 = ABS(DEST_BYTE6 - SRC_BYTE2)
TEMP3 = ABS(DEST_BYTE7 - SRC_BYTE3)
DEST[79:64] = TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 = ABS(DEST_BYTE5 - SRC_BYTE0)
TEMP1 = ABS(DEST_BYTE6 - SRC_BYTE1)
TEMP2 = ABS(DEST_BYTE7 - SRC_BYTE2)
TEMP3 = ABS(DEST_BYTE8 - SRC_BYTE3)
DEST[95:80] = TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 = ABS(DEST_BYTE6 - SRC_BYTE0)
TEMP1 = ABS(DEST_BYTE7 - SRC_BYTE1)
TEMP2 = ABS(DEST_BYTE8 - SRC_BYTE2)
TEMP3 = ABS(DEST_BYTE9 - SRC_BYTE3)
DEST[111:96] = TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 = ABS(DEST_BYTE7 - SRC_BYTE0)
TEMP1 = ABS(DEST_BYTE8 - SRC_BYTE1)
TEMP2 = ABS(DEST_BYTE9 - SRC_BYTE2)
TEMP3 = ABS(DEST_BYTE10 - SRC_BYTE3)
DEST[127:112] = TEMP0 + TEMP1 + TEMP2 + TEMP3
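The unrolled pseudo code above can also be read as two nested loops. The following scalar C model is a sketch for illustration and testing, not the hardware implementation; the function name mpsadbw_model and the byte-array representation of the operands (least significant byte first) are assumptions made here. As in the pseudo code, all destination bytes are read before any result word is written.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of MPSADBW (illustrative name, not a real API).
 * dest and src each hold the 16 bytes of an operand, least significant
 * byte first. On return, dest holds 8 little-endian 16-bit sums,
 * corresponding to DEST[15:0] through DEST[127:112]. */
static void mpsadbw_model(uint8_t dest[16], const uint8_t src[16], uint8_t imm8)
{
    unsigned src_byte  = (imm8 & 0x3u) * 4u;         /* SRC_OFFSET / 8  */
    unsigned dest_byte = ((imm8 >> 2) & 0x1u) * 4u;  /* DEST_OFFSET / 8 */
    uint16_t sums[8];

    /* Compute all 8 sums first, because dest is both input and output. */
    for (unsigned i = 0; i < 8; i++) {
        unsigned sum = 0;
        for (unsigned k = 0; k < 4; k++) {
            int diff = (int)dest[dest_byte + i + k] - (int)src[src_byte + k];
            sum += (unsigned)abs(diff);
        }
        sums[i] = (uint16_t)sum;  /* at most 4 * 255 = 1020, fits in a word */
    }

    /* Store the sums as consecutive 16-bit words, lowest word first. */
    for (unsigned i = 0; i < 8; i++) {
        dest[2 * i]     = (uint8_t)(sums[i] & 0xFFu);
        dest[2 * i + 1] = (uint8_t)(sums[i] >> 8);
    }
}
```

With dest bytes 0, 1, ..., 15, an all-zero src, and imm8 = 0, result word i is i + (i+1) + (i+2) + (i+3) = 4i + 6, so the first word is 6 and the last is 34.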

Flags Affected

None

Exceptions

64-Bit Mode Exceptions

Exception Description
#UD If EM in CR0 is set. If OSFXSR in CR4 is 0. If CPUID feature flag ECX.SSE4_1 is 0. If LOCK prefix is used. Either the prefix REP (F3h) or REPN (F2H) is used.
#NM If TS in CR0 is set.
#PF(fault-code) For a page fault.
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#GP(0) If the memory address is in a non-canonical form. If a memory operand is not aligned on a 16-byte boundary, regardless of segment.

Compatibility Mode Exceptions

Same exceptions as in Protected Mode.

Virtual-8086 Mode Exceptions

Exception Description
#PF(fault-code) For a page fault.
Same exceptions as in Real-Address Mode.

Real-Address Mode Exceptions

Exception Description
#UD If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0. If CPUID.01H:ECX.SSE4_1[bit 19] = 0. If LOCK prefix is used. Either the prefix REP (F3h) or REPN (F2H) is used.
#NM If CR0.TS[bit 3] = 1.
#GP(0) If any part of the operand lies outside of the effective address space from 0 to 0FFFFH. If a memory operand is not aligned on a 16-byte boundary, regardless of segment.

Protected Mode Exceptions

Exception Description
#UD If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0. If CPUID.01H:ECX.SSE4_1[bit 19] = 0. If LOCK prefix is used. Either the prefix REP (F3h) or REPN (F2H) is used.
#NM If CR0.TS[bit 3] = 1.
#PF(fault-code) For a page fault.
#SS(0) For an illegal address in the SS segment.
#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments. If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
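Two of the conditions listed in the tables above can be tested in software before using the memory form of the instruction: the 16-byte alignment of the memory operand (#GP(0)) and the SSE4_1 feature flag, CPUID.01H:ECX[bit 19] (#UD). The sketch below is in C with hypothetical helper names. It assumes the caller has already obtained the ECX value of CPUID leaf 01H by some platform-specific means, which is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if p is aligned on a 16-byte boundary (low 4 address bits are 0).
 * Helper name is illustrative only. */
static bool is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 0xFu) == 0;
}

/* True if the SSE4_1 feature flag (bit 19) is set in the given ECX value.
 * Assumes cpuid_01h_ecx was read from CPUID leaf 01H by the caller. */
static bool ecx_reports_sse4_1(uint32_t cpuid_01h_ecx)
{
    return ((cpuid_01h_ecx >> 19) & 1u) != 0;
}
```

These checks cover only two of the listed conditions. The others, such as CR0.EM, CR0.TS and CR4.OSFXSR, depend on control register state that the two helpers do not examine.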