Appendix C Instruction Latencies 309
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Table 15. x87 Floating-Point Instructions (Continued)

                     ------------ Encoding -------------
Syntax               First byte  Second byte  ModRM byte  Decode type  FPU pipe(s)       Latency   Note
FDIVRP ST(i), ST     DEh                      11-110-xxx  DirectPath   FMUL              16/20/24  1, 6
FFREE ST(i)          DDh                      11-000-xxx  DirectPath   FADD/FMUL/FSTORE  2         1, 2
FIADD [mem32int]     DAh                      mm-000-xxx  Double       -                 11
FIADD [mem16int]     DEh                      mm-000-xxx  Double       -                 11
FICOM [mem32int]     DAh                      mm-010-xxx  Double       -                 9
FICOM [mem16int]     DEh                      mm-010-xxx  Double       -                 9
FICOMP [mem32int]    DAh                      mm-011-xxx  Double       -                 9
FICOMP [mem16int]    DEh                      mm-011-xxx  Double       -                 9
FIDIV [mem32int]     DAh                      mm-110-xxx  Double       -                 18
FIDIV [mem16int]     DEh                      mm-110-xxx  Double       -                 18
FIDIVR [mem32int]    DAh                      mm-111-xxx  Double       -                 18
FIDIVR [mem16int]    DEh                      mm-111-xxx  Double       -                 18
FILD [mem16int]      DFh                      mm-000-xxx  DirectPath   FSTORE            6
FILD [mem32int]      DBh                      mm-000-xxx  DirectPath   FSTORE            6
FILD [mem64int]      DFh                      mm-101-xxx  DirectPath   FSTORE            6
FIMUL [mem32int]     DAh                      mm-001-xxx  Double       -                 11
FIMUL [mem16int]     DEh                      mm-001-xxx  Double       -                 11
FINCSTP              D9h                      11-110-111  DirectPath   FADD/FMUL/FSTORE  2         2
FINIT                DBh                      11-100-011  VectorPath   -                 ~
FIST [mem16int]      DFh                      mm-010-xxx  DirectPath   FSTORE            4
FIST [mem32int]      DBh                      mm-010-xxx  DirectPath   FSTORE            4
FISTP [mem16int]     DFh                      mm-011-xxx  DirectPath   FSTORE            4
FISTP [mem32int]     DBh                      mm-011-xxx  DirectPath   FSTORE            4
Notes:
1. The last three bits of the ModRM byte select the stack entry ST(i).
2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP
with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of
three per cycle and can use any of the three execution resources.
3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).
4. There is additional latency associated with this instruction. "e" represents the difference between the exponents
of the divisor and the dividend. If "s" is the number of normalization shifts performed on the result, then
n = (s+1)/2 where (0 <= n <= 32).
5. The latency provided for this operation is the best-case latency.
6. The three latency numbers represent the latency values for precision control settings of single precision, double
precision, and extended precision, respectively.