@General timing notes
:#1
^Note 1

  Assuming that the operand address and stack address fall in different
  cache sets.
:#2
^Note 2

  Always locked, no cache hit case.
:#3
^Note 3

  Clocks = 10 + max(log2(|m|),n)

         m = multiplier value (min clocks for m=0)
         n = 3/5 for +/-m
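
  The formula above can be sketched in Python. This is only an
  illustration: the function name is hypothetical, n is read as 3 for a
  non-negative and 5 for a negative multiplier, the logarithm is taken
  of the absolute value and rounded down, and the log term is treated as
  0 for m=0. None of those choices is spelled out in the note itself.

```python
def multiply_clocks(m: int) -> int:
    """Estimate clocks as 10 + max(log2(|m|), n) for multiplier value m."""
    # Assumption: n is 3 for a non-negative multiplier, 5 for a negative one.
    n = 5 if m < 0 else 3
    if m == 0:
        # Assumption: log2 is undefined at 0, so only n contributes.
        log_term = 0
    else:
        # Assumption: floor(log2(|m|)); the note does not state the rounding.
        log_term = abs(m).bit_length() - 1
    return 10 + max(log_term, n)


if __name__ == "__main__":
    for m in (0, 1000, -4):
        print(m, multiply_clocks(m))
```

  Under these assumptions m=0 gives the minimum of 13 clocks.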
:#4
^Note 4

  Clocks = {quotient(count/operand length)}*7+9
         = 8 if count <= operand length (8/16/32)
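
  A minimal Python sketch of this calculation follows. The function name
  is hypothetical, the quotient is taken as integer division, and the
  short case is assumed to apply whenever the count does not exceed the
  operand length in bits.

```python
def note4_clocks(count: int, operand_bits: int) -> int:
    """Clocks = quotient(count / operand length) * 7 + 9, or 8 for small counts."""
    if operand_bits not in (8, 16, 32):
        raise ValueError("operand length must be 8, 16 or 32 bits")
    # Assumption: the short case covers count <= operand length.
    if count <= operand_bits:
        return 8
    # Quotient is read as integer (floor) division.
    return (count // operand_bits) * 7 + 9


if __name__ == "__main__":
    print(note4_clocks(5, 8))    # short case
    print(note4_clocks(20, 8))   # quotient = 2
```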
:#5
^Note 5

  Clocks = {quotient(count/operand length)}*7+9
         = 9 if count <= operand length (8/16/32)
:#6
^Note 6

  Equal/not equal cases (penalty is the same regardless of lock).
:#7
^Note 7

  Assuming that addresses for memory read (for indirection), stack push/pop,
  and branch fall in different cache sets.
:#8
^Note 8

  Penalty for cache miss: add 6 clocks for every 16 bytes copied to new stack
  frame.
:#9
^Note 9

  Add 11 clocks for each unaccessed descriptor load.
:#10
^Note 10

  Refer to task switch clock counts table for value of TS.
:#11
^Note 11

  Add 4 extra clocks to the cache miss penalty for each 16 bytes.
:#12
^Note 12

  Clocks = 8+4(b+1)+3(i+1)+3(n+1)
         = 6 if second operand = 0

     (b = 0-3, non-zero byte number);
     (i = 0-1, non-zero nibble number);
     (n = 0-3, non-zero bit number in nibble);
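
  As an illustration, the terms b, i and n can be derived from a 32-bit
  operand in Python. The function name is hypothetical, and the sketch
  assumes the scan starts at the least significant bit, so that b, i and
  n locate the lowest set bit by byte, nibble within the byte, and bit
  within the nibble.

```python
def forward_scan_clocks(value: int) -> int:
    """Clocks = 8 + 4(b+1) + 3(i+1) + 3(n+1), or 6 for a zero operand."""
    value &= 0xFFFFFFFF
    if value == 0:
        return 6
    # Assumption: position is counted from the least significant bit.
    pos = (value & -value).bit_length() - 1  # index of the lowest set bit
    b = pos // 8          # byte number, 0-3
    i = (pos % 8) // 4    # nibble number within that byte, 0-1
    n = pos % 4           # bit number within that nibble, 0-3
    return 8 + 4 * (b + 1) + 3 * (i + 1) + 3 * (n + 1)


if __name__ == "__main__":
    for v in (0, 0x1, 0x100, 0x80000000):
        print(hex(v), forward_scan_clocks(v))
```

  Under this reading the result ranges from 18 clocks (bit 0 set) to 42
  clocks (only bit 31 set).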
:#13
^Note 13

  Clocks = 9+4(b+1)+3(i+1)+3(n+1)
         = 7 if second operand = 0

     (b = 0-3, non-zero byte number);
     (i = 0-1, non-zero nibble number);
     (n = 0-3, non-zero bit number in nibble);
:#14
^Note 14

  Clocks = 7+3(32-n)
         = 6 if second operand = 0
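
  The note does not define n. The Python sketch below assumes n is the
  index (0-31) of the most significant set bit of a 32-bit operand; the
  function name is hypothetical as well.

```python
def reverse_scan_clocks(value: int) -> int:
    """Clocks = 7 + 3(32 - n), or 6 for a zero operand."""
    value &= 0xFFFFFFFF
    if value == 0:
        return 6
    # Assumption: n is the index of the highest set bit (0-31).
    n = value.bit_length() - 1
    return 7 + 3 * (32 - n)


if __name__ == "__main__":
    for v in (0, 0x1, 0x80000000):
        print(hex(v), reverse_scan_clocks(v))
```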
:#15
^Note 15

  Clocks = 8+3(32-n)
         = 7 if second operand = 0
:#16
^Note 16

  Assuming that the two string addresses fall in different cache sets.
:#17
^Note 17

  Cache miss penalty: add 6 clocks for every 16 bytes compared. Entire
  penalty on first compare.
:#18
^Note 18

  Cache miss penalty: add 2 clocks for every 16 bytes of data. Entire penalty
  on first load.
:#19
^Note 19

  Cache miss penalty: add 4 clocks for every 16 bytes moved.
  (1 clock for the first operation and 3 for the second)
:#20
^Note 20

  Cache miss penalty: add 4 clocks for every 16 bytes scanned.
  (2 clocks each for first and second operations)
:#21
^Note 21

  Refer to interrupt clock counts table for value of INT.
:#22
^Note 22

  Clock count includes one clock for using both displacement and immediate.
:#23
^Note 23

  Refer to assumption 6 in the case of cache miss.
:#16/32
^Note 16/32

  16/32 bit modes
:#U/L
^Note U/L

  unlocked/locked
:#MN/MX
^Note MN/MX

  minimum/maximum
:#L/NL
^Note L/NL

  loop/no loop
:#RV/P
^Note RV/P

  real and virtual mode/protected mode
:#R
^Note R

  real mode
:#P
^Note P

  protected mode
:#T/NT
^Note T/NT

  taken/not taken
:#H/NH
^Note H/NH

  hit/no hit
:#T/F
^Note T/F

  true/false
:#24
^Note 24

  Two clock cache miss penalty in all cases.

:#25
^Note 25

  c = count in CX or ECX

:#26
^Note 26

  Cache miss penalty in all modes: Add 2 clocks for every 16 bytes. Entire
  penalty on second operation.
:Clk cnt assumptions
^Instruction clock count assumptions

  The Intel486 microprocessor instruction clock count tables give clock
  counts assuming data and instruction accesses hit in the cache. A
  separate penalty column defines clocks to add if a data access misses
  in the cache. The combined instruction and data cache hit rate is over
  90%.

  A cache miss will force the Intel486 microprocessor to run an external
  bus cycle. The Intel486 microprocessor 32-bit burst bus is defined as
  r-b-w.

  Where:
  r = The number of clocks in the first cycle of a burst read or the
      number of clocks per data cycle in a non-burst read.
  b = The number of clocks for the second and subsequent cycles in
      a burst read.
  w = The number of clocks for a write.

  The fastest bus the Intel486 microprocessor can support is 2-1-2,
  assuming 0 wait states. The clock counts in the cache miss penalty
  column assume a 2-1-2 bus. For slower buses add r-2 clocks to the cache
  miss penalty for the first dword accessed. Other factors also affect
  instruction clock counts.
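
  The r-2 adjustment can be sketched in Python as follows. The function
  names and the "r-b-w" string format used for input are illustrative
  assumptions; only the r-2 rule itself comes from the text above.

```python
def parse_bus(spec: str) -> tuple[int, int, int]:
    """Split a bus description such as '3-2-2' into (r, b, w) clock counts."""
    r, b, w = (int(part) for part in spec.split("-"))
    return r, b, w


def adjusted_miss_penalty(table_penalty: int, bus: str) -> int:
    """Add r-2 clocks to a tabulated penalty that assumes a 2-1-2 bus."""
    r, _b, _w = parse_bus(bus)
    if r < 2:
        raise ValueError("2-1-2 is the fastest supported bus")
    # Applies to the first dword accessed, per the rule above.
    return table_penalty + (r - 2)


if __name__ == "__main__":
    print(adjusted_miss_penalty(2, "2-1-2"))  # unchanged on the fastest bus
    print(adjusted_miss_penalty(2, "4-2-3"))  # two extra clocks
```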

% Instruction Clock Count Assumptions
  1. The external bus is available for reads or writes at all times.
     Otherwise add clocks to reads until the bus is available.
  2. Accesses are aligned. Add three clocks to each misaligned access.
  3. Cache fills complete before subsequent accesses to the same line.
     If a read misses the cache during a cache fill due to a previous
     read or prefetch, the read must wait for the cache fill to
     complete. If a read or write accesses a cache line still being
     filled, it must wait for the fill to complete.
  4. If an effective address is calculated, the base register is not the
     destination register for the preceding instruction. If the base
     register is the destination register of the preceding instruction
     add 1 to clock counts shown. Back-to-back PUSH and POP instructions
     are not affected by this rule.
  5. An effective address calculation uses one base register and does
     not use an index register. However, if the effective address
     calculation uses an index register, 1 clock *may* be added to clock
     count shown.
  6. The target of a jump is in the cache. If not, add r clocks for
     accessing the destination instruction of a jump. If the destination
     instruction is not completely contained in the first dword read, add
     a maximum of 3b clocks. If the destination instruction is not
     completely contained in the first 16 byte burst, add a maximum of
     another r+3b clocks.
  7. No write buffer delay. w clocks are added only in the case in
     which all write buffers are full. This case rarely occurs.
  8. Displacement and immediate not used together. If displacement and
     immediate used together, 1 clock *may* be added to the clock count
     shown.
  9. No invalidate cycles. Add a delay of 1 clock for each invalidate
     cycle if the invalidate cycle contends for internal cache/external
     bus when the Intel486 CPU needs to use it.
 10. Page translation hits in the TLB. A TLB miss will add 13, 21 or 28
     clocks to the instruction depending on whether the Accessed and/or
     Dirty bit in neither, one or both of the page entries needs to be
     set in memory. This assumes that neither page entry is in the data
     cache and a page fault does not occur on the address translation.
 11. No exceptions are detected during instruction execution. Refer to
     Interrupt Clock Counts Table for extra clocks if an interrupt is
     detected.
 12. Instructions that read multiple consecutive data items (i.e. task
     switch, POPA, etc.) and miss the cache are assumed to start the
     first access on a 16-byte boundary. If not, an extra cache line
     fill may be necessary which may add up to (r+3b) clocks to the
     cache miss penalty.
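
  Two of the assumptions above (6 and 10) reduce to simple arithmetic,
  sketched here in Python. All function and parameter names are
  hypothetical, and the results are upper bounds as stated in the list,
  not measured values.

```python
# Assumption 10: extra clocks for a translation miss, keyed by how many of
# the two page entries need their Accessed and/or Dirty bit set in memory.
TRANSLATION_MISS_CLOCKS = {0: 13, 1: 21, 2: 28}


def translation_miss_clocks(entries_to_update: int) -> int:
    """Return the clocks added when the page translation misses."""
    return TRANSLATION_MISS_CLOCKS[entries_to_update]


def jump_miss_max_clocks(r: int, b: int,
                         beyond_first_dword: bool = False,
                         beyond_first_burst: bool = False) -> int:
    """Assumption 6: worst-case clocks added when a jump target misses."""
    clocks = r  # access to the destination instruction
    if beyond_first_dword or beyond_first_burst:
        clocks += 3 * b          # rest of the first 16-byte burst
    if beyond_first_burst:
        clocks += r + 3 * b      # a second burst
    return clocks


if __name__ == "__main__":
    print(translation_miss_clocks(1))
    print(jump_miss_max_clocks(2, 1, beyond_first_dword=True))
```

  On a 2-1-2 bus the jump penalty is therefore at most 2, 5 or 10 clocks
  for the three cases.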
