sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 96 3 -redir:sim tempOutput3 matrix 

sim: simulation started @ Thu Dec 15 12:07:15 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         96 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
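
  The range syntax above can be sketched as a small parser.  This is an
  illustrative Python sketch only, not the simulator's own code; the names
  RangeEnd, parse_end and parse_ptrace_range are hypothetical, and values
  are kept as text because program symbols such as `main' are allowed.

```python
from typing import NamedTuple, Optional, Tuple


class RangeEnd(NamedTuple):
    kind: str   # "addr" (@), "cycle" (#), "rel" (+), or "inst" (no prefix)
    value: str  # left as text: may be a number or a program symbol


# Prefix characters taken from the help text; the kind names are assumptions.
_PREFIX = {"@": "addr", "#": "cycle", "+": "rel"}


def parse_end(text: str, allow_relative: bool) -> Optional[RangeEnd]:
    """Parse one side of a range; an empty side means 'unbounded'."""
    if text == "":
        return None
    kind = "inst"
    if text[0] in _PREFIX:
        kind = _PREFIX[text[0]]
        if kind == "rel" and not allow_relative:
            raise ValueError("'+' is only valid on the end of a range")
        text = text[1:]
    if text == "":
        raise ValueError("missing value after prefix")
    return RangeEnd(kind, text)


def parse_ptrace_range(spec: str) -> Tuple[Optional[RangeEnd], Optional[RangeEnd]]:
    """Split '<start>:<end>' and parse both sides."""
    start_text, sep, end_text = spec.partition(":")
    if sep != ":":
        raise ValueError("range must contain ':'")
    return parse_end(start_text, False), parse_end(end_text, True)


if __name__ == "__main__":
    for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(spec, "->", parse_ptrace_range(spec))
```

  Each of the five example ranges listed above is accepted by this sketch;
  resolving symbols and applying the `+' offset are left out on purpose.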

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
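
  As a rough illustration of the <config> format, the Python sketch below
  splits a config string into its five fields and derives the total
  capacity as nsets * bsize * assoc.  The names CacheConfig and
  parse_cache_config are hypothetical and are not part of the simulator.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the help text.
_REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB config)
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total data capacity: sets x block size x ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ':'-separated fields, got {len(fields)}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in _REPL:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), _REPL[repl])


if __name__ == "__main__":
    # Configurations taken from the command line of this run.
    for text in ("dl1:256:64:2:l", "il1:256:64:2:l", "ul2:256:64:4:l"):
        cfg = parse_cache_config(text)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, "
              f"{cfg.assoc}-way, {cfg.repl}")
```

  Under this reading, the dl1 and il1 caches used in this run hold 32 KB
  each and the unified ul2 holds 64 KB.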

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13149292 # total number of instructions executed
sim_total_refs              4034192 # total number of loads and stores executed
sim_total_loads             3020633 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8522938 # total simulation time in cycles
sim_IPC                      1.5379 # instructions per cycle
sim_CPI                      0.6503 # cycles per instruction
sim_exec_BW                  1.5428 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  31933344 # cumulative IFQ occupancy
IFQ_fcount                  7835659 # cumulative IFQ full count
ifq_occupancy                3.7468 # avg IFQ occupancy (insn's)
ifq_rate                     1.5428 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4285 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9194 # fraction of time (cycle's) IFQ was full
RUU_count                 132076753 # cumulative RUU occupancy
RUU_fcount                  7154445 # cumulative RUU full count
ruu_occupancy               15.4966 # avg RUU occupancy (insn's)
ruu_rate                     1.5428 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.0444 # avg RUU occupant latency (cycle's)
ruu_full                     0.8394 # fraction of time (cycle's) RUU was full
LSQ_count                  40271183 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7250 # avg LSQ occupancy (insn's)
lsq_rate                     1.5428 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0626 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  189388875 # total number of slip cycles
avg_sim_slip                14.4493 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189454 # total number of accesses
il1.hits                   13189276 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    3667132 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     344575 # total number of hits
ul2.misses                     2277 # total number of misses
ul2.replacements               1253 # total number of replacements
ul2.writebacks                  512 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189454 # total number of accesses
itlb.hits                  13189448 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357314 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
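
The derived rates in the statistics above follow from the raw counters.
The Python sketch below is an illustration only: parse_stats is a
hypothetical helper, and the line format it assumes ("name value #
description") is inferred from this log.  It recomputes IPC, CPI and the
dl1 miss rate from counters copied from the matrix run.

```python
from typing import Dict

# A few counter lines copied verbatim from the matrix run above.
SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   8522938 # total simulation time in cycles
dl1.accesses                4013174 # total number of accesses
dl1.misses                   346042 # total number of misses
mem.page_mem                    96k # total size of memory pages allocated
"""


def parse_stats(text: str) -> Dict[str, float]:
    """Collect 'name value # description' lines whose value is numeric."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        head, sep, _desc = line.partition(" # ")
        fields = head.split()
        if not sep or len(fields) != 2:
            continue  # not a simple name/value statistic line
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # skip non-numeric values such as '96k' or hex addresses
    return stats


if __name__ == "__main__":
    s = parse_stats(SAMPLE)
    ipc = s["sim_num_insn"] / s["sim_cycle"]
    cpi = s["sim_cycle"] / s["sim_num_insn"]
    dl1_miss_rate = s["dl1.misses"] / s["dl1.accesses"]
    print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  dl1 miss rate {dl1_miss_rate:.4f}")
```

The recomputed values agree with the reported sim_IPC, sim_CPI and
dl1.miss_rate lines to four decimal places.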

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 96 3 -redir:sim tempOutput3 sort 

sim: simulation started @ Thu Dec 15 12:07:24 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         96 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12265604 # total number of instructions executed
sim_total_refs              4824018 # total number of loads and stores executed
sim_total_loads             2865540 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196951 # total number of branches executed
sim_cycle                   6836474 # total simulation time in cycles
sim_IPC                      1.6941 # instructions per cycle
sim_CPI                      0.5903 # cycles per instruction
sim_exec_BW                  1.7941 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  20633960 # cumulative IFQ occupancy
IFQ_fcount                  4330341 # cumulative IFQ full count
ifq_occupancy                3.0182 # avg IFQ occupancy (insn's)
ifq_rate                     1.7941 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6823 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6334 # fraction of time (cycle's) IFQ was full
RUU_count                  84784381 # cumulative RUU occupancy
RUU_fcount                  3728254 # cumulative RUU full count
ruu_occupancy               12.4018 # avg RUU occupancy (insn's)
ruu_rate                     1.7941 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.9124 # avg RUU occupant latency (cycle's)
ruu_full                     0.5453 # fraction of time (cycle's) RUU was full
LSQ_count                  35833408 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.2415 # avg LSQ occupancy (insn's)
lsq_rate                     1.7941 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9215 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  134649849 # total number of slip cycles
avg_sim_slip                11.6263 # the average slip between issue and retirement
bpred_bimod.lookups         3257679 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441951 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435454 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820264 # total number of accesses
il1.hits                   12820047 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497834 # total number of accesses
dl1.hits                    4480758 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      15488 # total number of hits
ul2.misses                    12164 # total number of misses
ul2.replacements              11140 # total number of replacements
ul2.writebacks                 7571 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4399 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.4029 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.2738 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820264 # total number of accesses
itlb.hits                  12820257 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918218 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 96 3 -redir:sim tempOutput3 fft 

sim: simulation started @ Thu Dec 15 12:07:32 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         96 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1024685.2308 # simulation speed (in insts/sec)
sim_total_insn             13375475 # total number of instructions executed
sim_total_refs              6748383 # total number of loads and stores executed
sim_total_loads             3824342 # total number of loads executed
sim_total_stores       2924041.0000 # total number of stores executed
sim_total_branches           390626 # total number of branches executed
sim_cycle                  13561543 # total simulation time in cycles
sim_IPC                      0.9823 # instructions per cycle
sim_CPI                      1.0181 # cycles per instruction
sim_exec_BW                  0.9863 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  53197600 # cumulative IFQ occupancy
IFQ_fcount                 13149688 # cumulative IFQ full count
ifq_occupancy                3.9227 # avg IFQ occupancy (insn's)
ifq_rate                     0.9863 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.9772 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9696 # fraction of time (cycle's) IFQ was full
RUU_count                 213718836 # cumulative RUU occupancy
RUU_fcount                 12999915 # cumulative RUU full count
ruu_occupancy               15.7592 # avg RUU occupancy (insn's)
ruu_rate                     0.9863 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 15.9784 # avg RUU occupant latency (cycle's)
ruu_full                     0.9586 # fraction of time (cycle's) RUU was full
LSQ_count                 113853163 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3953 # avg LSQ occupancy (insn's)
lsq_rate                     0.9863 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  8.5121 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  347287294 # total number of slip cycles
avg_sim_slip                26.0708 # the average slip between issue and retirement
bpred_bimod.lookups          390935 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89876 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400271 # total number of accesses
il1.hits                   13399525 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171929 # total number of accesses
dl1.hits                    5866428 # total number of hits
dl1.misses                   305501 # total number of misses
dl1.replacements             304989 # total number of replacements
dl1.writebacks               156829 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0495 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 463076 # total number of accesses
ul2.hits                     366978 # total number of hits
ul2.misses                    96098 # total number of misses
ul2.replacements              95074 # total number of replacements
ul2.writebacks                73071 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2075 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2053 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1578 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400271 # total number of accesses
itlb.hits                  13400252 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767088 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
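
Note: the derived rates in the statistics above can be recomputed from the raw counters, which is a useful sanity check when post-processing these logs. The sketch below is a hypothetical helper, not part of SimpleScalar: the name `parse_stats` and the regular expression are assumptions, and it assumes each statistic line has the shape `<name> <value> # <description>` as seen in this output. The sample lines are copied from the fft run.

```python
import re

# Assumed line shape: "<name> <value> # <description>".
# Option lines (starting with "-" or "#"), hex values, and non-numeric
# values such as "<error: divide by zero>" do not match and are skipped.
STAT_RE = re.compile(r"^([A-Za-z_][\w.]*)\s+(-?\d+(?:\.\d+)?)\s+#")

def parse_stats(text):
    """Return {stat_name: float} for every numeric statistic line in text."""
    stats = {}
    for line in text.splitlines():
        m = STAT_RE.match(line)
        if m:
            stats[m.group(1)] = float(m.group(2))
    return stats

sample = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                  13561543 # total simulation time in cycles
sim_IPC                      0.9823 # instructions per cycle
dl1.accesses                6171929 # total number of accesses
dl1.misses                   305501 # total number of misses
dl1.miss_rate                0.0495 # miss rate (i.e., misses/ref)
"""

stats = parse_stats(sample)
# IPC is committed instructions divided by cycles.
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
# Miss rate is misses divided by accesses.
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print(f"IPC {ipc:.4f}, dl1 miss rate {dl1_miss_rate:.4f}")
```

Both recomputed values should agree with the reported `sim_IPC` and `dl1.miss_rate` to the four decimal places the simulator prints.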

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 96 3 -redir:sim tempOutput3 filter 

sim: simulation started @ Thu Dec 15 12:07:45 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         96 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450030 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938901 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12642330 # total simulation time in cycles
sim_IPC                      1.6911 # instructions per cycle
sim_CPI                      0.5913 # cycles per instruction
sim_exec_BW                  1.6967 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49402973 # cumulative IFQ occupancy
IFQ_fcount                 11731570 # cumulative IFQ full count
ifq_occupancy                3.9077 # avg IFQ occupancy (insn's)
ifq_rate                     1.6967 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3032 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9280 # fraction of time (cycle's) IFQ was full
RUU_count                 200758582 # cumulative RUU occupancy
RUU_fcount                 12511357 # cumulative RUU full count
ruu_occupancy               15.8799 # avg RUU occupancy (insn's)
ruu_rate                     1.6967 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3594 # avg RUU occupant latency (cycle's)
ruu_full                     0.9896 # fraction of time (cycle's) RUU was full
LSQ_count                  64365511 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0913 # avg LSQ occupancy (insn's)
lsq_rate                     1.6967 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0007 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292907142 # total number of slip cycles
avg_sim_slip                13.7005 # the average slip between issue and retirement
bpred_bimod.lookups         1654334 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482598 # total number of accesses
il1.hits                   21482419 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943840 # total number of accesses
dl1.hits                    4939894 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       2606 # total number of hits
ul2.misses                     2571 # total number of misses
ul2.replacements               1547 # total number of replacements
ul2.writebacks                  659 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4966 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2988 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1273 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482598 # total number of accesses
itlb.hits                  21482592 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884054 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 96 3 -redir:sim tempOutput3 alphaBlend 

sim: simulation started @ Thu Dec 15 12:07:59 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         96 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
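
The range grammar above can be sketched as a small parser. This is only an illustration of the format as described in the help text, not the simulator's own parsing code; the function name parse_ptrace_range and the "kind" labels are assumptions made for this example.

```python
# Hypothetical helper: classify each end of a pipetrace range argument.
# Prefix '@' = address, '#' = cycle count, no prefix = instruction count,
# '+' (end only) = value relative to the start.
KINDS = {"@": "address", "#": "cycle"}


def parse_ptrace_range(text):
    """Return (start, end); each is None (unbounded) or a (kind, value) pair."""
    start_text, sep, end_text = text.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")

    def parse_end(part, allow_plus):
        if part == "":
            return None  # this end of the range was omitted
        if part[0] == "+":
            if not allow_plus:
                raise ValueError("'+' is only valid on the end of the range")
            return ("relative", part[1:])
        if part[0] in KINDS:
            return (KINDS[part[0]], part[1:])
        return ("instruction", part)

    return parse_end(start_text, False), parse_end(end_text, True)


# The five examples from the help text.
for arg in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
    print(arg, "->", parse_ptrace_range(arg))
```

Values are kept as strings because the help text allows program symbols (such as main) wherever a number may appear.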

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
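
The sample predictors above can be turned into concrete -bpred:2lev arguments. The sketch below assumes the argument order printed in the options list (<l1size> <l2size> <hist_size> <xor>, i.e. N, M, W, X); note that the sample table itself lists the values in N, W, M, X order. The function name two_level_args is hypothetical.

```python
def two_level_args(kind, w, n=1, m=None):
    """Return (l1size, l2size, hist_size, xor) for a named sample predictor.

    w is the shift register width, n the number of first-level entries,
    m the number of second-level entries (only needed for GAp).
    """
    if kind == "GAg":
        return (1, 2 ** w, w, 0)
    if kind == "gshare":
        return (1, 2 ** w, w, 1)
    if kind == "PAg":
        return (n, 2 ** w, w, 0)
    if kind == "GAp":
        if m is None or m <= 2 ** w:
            raise ValueError("GAp needs M > 2^W")
        return (1, m, w, 0)
    if kind == "PAp":
        # Follows the help text literally: M == 2^(N+W).
        return (n, 2 ** (n + w), w, 0)
    raise ValueError("unknown predictor kind: " + kind)


# Format the tuple as it would appear on the command line.
print("-bpred:2lev", *two_level_args("gshare", 8))
print("-bpred:2lev", *two_level_args("GAp", 8, m=1024))
```

Under that assumption, the GAp call with W = 8 and M = 1024 reproduces the "1 1024 8 0" setting shown for -bpred:2lev in this run.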

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r
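
As a rough illustration of the <config> format, the sketch below splits a config string into its five fields and derives the total capacity as nsets * bsize * assoc. The names CacheConfig, parse_cache_config and capacity_bytes are assumptions for this example, not part of SimpleScalar.

```python
from collections import namedtuple

# Field names mirror the <name>:<nsets>:<bsize>:<assoc>:<repl> format.
CacheConfig = namedtuple("CacheConfig", "name nsets bsize assoc repl")

# Replacement policy codes listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError("unknown replacement policy: " + repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Total data capacity: sets * block size * ways."""
    return cfg.nsets * cfg.bsize * cfg.assoc


# Cache configs taken from this run's command line.
for text in ["dl1:256:64:2:l", "il1:256:64:2:l", "ul2:256:64:4:l"]:
    cfg = parse_cache_config(text)
    print(cfg.name, capacity_bytes(cfg) // 1024, "KB", REPL_POLICIES[cfg.repl])
```

For a TLB config the same product gives the address range covered, since <bsize> is then the page size.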

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861435 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  38696566 # total simulation time in cycles
sim_IPC                      0.7200 # instructions per cycle
sim_CPI                      1.3890 # cycles per instruction
sim_exec_BW                  0.7200 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 154691041 # cumulative IFQ occupancy
IFQ_fcount                 38672520 # cumulative IFQ full count
ifq_occupancy                3.9975 # avg IFQ occupancy (insn's)
ifq_rate                     0.7200 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5522 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 618768054 # cumulative RUU occupancy
RUU_fcount                 38671734 # cumulative RUU full count
ruu_occupancy               15.9903 # avg RUU occupancy (insn's)
ruu_rate                     0.7200 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.2088 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 188021506 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8589 # avg LSQ occupancy (insn's)
lsq_rate                     0.7200 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.7485 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  843298100 # total number of slip cycles
avg_sim_slip                30.2695 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6071212 # total number of hits
dl1.misses                  2582186 # total number of misses
dl1.replacements            2581674 # total number of replacements
dl1.writebacks               958427 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2984 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2983 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3540824 # total number of accesses
ul2.hits                    3472444 # total number of hits
ul2.misses                    68380 # total number of misses
ul2.replacements              67356 # total number of replacements
ul2.writebacks                22676 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0193 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0190 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0064 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017756 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
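
The derived rates in a statistics block can be cross-checked against the raw counters. The sketch below is a minimal, assumed parser for "name value # description" lines (parse_stats is a hypothetical helper); the sample lines are copied from the alphaBlend run above, with descriptions shortened.

```python
def parse_stats(lines):
    """Collect 'name value # description' lines into a dict of floats.

    Lines whose value is not a plain number (hex addresses, '1480k',
    '<error: divide by zero>') are skipped.
    """
    stats = {}
    for line in lines:
        head, sep, _ = line.partition("#")
        fields = head.split()
        if sep and len(fields) == 2:
            try:
                stats[fields[0]] = float(fields[1])
            except ValueError:
                pass  # non-numeric value, ignore
    return stats


sample = """\
sim_num_insn               27859692 # instructions committed
sim_cycle                  38696566 # simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2582186 # total number of misses
ld_text_base             0x00400000 # skipped: not a decimal number
"""

stats = parse_stats(sample.splitlines())
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
cpi = stats["sim_cycle"] / stats["sim_num_insn"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  dl1 miss rate {dl1_miss_rate:.4f}")
```

The three printed values should agree with the sim_IPC, sim_CPI and dl1.miss_rate lines reported for the same run.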

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -redir:sim tempOutput3 matrix 

sim: simulation started @ Thu Dec 15 12:08:22 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13149196 # total number of instructions executed
sim_total_refs              4034192 # total number of loads and stores executed
sim_total_loads             3020633 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8475754 # total simulation time in cycles
sim_IPC                      1.5464 # instructions per cycle
sim_CPI                      0.6467 # cycles per instruction
sim_exec_BW                  1.5514 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31761408 # cumulative IFQ occupancy
IFQ_fcount                  7792675 # cumulative IFQ full count
ifq_occupancy                3.7473 # avg IFQ occupancy (insn's)
ifq_rate                     1.5514 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4155 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9194 # fraction of time (cycle's) IFQ was full
RUU_count                 131387113 # cumulative RUU occupancy
RUU_fcount                  7111485 # cumulative RUU full count
ruu_occupancy               15.5015 # avg RUU occupancy (insn's)
ruu_rate                     1.5514 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9920 # avg RUU occupant latency (cycle's)
ruu_full                     0.8390 # fraction of time (cycle's) RUU was full
LSQ_count                  40040735 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7242 # avg LSQ occupancy (insn's)
lsq_rate                     1.5514 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0451 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  188469315 # total number of slip cycles
avg_sim_slip                14.3792 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189454 # total number of accesses
il1.hits                   13189276 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    3667132 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     344575 # total number of hits
ul2.misses                     2277 # total number of misses
ul2.replacements               1253 # total number of replacements
ul2.writebacks                  512 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189454 # total number of accesses
itlb.hits                  13189448 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357314 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -redir:sim tempOutput3 sort 

sim: simulation started @ Thu Dec 15 12:08:31 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r
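
As a rough illustration of the <config> format above, the following Python sketch splits a config string into its five fields and computes the total capacity as nsets * bsize * assoc. The names `CacheConfig` and `parse_cache_config` are hypothetical helpers for this example and are not part of SimpleScalar; the sample strings are the dl1 and ul2 configs from this run's command line.

```python
from typing import NamedTuple

# Replacement policy letters as listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    """One parsed <name>:<nsets>:<bsize>:<assoc>:<repl> string (hypothetical helper)."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total capacity = sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Split a cache config string into typed fields, rejecting malformed input."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


for text in ("dl1:256:64:2:l", "ul2:256:64:4:l"):
    cfg = parse_cache_config(text)
    print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, "
          f"{cfg.assoc}-way, {REPL_POLICIES[cfg.repl]}")
```

Under this reading, the configuration used in these runs is a 32 KB 2-way L1 data cache and a 64 KB 4-way unified L2, both with 64-byte blocks and LRU replacement.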

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12265316 # total number of instructions executed
sim_total_refs              4824018 # total number of loads and stores executed
sim_total_loads             2865540 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196951 # total number of branches executed
sim_cycle                   6671951 # total simulation time in cycles
sim_IPC                      1.7359 # instructions per cycle
sim_CPI                      0.5761 # cycles per instruction
sim_exec_BW                  1.8383 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19996318 # cumulative IFQ occupancy
IFQ_fcount                  4170930 # cumulative IFQ full count
ifq_occupancy                2.9971 # avg IFQ occupancy (insn's)
ifq_rate                     1.8383 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6303 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6251 # fraction of time (cycle's) IFQ was full
RUU_count                  82232887 # cumulative RUU occupancy
RUU_fcount                  3568913 # cumulative RUU full count
ruu_occupancy               12.3252 # avg RUU occupancy (insn's)
ruu_rate                     1.8383 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.7045 # avg RUU occupant latency (cycle's)
ruu_full                     0.5349 # fraction of time (cycle's) RUU was full
LSQ_count                  34770829 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.2115 # avg LSQ occupancy (insn's)
lsq_rate                     1.8383 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.8349 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  131038752 # total number of slip cycles
avg_sim_slip                11.3145 # the average slip between issue and retirement
bpred_bimod.lookups         3257679 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441951 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435454 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820264 # total number of accesses
il1.hits                   12820047 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497834 # total number of accesses
dl1.hits                    4480758 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      15488 # total number of hits
ul2.misses                    12164 # total number of misses
ul2.replacements              11140 # total number of replacements
ul2.writebacks                 7571 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4399 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.4029 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.2738 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820264 # total number of accesses
itlb.hits                  12820257 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918218 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
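
The rate statistics in this block follow directly from the raw counters. The Python sketch below re-derives a few of them for the `sort` run above. The `parse_stats` helper is hypothetical (it is not a SimpleScalar tool) and assumes each statistic line has the form `name value # description`; the sample lines are copied from the statistics above.

```python
# A few counter lines copied from the `sort` run statistics.
SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6671951 # total simulation time in cycles
dl1.accesses                4497834 # total number of accesses
dl1.misses                    17076 # total number of misses
ul2.accesses                  27652 # total number of accesses
ul2.misses                    12164 # total number of misses
"""


def parse_stats(text):
    """Map statistic names to numeric values; skip lines that do not fit the pattern."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing description
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            pass  # non-numeric values such as 0x00400000 or 292k are ignored
    return stats


stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]          # committed insts per cycle
cpi = stats["sim_cycle"] / stats["sim_num_insn"]          # cycles per committed inst
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
ul2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]

print(f"IPC {ipc:.4f}  CPI {cpi:.4f}")
print(f"dl1 miss rate {dl1_miss_rate:.4f}  ul2 miss rate {ul2_miss_rate:.4f}")
```

The recomputed values can be compared against the reported sim_IPC, sim_CPI, dl1.miss_rate and ul2.miss_rate lines as a sanity check when post-processing several runs.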

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -redir:sim tempOutput3 fft 

sim: simulation started @ Thu Dec 15 12:08:39 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375286 # total number of instructions executed
sim_total_refs              6748385 # total number of loads and stores executed
sim_total_loads             3824342 # total number of loads executed
sim_total_stores       2924043.0000 # total number of stores executed
sim_total_branches           390626 # total number of branches executed
sim_cycle                  12158647 # total simulation time in cycles
sim_IPC                      1.0956 # instructions per cycle
sim_CPI                      0.9127 # cycles per instruction
sim_exec_BW                  1.1001 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  47656002 # cumulative IFQ occupancy
IFQ_fcount                 11764288 # cumulative IFQ full count
ifq_occupancy                3.9195 # avg IFQ occupancy (insn's)
ifq_rate                     1.1001 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.5630 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9676 # fraction of time (cycle's) IFQ was full
RUU_count                 191542911 # cumulative RUU occupancy
RUU_fcount                 11614561 # cumulative RUU full count
ruu_occupancy               15.7536 # avg RUU occupancy (insn's)
ruu_rate                     1.1001 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 14.3207 # avg RUU occupant latency (cycle's)
ruu_full                     0.9553 # fraction of time (cycle's) RUU was full
LSQ_count                 101438249 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3429 # avg LSQ occupancy (insn's)
lsq_rate                     1.1001 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.5840 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  312696756 # total number of slip cycles
avg_sim_slip                23.4741 # the average slip between issue and retirement
bpred_bimod.lookups          390935 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89876 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400274 # total number of accesses
il1.hits                   13399528 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171929 # total number of accesses
dl1.hits                    5866428 # total number of hits
dl1.misses                   305501 # total number of misses
dl1.replacements             304989 # total number of replacements
dl1.writebacks               156829 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0495 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 463076 # total number of accesses
ul2.hits                     366978 # total number of hits
ul2.misses                    96098 # total number of misses
ul2.replacements              95074 # total number of replacements
ul2.writebacks                73071 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2075 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2053 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1578 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400274 # total number of accesses
itlb.hits                  13400255 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767100 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate


sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -redir:sim tempOutput3 filter 

sim: simulation started @ Thu Dec 15 12:08:51 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
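
  The <config> format above can be illustrated with a minimal Python sketch
  that splits a config string into its five fields and computes the total
  capacity as nsets * bsize * assoc.  The names `CacheConfig` and
  `parse_cache_config` are hypothetical helpers introduced here for
  illustration; they are not part of SimpleScalar.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical container for one <name>:<nsets>:<bsize>:<assoc>:<repl> string."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total storage = sets * block size * ways.
        # For a TLB config, bsize is the page size, so this is the TLB reach.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(config: str) -> CacheConfig:
    """Split a config string into its five colon-separated fields."""
    parts = config.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}: {config!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Config strings taken from the command line of the runs in this log.
    for text in ("dl1:256:64:2:l", "ul2:256:64:4:l"):
        cfg = parse_cache_config(text)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, {REPL_POLICIES[cfg.repl]}")
```

  Under these assumptions, the dl1 and il1 configs used in these runs
  (256 sets, 64-byte blocks, 2-way) each describe a 32 KB cache, and the
  ul2 config (256 sets, 64-byte blocks, 4-way) describes a 64 KB cache.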

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449934 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938901 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12586530 # total simulation time in cycles
sim_IPC                      1.6986 # instructions per cycle
sim_CPI                      0.5887 # cycles per instruction
sim_exec_BW                  1.7042 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  49196477 # cumulative IFQ occupancy
IFQ_fcount                 11679946 # cumulative IFQ full count
ifq_occupancy                3.9087 # avg IFQ occupancy (insn's)
ifq_rate                     1.7042 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2935 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9280 # fraction of time (cycle's) IFQ was full
RUU_count                 199931470 # cumulative RUU occupancy
RUU_fcount                 12459757 # cumulative RUU full count
ruu_occupancy               15.8846 # avg RUU occupancy (insn's)
ruu_rate                     1.7042 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3208 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64104343 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0931 # avg LSQ occupancy (insn's)
lsq_rate                     1.7042 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9886 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291819102 # total number of slip cycles
avg_sim_slip                13.6496 # the average slip between issue and retirement
bpred_bimod.lookups         1654334 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482598 # total number of accesses
il1.hits                   21482419 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943840 # total number of accesses
dl1.hits                    4939894 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       2606 # total number of hits
ul2.misses                     2571 # total number of misses
ul2.replacements               1547 # total number of replacements
ul2.writebacks                  659 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4966 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2988 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1273 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482598 # total number of accesses
itlb.hits                  21482592 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884054 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
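
The derived figures in a statistics block like the one above follow directly from the raw counters (for example, sim_IPC is sim_num_insn / sim_cycle, and each miss rate is misses / accesses). A minimal Python sketch for reading `name value # description` lines and recomputing those ratios is given here; `parse_stats` and `ratio` are hypothetical helper names, and the sample text is copied from the `filter` run above.

```python
# A few counters copied from the `filter` run, plus one non-numeric line
# to show that such entries are skipped.
SAMPLE = """\
sim_num_insn               21379256 # total number of instructions committed
sim_cycle                  12586530 # total simulation time in cycles
dl1.accesses                4943840 # total number of accesses
dl1.misses                     3946 # total number of misses
ul2.accesses                   5177 # total number of accesses
ul2.misses                     2571 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""


def parse_stats(text):
    """Return {stat_name: float} for every 'name value # comment' line."""
    stats = {}
    for line in text.splitlines():
        # Drop the trailing description, then expect exactly name + value.
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2 or fields[0].startswith("-"):
            continue  # option lines, banners, blank lines, error strings
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # hex addresses, sizes such as '120k', etc.
    return stats


def ratio(stats, numerator, denominator):
    """Recompute a derived rate from two raw counters."""
    return stats[numerator] / stats[denominator]


if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    print(f"IPC           {ratio(stats, 'sim_num_insn', 'sim_cycle'):.4f}")
    print(f"dl1 miss rate {ratio(stats, 'dl1.misses', 'dl1.accesses'):.4f}")
    print(f"ul2 miss rate {ratio(stats, 'ul2.misses', 'ul2.accesses'):.4f}")
```

With the sample counters, the recomputed values agree with the reported sim_IPC (1.6986), dl1.miss_rate (0.0008), and ul2.miss_rate (0.4966) to four decimal places.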


sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -redir:sim tempOutput3 alphaBlend 

sim: simulation started @ Thu Dec 15 12:09:05 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861243 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37608622 # total simulation time in cycles
sim_IPC                      0.7408 # instructions per cycle
sim_CPI                      1.3499 # cycles per instruction
sim_exec_BW                  0.7408 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 150359329 # cumulative IFQ occupancy
IFQ_fcount                 37589592 # cumulative IFQ full count
ifq_occupancy                3.9980 # avg IFQ occupancy (insn's)
ifq_rate                     0.7408 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3967 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 601440174 # cumulative RUU occupancy
RUU_fcount                 37588854 # cumulative RUU full count
ruu_occupancy               15.9921 # avg RUU occupancy (insn's)
ruu_rate                     0.7408 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.5870 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 182786698 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8602 # avg LSQ occupancy (insn's)
lsq_rate                     0.7408 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5606 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  820735508 # total number of slip cycles
avg_sim_slip                29.4596 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6071212 # total number of hits
dl1.misses                  2582186 # total number of misses
dl1.replacements            2581674 # total number of replacements
dl1.writebacks               958427 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2984 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2983 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3540824 # total number of accesses
ul2.hits                    3472444 # total number of hits
ul2.misses                    68380 # total number of misses
ul2.replacements              67356 # total number of replacements
ul2.writebacks                22676 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0193 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0190 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0064 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017756 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 48 3 -redir:sim tempOutput3 matrix 

sim: simulation started @ Thu Dec 15 12:09:28 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         48 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
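
The <config> format described above can be checked with a short script. The following is a minimal sketch, not part of SimpleScalar: the names CacheConfig and parse_cache_config are hypothetical, and the capacity formula (nsets * bsize * assoc) is the usual set-associative relation, assumed here rather than taken from the simulator source. For TLB configs the block size is the page size, so the same product gives the address range covered rather than a storage size.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical container for one <name>:<nsets>:<bsize>:<assoc>:<repl> string."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Assumed relation: sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Split a config string into its five colon-separated fields."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r} in {text!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


# The three cache configs used on the command line of these runs.
for spec in ("dl1:256:64:2:l", "il1:256:64:2:l", "ul2:256:64:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, "
          f"{cfg.assoc}-way, {REPL_POLICIES[cfg.repl]}")
```

Under that assumption the runs in this log use 32 KB two-way L1 caches and a 64 KB four-way unified L2.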



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13149100 # total number of instructions executed
sim_total_refs              4034192 # total number of loads and stores executed
sim_total_loads             3020633 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8428570 # total simulation time in cycles
sim_IPC                      1.5551 # instructions per cycle
sim_CPI                      0.6431 # cycles per instruction
sim_exec_BW                  1.5601 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31589472 # cumulative IFQ occupancy
IFQ_fcount                  7749691 # cumulative IFQ full count
ifq_occupancy                3.7479 # avg IFQ occupancy (insn's)
ifq_rate                     1.5601 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4024 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9195 # fraction of time (cycle's) IFQ was full
RUU_count                 130697473 # cumulative RUU occupancy
RUU_fcount                  7068525 # cumulative RUU full count
ruu_occupancy               15.5065 # avg RUU occupancy (insn's)
ruu_rate                     1.5601 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9397 # avg RUU occupant latency (cycle's)
ruu_full                     0.8386 # fraction of time (cycle's) RUU was full
LSQ_count                  39810287 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7233 # avg LSQ occupancy (insn's)
lsq_rate                     1.5601 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0276 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187549755 # total number of slip cycles
avg_sim_slip                14.3090 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189454 # total number of accesses
il1.hits                   13189276 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    3667132 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     344575 # total number of hits
ul2.misses                     2277 # total number of misses
ul2.replacements               1253 # total number of replacements
ul2.writebacks                  512 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189454 # total number of accesses
itlb.hits                  13189448 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357314 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
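
The derived figures in a statistics block (sim_IPC, dl1.miss_rate and so on) can be cross-checked against the raw counters. The sketch below assumes each statistic line has the shape "name value # description", as in the block above; parse_stats is a hypothetical helper, not a SimpleScalar tool. When a log holds several runs, later values overwrite earlier ones, so feed it one run at a time.

```python
import re

# Statistic names look like sim_IPC or bpred_bimod.ras_rate.PP (assumed pattern).
NAME_RE = re.compile(r"[A-Za-z_][\w.]*")


def parse_stats(lines):
    """Collect 'name value # description' lines into a dict.

    Numeric values become floats; anything else (hex addresses, '1480k',
    '<error: divide by zero>') is kept as the original string.
    Option lines ('-seed ...', '# -config ...') and indented help text are skipped.
    """
    stats = {}
    for line in lines:
        if not line or line[0].isspace() or "#" not in line:
            continue
        left, _, _description = line.partition("#")
        fields = left.split(None, 1)
        if len(fields) != 2 or not NAME_RE.fullmatch(fields[0]):
            continue
        name, raw = fields[0], fields[1].strip()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw
    return stats


# A few lines copied from the matrix run above (-mem:lat 48 3).
SAMPLE = [
    "-seed                       1 # random number generator seed (0 for timer seed)",
    "sim_num_insn               13107124 # total number of instructions committed",
    "sim_cycle                   8428570 # total simulation time in cycles",
    "sim_IPC                      1.5551 # instructions per cycle",
    "bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate",
    "dl1.accesses                4013174 # total number of accesses",
    "dl1.misses                   346042 # total number of misses",
    "dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)",
]

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print(f"recomputed IPC           {ipc:.4f} (reported {stats['sim_IPC']:.4f})")
print(f"recomputed dl1 miss rate {dl1_miss_rate:.4f} (reported {stats['dl1.miss_rate']:.4f})")
```

Keeping non-numeric values as strings means an entry such as the divide-by-zero rate stays visible instead of being silently dropped.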

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 48 3 -redir:sim tempOutput3 sort 

sim: simulation started @ Thu Dec 15 12:09:37 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         48 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12265028 # total number of instructions executed
sim_total_refs              4824018 # total number of loads and stores executed
sim_total_loads             2865540 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196951 # total number of branches executed
sim_cycle                   6507431 # total simulation time in cycles
sim_IPC                      1.7797 # instructions per cycle
sim_CPI                      0.5619 # cycles per instruction
sim_exec_BW                  1.8848 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19358686 # cumulative IFQ occupancy
IFQ_fcount                  4011522 # cumulative IFQ full count
ifq_occupancy                2.9749 # avg IFQ occupancy (insn's)
ifq_rate                     1.8848 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5784 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6165 # fraction of time (cycle's) IFQ was full
RUU_count                  79681591 # cumulative RUU occupancy
RUU_fcount                  3409577 # cumulative RUU full count
ruu_occupancy               12.2447 # avg RUU occupancy (insn's)
ruu_rate                     1.8848 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4966 # avg RUU occupant latency (cycle's)
ruu_full                     0.5240 # fraction of time (cycle's) RUU was full
LSQ_count                  33708349 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1800 # avg LSQ occupancy (insn's)
lsq_rate                     1.8848 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7483 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  127427952 # total number of slip cycles
avg_sim_slip                11.0027 # the average slip between issue and retirement
bpred_bimod.lookups         3257679 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441951 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435454 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820264 # total number of accesses
il1.hits                   12820047 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497834 # total number of accesses
dl1.hits                    4480758 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      15488 # total number of hits
ul2.misses                    12164 # total number of misses
ul2.replacements              11140 # total number of replacements
ul2.writebacks                 7571 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4399 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.4029 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.2738 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820264 # total number of accesses
itlb.hits                  12820257 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918218 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 48 3 -redir:sim tempOutput3 fft 

sim: simulation started @ Thu Dec 15 12:09:44 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         48 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
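
  The range syntax described above can be sketched as a small parser. This
  is an illustrative sketch only, not the simulator's own parsing code; the
  function name parse_ptrace_range and the ("kind", text) tuples it returns
  are hypothetical. It only splits and labels the two ends; it does not
  resolve program symbols or convert numbers.

```python
def parse_ptrace_range(spec):
    """Split a pipetrace range such as '#0:#1000' into (start, end).

    Each end is None when omitted, otherwise a (kind, text) tuple where
    kind is 'address' (@), 'cycle' (#), 'relative' (+, end only), or
    'inst' (no prefix, i.e. an instruction count).
    """
    if ":" not in spec:
        raise ValueError("a pipetrace range must contain ':'")
    start, end = spec.split(":", 1)

    def classify(text, allow_relative):
        if text == "":
            return None  # this end of the range was left out
        kinds = {"@": "address", "#": "cycle"}
        if allow_relative:
            kinds["+"] = "relative"  # '+' is only meaningful on the end value
        if text[0] in kinds:
            return (kinds[text[0]], text[1:])
        return ("inst", text)

    return classify(start, False), classify(end, True)


if __name__ == "__main__":
    for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
        print(example, "->", parse_ptrace_range(example))
```

  The five inputs in the usage loop are the -ptrace examples listed above.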

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
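
  As a rough illustration of the sample predictors above, the sketch below
  names a 2-level configuration from its four values. It is an assumption-
  laden helper, not part of SimpleScalar: the function classify_2lev is
  hypothetical, it takes its arguments in the order of the -bpred:2lev
  option (<l1size> <l2size> <hist_size> <xor>), and it applies the listed
  conditions literally, returning "other" for anything they do not cover.

```python
def classify_2lev(l1size, l2size, hist_size, xor):
    """Name a 2-level predictor config using the sample-predictor rules."""
    n, m, w = l1size, l2size, hist_size
    if xor:
        # gshare: one global history register, 2^W counters, xor enabled
        return "gshare" if (n == 1 and m == 2 ** w) else "other"
    if n == 1:
        if m == 2 ** w:
            return "GAg"
        if m > 2 ** w:
            return "GAp"
        return "other"
    # more than one first-level entry: per-address history
    if m == 2 ** w:
        return "PAg"
    if m == 2 ** (n + w):
        return "PAp"
    return "other"


if __name__ == "__main__":
    # The -bpred:2lev values shown in the option listing: 1 1024 8 0
    print(classify_2lev(1, 1024, 8, 0))
```

  Under these rules the listed values 1 1024 8 0 fall in the GAp row, since
  1024 > 2^8.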

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
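
  To make the <config> format concrete, the following sketch splits a
  config string into its five fields and derives the total capacity as
  nsets * bsize * assoc. The names CacheConfig, parse_cache_config and
  capacity_bytes are hypothetical helpers for illustration, not simulator
  code.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str   # name of the cache being defined
    nsets: int  # number of sets
    bsize: int  # block size in bytes
    assoc: int  # associativity
    repl: str   # 'l' (LRU), 'f' (FIFO) or 'r' (random)


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields: " + text)
    name, nsets, bsize, assoc, repl = fields
    if repl not in ("l", "f", "r"):
        raise ValueError("unknown replacement policy: " + repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Total data capacity: sets x block size x ways."""
    return cfg.nsets * cfg.bsize * cfg.assoc


if __name__ == "__main__":
    # Cache configs taken from the command line of these runs.
    for text in ["dl1:256:64:2:l", "il1:256:64:2:l", "ul2:256:64:4:l"]:
        cfg = parse_cache_config(text)
        print(cfg.name, capacity_bytes(cfg) // 1024, "KB")
```

  For the runs logged here this gives 32 KB for each L1 cache and 64 KB for
  the unified L2.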

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375094 # total number of instructions executed
sim_total_refs              6748385 # total number of loads and stores executed
sim_total_loads             3824342 # total number of loads executed
sim_total_stores       2924043.0000 # total number of stores executed
sim_total_branches           390626 # total number of branches executed
sim_cycle                  10756468 # total simulation time in cycles
sim_IPC                      1.2384 # instructions per cycle
sim_CPI                      0.8075 # cycles per instruction
sim_exec_BW                  1.2434 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  42117275 # cumulative IFQ occupancy
IFQ_fcount                 10379606 # cumulative IFQ full count
ifq_occupancy                3.9155 # avg IFQ occupancy (insn's)
ifq_rate                     1.2434 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.1489 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9650 # fraction of time (cycle's) IFQ was full
RUU_count                 169378478 # cumulative RUU occupancy
RUU_fcount                 10229902 # cumulative RUU full count
ruu_occupancy               15.7467 # avg RUU occupancy (insn's)
ruu_rate                     1.2434 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 12.6637 # avg RUU occupant latency (cycle's)
ruu_full                     0.9510 # fraction of time (cycle's) RUU was full
LSQ_count                  89030324 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2769 # avg LSQ occupancy (insn's)
lsq_rate                     1.2434 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6564 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  278124710 # total number of slip cycles
avg_sim_slip                20.8788 # the average slip between issue and retirement
bpred_bimod.lookups          390935 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89876 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400274 # total number of accesses
il1.hits                   13399528 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171929 # total number of accesses
dl1.hits                    5866424 # total number of hits
dl1.misses                   305505 # total number of misses
dl1.replacements             304993 # total number of replacements
dl1.writebacks               156829 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0495 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 463080 # total number of accesses
ul2.hits                     366978 # total number of hits
ul2.misses                    96102 # total number of misses
ul2.replacements              95078 # total number of replacements
ul2.writebacks                73071 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2075 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2053 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1578 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400274 # total number of accesses
itlb.hits                  13400255 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736823 # total number of hits
dtlb.misses                    4175 # total number of misses
dtlb.replacements              4047 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767100 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
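
The derived rates in the statistics block above (sim_IPC, sim_CPI and the
*.miss_rate entries) follow from the raw counters. The sketch below is one
possible way to read "name value # description" lines and recompute a few
of them; parse_stats is a hypothetical helper, and the sample text is a
handful of counters copied from the fft run above.

```python
def parse_stats(text):
    """Collect numeric 'name value # description' lines into a dict."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2 or fields[0].startswith("-"):
            continue  # skip blank lines, option lines and error entries
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # skip non-decimal values such as 0x00400000 or 760k
    return stats


SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                  10756468 # total simulation time in cycles
dl1.accesses                6171929 # total number of accesses
dl1.misses                   305505 # total number of misses
ul2.accesses                 463080 # total number of accesses
ul2.misses                    96102 # total number of misses
"""

if __name__ == "__main__":
    s = parse_stats(SAMPLE)
    ipc = s["sim_num_insn"] / s["sim_cycle"]
    print("IPC           %.4f" % ipc)
    print("CPI           %.4f" % (1.0 / ipc))
    print("dl1 miss rate %.4f" % (s["dl1.misses"] / s["dl1.accesses"]))
    print("ul2 miss rate %.4f" % (s["ul2.misses"] / s["ul2.accesses"]))
```

The recomputed values agree with the reported sim_IPC, sim_CPI,
dl1.miss_rate and ul2.miss_rate to four decimal places.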

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 48 3 -redir:sim tempOutput3 filter 

sim: simulation started @ Thu Dec 15 12:09:56 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         48 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449838 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938901 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12530730 # total simulation time in cycles
sim_IPC                      1.7061 # instructions per cycle
sim_CPI                      0.5861 # cycles per instruction
sim_exec_BW                  1.7118 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48989981 # cumulative IFQ occupancy
IFQ_fcount                 11628322 # cumulative IFQ full count
ifq_occupancy                3.9096 # avg IFQ occupancy (insn's)
ifq_rate                     1.7118 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2839 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9280 # fraction of time (cycle's) IFQ was full
RUU_count                 199104358 # cumulative RUU occupancy
RUU_fcount                 12408157 # cumulative RUU full count
ruu_occupancy               15.8893 # avg RUU occupancy (insn's)
ruu_rate                     1.7118 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2823 # avg RUU occupant latency (cycle's)
ruu_full                     0.9902 # fraction of time (cycle's) RUU was full
LSQ_count                  63843175 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0949 # avg LSQ occupancy (insn's)
lsq_rate                     1.7118 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9764 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290731062 # total number of slip cycles
avg_sim_slip                13.5987 # the average slip between issue and retirement
bpred_bimod.lookups         1654334 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482598 # total number of accesses
il1.hits                   21482419 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943840 # total number of accesses
dl1.hits                    4939894 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       2606 # total number of hits
ul2.misses                     2571 # total number of misses
ul2.replacements               1547 # total number of replacements
ul2.writebacks                  659 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4966 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2988 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1273 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482598 # total number of accesses
itlb.hits                  21482592 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884054 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 48 3 -redir:sim tempOutput3 alphaBlend 

sim: simulation started @ Thu Dec 15 12:10:09 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         48 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
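
The range syntax above can be sketched as a small parser. This is only an illustration of the documented format, not the simulator's own parsing code; the function name `parse_ptrace_range` and the kind labels it returns are hypothetical. Values are kept as strings because program symbols such as `main` are allowed.

```python
def parse_ptrace_range(spec):
    """Split a pipetrace range into (kind, value) pairs for start and end.

    Hypothetical helper: the kind labels are chosen for this sketch only.
    An omitted end of the range is returned as None.
    """
    kinds = {"@": "address", "#": "cycle"}
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("a pipetrace range must contain ':'")

    def classify(text, allow_relative):
        if not text:
            return None  # this end of the range was left out
        if allow_relative and text[0] == "+":
            return ("relative", text[1:])  # offset from the start value
        if text[0] in kinds:
            return (kinds[text[0]], text[1:])
        return ("instruction", text)  # no prefix: instruction count

    return classify(start_text, False), classify(end_text, True)


print(parse_ptrace_range("#0:#1000"))
print(parse_ptrace_range("@main:+278"))
```

The `+` prefix is accepted only on the end of the range, matching the format string, where `+` appears in the second group alone.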

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
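
As a rough illustration of the sample predictor table, the sketch below names a 2-level configuration given in `-bpred:2lev` option order. It assumes l1size is N, l2size is M, and hist_size is W, following the parameter descriptions above; the function `classify_2lev` is hypothetical and not part of the simulator.

```python
def classify_2lev(l1size, l2size, hist_size, xor):
    """Name a 2-level predictor config using the sample table's rules.

    Assumption for this sketch: N = l1size, M = l2size, W = hist_size.
    Returns "other" when no sample row matches.
    """
    n, m, w = l1size, l2size, hist_size
    if xor:
        return "gshare" if n == 1 and m == 2 ** w else "other"
    if n == 1:  # a single global history register
        if m == 2 ** w:
            return "GAg"
        if m > 2 ** w:
            return "GAp"
        return "other"
    if m == 2 ** w:  # per-address history, shared second level
        return "PAg"
    if m == 2 ** (n + w):
        return "PAp"
    return "other"


# The default listed in the option dump is "1 1024 8 0".
print(classify_2lev(1, 1024, 8, 0))
```

Under that assumed mapping, the default `1 1024 8 0` falls in the GAp row, since 1024 exceeds 2^8.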

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
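
To make the config format concrete, here is a minimal sketch that splits a `<config>` string into its five fields and computes the total capacity as sets x block size x associativity. The names `CacheConfig`, `parse_cache_config`, and `capacity_bytes` are made up for this example, and the block size is assumed to be in bytes.

```python
from collections import namedtuple

# Hypothetical container for the five documented fields.
CacheConfig = namedtuple("CacheConfig", "name nsets bsize assoc repl")
REPL_NAMES = {"l": "LRU", "f": "FIFO", "r": "random"}


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    name, nsets, bsize, assoc, repl = text.split(":")
    if repl not in REPL_NAMES:
        raise ValueError("unknown replacement strategy: " + repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Total data capacity, assuming bsize is given in bytes."""
    return cfg.nsets * cfg.bsize * cfg.assoc


# The two cache configs used on this run's command line.
for spec in ("dl1:256:64:2:l", "ul2:256:64:4:l"):
    cfg = parse_cache_config(spec)
    print(cfg.name, capacity_bytes(cfg) // 1024, "KB", REPL_NAMES[cfg.repl])
# prints:
# dl1 32 KB LRU
# ul2 64 KB LRU
```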

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861051 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  36520678 # total simulation time in cycles
sim_IPC                      0.7628 # instructions per cycle
sim_CPI                      1.3109 # cycles per instruction
sim_exec_BW                  0.7629 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 146027617 # cumulative IFQ occupancy
IFQ_fcount                 36506664 # cumulative IFQ full count
ifq_occupancy                3.9985 # avg IFQ occupancy (insn's)
ifq_rate                     0.7629 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2413 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 584112294 # cumulative RUU occupancy
RUU_fcount                 36505974 # cumulative RUU full count
ruu_occupancy               15.9940 # avg RUU occupancy (insn's)
ruu_rate                     0.7629 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.9652 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 177551890 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8617 # avg LSQ occupancy (insn's)
lsq_rate                     0.7629 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3728 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  798172916 # total number of slip cycles
avg_sim_slip                28.6497 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6071212 # total number of hits
dl1.misses                  2582186 # total number of misses
dl1.replacements            2581674 # total number of replacements
dl1.writebacks               958427 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2984 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2983 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3540824 # total number of accesses
ul2.hits                    3472444 # total number of hits
ul2.misses                    68380 # total number of misses
ul2.replacements              67356 # total number of replacements
ul2.writebacks                22676 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0193 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0190 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0064 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017756 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
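
The derived rates in the statistics above can be recomputed from the raw counters. The sketch below parses `name value # description` lines into a dictionary and checks two of them against this run (alphaBlend, `-mem:lat 48 3`). The helper `parse_stats` is hypothetical, and the embedded sample is a handful of lines copied from the block above.

```python
SAMPLE = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  36520678 # total simulation time in cycles
sim_IPC                      0.7628 # instructions per cycle
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2582186 # total number of misses
dl1.miss_rate                0.2984 # miss rate (i.e., misses/ref)
mem.page_mem                  1480k # total size of memory pages allocated
"""


def parse_stats(text):
    """Map each statistic name to a float, or to the raw string if not numeric."""
    stats = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0].strip()  # drop the trailing description
        parts = body.split(None, 1)
        if len(parts) != 2:
            continue  # blank line or a line without a value
        name, raw = parts
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw  # e.g. "1480k" or "<error: divide by zero>"
    return stats


stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print(round(ipc, 4), round(dl1_miss_rate, 4))
```

Non-numeric values are kept as strings so that entries such as `<error: divide by zero>` do not abort the parse.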

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 24 3 -redir:sim tempOutput3 matrix 

sim: simulation started @ Thu Dec 15 12:10:32 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         24 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)




sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13149004 # total number of instructions executed
sim_total_refs              4034192 # total number of loads and stores executed
sim_total_loads             3020633 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8381386 # total simulation time in cycles
sim_IPC                      1.5638 # instructions per cycle
sim_CPI                      0.6395 # cycles per instruction
sim_exec_BW                  1.5688 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31417536 # cumulative IFQ occupancy
IFQ_fcount                  7706707 # cumulative IFQ full count
ifq_occupancy                3.7485 # avg IFQ occupancy (insn's)
ifq_rate                     1.5688 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3893 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9195 # fraction of time (cycle's) IFQ was full
RUU_count                 130007833 # cumulative RUU occupancy
RUU_fcount                  7025565 # cumulative RUU full count
ruu_occupancy               15.5115 # avg RUU occupancy (insn's)
ruu_rate                     1.5688 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.8873 # avg RUU occupant latency (cycle's)
ruu_full                     0.8382 # fraction of time (cycle's) RUU was full
LSQ_count                  39579839 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7224 # avg LSQ occupancy (insn's)
lsq_rate                     1.5688 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0101 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  186630195 # total number of slip cycles
avg_sim_slip                14.2388 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189454 # total number of accesses
il1.hits                   13189276 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    3667132 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     344575 # total number of hits
ul2.misses                     2277 # total number of misses
ul2.replacements               1253 # total number of replacements
ul2.writebacks                  512 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189454 # total number of accesses
itlb.hits                  13189448 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357314 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 24 3 -redir:sim tempOutput3 sort 

sim: simulation started @ Thu Dec 15 12:10:40 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         24 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
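
  The range grammar above can be illustrated with a small Python sketch.
  This is not the simulator's own parser; the names Endpoint and
  parse_ptrace_range are hypothetical, and values are kept as text because
  program symbols (such as main) are allowed.

```python
from typing import NamedTuple, Optional, Tuple


class Endpoint(NamedTuple):
    kind: str   # "address", "cycle", "insn", or "relative"
    value: str  # kept as text, since program symbols are permitted


# Prefix characters from the help text; no prefix means an instruction count.
_KINDS = {"@": "address", "#": "cycle", "+": "relative", "": "insn"}


def _parse_endpoint(text: str, allow_relative: bool) -> Optional[Endpoint]:
    """Parse one side of the range; an empty side means 'unbounded'."""
    if not text:
        return None
    prefix = text[0] if text[0] in "@#+" else ""
    if prefix == "+" and not allow_relative:
        raise ValueError("'+' is only valid on the end of the range")
    value = text[len(prefix):]
    if not value:
        raise ValueError(f"missing value after {prefix!r}")
    return Endpoint(_KINDS[prefix], value)


def parse_ptrace_range(spec: str) -> Tuple[Optional[Endpoint], Optional[Endpoint]]:
    """Split '<start>:<end>' and classify each side by its prefix."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")
    return _parse_endpoint(start_text, False), _parse_endpoint(end_text, True)


if __name__ == "__main__":
    for spec in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(spec, "->", parse_ptrace_range(spec))
```

  A side that comes back as None is unbounded, so ":" traces the entire
  execution, as described above.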

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
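
  As a rough illustration of the <config> format, the Python sketch below
  splits a config string into its five fields and derives the total capacity
  as nsets * bsize * assoc.  CacheConfig and parse_cache_config are
  hypothetical names chosen for this example; they are not part of the
  simulator.

```python
from dataclasses import dataclass

# Replacement policy codes as listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total data capacity: sets x block size x ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configs taken from the command line recorded in this log.
    for spec in ("dl1:256:64:2:l", "il1:256:64:2:l", "ul2:256:64:4:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, "
              f"{cfg.assoc}-way, {REPL_POLICIES[cfg.repl]}")
```

  Under this reading, the dl1 and il1 configs used in these runs are 32 KB
  each and the unified ul2 is 64 KB.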

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264740 # total number of instructions executed
sim_total_refs              4824018 # total number of loads and stores executed
sim_total_loads             2865540 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196951 # total number of branches executed
sim_cycle                   6342919 # total simulation time in cycles
sim_IPC                      1.8259 # instructions per cycle
sim_CPI                      0.5477 # cycles per instruction
sim_exec_BW                  1.9336 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18721060 # cumulative IFQ occupancy
IFQ_fcount                  3852116 # cumulative IFQ full count
ifq_occupancy                2.9515 # avg IFQ occupancy (insn's)
ifq_rate                     1.9336 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5264 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6073 # fraction of time (cycle's) IFQ was full
RUU_count                  77130314 # cumulative RUU occupancy
RUU_fcount                  3250245 # cumulative RUU full count
ruu_occupancy               12.1601 # avg RUU occupancy (insn's)
ruu_rate                     1.9336 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2888 # avg RUU occupant latency (cycle's)
ruu_full                     0.5124 # fraction of time (cycle's) RUU was full
LSQ_count                  32645879 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1468 # avg LSQ occupancy (insn's)
lsq_rate                     1.9336 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6618 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123817178 # total number of slip cycles
avg_sim_slip                10.6909 # the average slip between issue and retirement
bpred_bimod.lookups         3257679 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes       441951 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435454 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820265 # total number of accesses
il1.hits                   12820048 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497834 # total number of accesses
dl1.hits                    4480758 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      15488 # total number of hits
ul2.misses                    12164 # total number of misses
ul2.replacements              11140 # total number of replacements
ul2.writebacks                 7571 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4399 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.4029 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.2738 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820265 # total number of accesses
itlb.hits                  12820258 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918222 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
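
The derived rates in a statistics block can be re-checked from the raw
counters.  The Python sketch below is an illustration only: parse_stats is a
hypothetical helper, and it assumes every stat line has the shape
`name value # description`.  The sample lines are copied from the sort run
above.

```python
def parse_stats(text):
    """Map each `name value # description` line to a float; skip the rest."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        # Keep only two-field stat lines; option lines start with '-'.
        if len(fields) != 2 or fields[0].startswith("-"):
            continue
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # non-numeric values such as 0x00400000 or 96k
    return stats


SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6342919 # total simulation time in cycles
dl1.accesses                4497834 # total number of accesses
dl1.misses                    17076 # total number of misses
ul2.accesses                  27652 # total number of accesses
ul2.misses                    12164 # total number of misses
"""

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
ul2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]

if __name__ == "__main__":
    print(f"IPC           {ipc:.4f}")
    print(f"dl1 miss rate {dl1_miss_rate:.4f}")
    print(f"ul2 miss rate {ul2_miss_rate:.4f}")
```

Rounded to four places, these reproduce the reported sim_IPC, dl1.miss_rate,
and ul2.miss_rate for this run.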

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 24 3 -redir:sim tempOutput3 fft 

sim: simulation started @ Thu Dec 15 12:10:48 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         24 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13374908 # total number of instructions executed
sim_total_refs              6748388 # total number of loads and stores executed
sim_total_loads             3824345 # total number of loads executed
sim_total_stores       2924043.0000 # total number of stores executed
sim_total_branches           390626 # total number of branches executed
sim_cycle                   9355441 # total simulation time in cycles
sim_IPC                      1.4239 # instructions per cycle
sim_CPI                      0.7023 # cycles per instruction
sim_exec_BW                  1.4296 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  36583133 # cumulative IFQ occupancy
IFQ_fcount                  8996068 # cumulative IFQ full count
ifq_occupancy                3.9104 # avg IFQ occupancy (insn's)
ifq_rate                     1.4296 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7352 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9616 # fraction of time (cycle's) IFQ was full
RUU_count                 147232367 # cumulative RUU occupancy
RUU_fcount                  8846433 # cumulative RUU full count
ruu_occupancy               15.7376 # avg RUU occupancy (insn's)
ruu_rate                     1.4296 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.0081 # avg RUU occupant latency (cycle's)
ruu_full                     0.9456 # fraction of time (cycle's) RUU was full
LSQ_count                  76635370 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1915 # avg LSQ occupancy (insn's)
lsq_rate                     1.4296 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7298 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  243583934 # total number of slip cycles
avg_sim_slip                18.2858 # the average slip between issue and retirement
bpred_bimod.lookups          390935 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (i.e., non-RAS JR hits/non-RAS JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89876 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400277 # total number of accesses
il1.hits                   13399531 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171925 # total number of accesses
dl1.hits                    5866420 # total number of hits
dl1.misses                   305505 # total number of misses
dl1.replacements             304993 # total number of replacements
dl1.writebacks               156829 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0495 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 463080 # total number of accesses
ul2.hits                     366980 # total number of hits
ul2.misses                    96100 # total number of misses
ul2.replacements              95076 # total number of replacements
ul2.writebacks                73071 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2075 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2053 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1578 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400277 # total number of accesses
itlb.hits                  13400258 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736823 # total number of hits
dtlb.misses                    4175 # total number of misses
dtlb.replacements              4047 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767124 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
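
Each statistics line above has the form "<name> <value> # <description>". A minimal Python sketch for loading such lines into a dictionary and recomputing a rate from the raw counters follows. The names `STAT_LINE` and `parse_stats` are hypothetical helpers written for this note and are not part of SimpleScalar. The sample values are copied from the run above.

```python
import re

# Matches lines such as "dl1.misses   305505 # total number of misses".
# Option lines (which start with '-' or '#'), banners and indented help
# text do not match, because the name must start in column 0 with a letter.
STAT_LINE = re.compile(
    r"^(?P<name>[A-Za-z_][\w.]*)\s+(?P<value>[^#]*?)\s+#\s*(?P<desc>.*)$"
)

def parse_stats(text):
    """Return {stat name: int, float, or raw string} for each statistics line."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m is None:
            continue
        raw = m.group("value").strip()
        try:
            value = int(raw, 0)        # base 0 also accepts 0x... addresses
        except ValueError:
            try:
                value = float(raw)     # e.g. "0.0495" or "1649960.0000"
            except ValueError:
                value = raw            # e.g. "760k" or an error marker
        stats[m.group("name")] = value
    return stats

sample = """\
dl1.accesses                6171925 # total number of accesses
dl1.misses                   305505 # total number of misses
ul2.accesses                 463080 # total number of accesses
ul2.misses                    96100 # total number of misses
mem.page_mem                   760k # total size of memory pages allocated
ld_text_base             0x00400000 # program text (code) segment base
"""

stats = parse_stats(sample)
for cache in ("dl1", "ul2"):
    rate = stats[f"{cache}.misses"] / stats[f"{cache}.accesses"]
    print(f"{cache}.miss_rate {rate:.4f}")
```

Rounded to four places, the recomputed rates agree with the dl1.miss_rate (0.0495) and ul2.miss_rate (0.2075) values reported above.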

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 24 3 -redir:sim tempOutput3 filter 

sim: simulation started @ Thu Dec 15 12:10:59 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         24 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
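
The cache and TLB options listed above use the "<name>:<nsets>:<bsize>:<assoc>:<repl>" form. As a rough sketch, the Python below turns those strings into sizes, assuming capacity is nsets * bsize * assoc and that for a TLB the <bsize> field is the page size, so nsets * assoc gives the entry count. `CacheConfig`, `parse_cache_config`, `capacity_bytes` and `tlb_entries` are hypothetical names used only for illustration.

```python
from typing import NamedTuple

class CacheConfig(NamedTuple):
    name: str
    nsets: int   # number of sets
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int   # associativity
    repl: str    # 'l' = LRU, 'f' = FIFO, 'r' = random

def parse_cache_config(spec: str) -> CacheConfig:
    """Split a '<name>:<nsets>:<bsize>:<assoc>:<repl>' string into fields."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)

def capacity_bytes(cfg: CacheConfig) -> int:
    """Total data capacity: sets * ways * bytes per block."""
    return cfg.nsets * cfg.assoc * cfg.bsize

def tlb_entries(cfg: CacheConfig) -> int:
    """Number of translations a TLB holds: sets * ways."""
    return cfg.nsets * cfg.assoc

# Cache strings taken from the option listing of this run.
for spec in ("dl1:256:64:2:l", "il1:256:64:2:l", "ul2:256:64:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {capacity_bytes(cfg) // 1024} KiB, {cfg.assoc}-way")

# TLB strings taken from the same listing.
for spec in ("itlb:16:4096:4:l", "dtlb:32:4096:4:l"):
    cfg = parse_cache_config(spec)
    print(f"{cfg.name}: {tlb_entries(cfg)} entries")
```

Under these assumptions the run uses 32 KiB two-way L1 caches and a 64 KiB four-way unified L2, which is small relative to the working set of the memory-bound benchmarks in this log.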


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449742 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938901 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12474933 # total simulation time in cycles
sim_IPC                      1.7138 # instructions per cycle
sim_CPI                      0.5835 # cycles per instruction
sim_exec_BW                  1.7194 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48783485 # cumulative IFQ occupancy
IFQ_fcount                 11576698 # cumulative IFQ full count
ifq_occupancy                3.9105 # avg IFQ occupancy (insn's)
ifq_rate                     1.7194 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2743 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9280 # fraction of time (cycle's) IFQ was full
RUU_count                 198277246 # cumulative RUU occupancy
RUU_fcount                 12356557 # cumulative RUU full count
ruu_occupancy               15.8941 # avg RUU occupancy (insn's)
ruu_rate                     1.7194 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2438 # avg RUU occupant latency (cycle's)
ruu_full                     0.9905 # fraction of time (cycle's) RUU was full
LSQ_count                  63582007 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0968 # avg LSQ occupancy (insn's)
lsq_rate                     1.7194 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9642 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  289643022 # total number of slip cycles
avg_sim_slip                13.5479 # the average slip between issue and retirement
bpred_bimod.lookups         1654334 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482597 # total number of accesses
il1.hits                   21482418 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943840 # total number of accesses
dl1.hits                    4939894 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       2606 # total number of hits
ul2.misses                     2571 # total number of misses
ul2.replacements               1547 # total number of replacements
ul2.writebacks                  659 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4966 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2988 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1273 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482597 # total number of accesses
itlb.hits                  21482591 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884050 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
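
The derived pipeline figures in this run follow directly from the raw counters, and the "<error: divide by zero>" entry appears because jr_non_ras_seen is 0. The Python sketch below recomputes them with a guarded division. The helper `ratio` and the dictionary keys are hypothetical names chosen for this note, and the counter values are copied from the run above.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator

# Raw counters copied from the statistics of this run.
run = {
    "sim_num_insn": 21379256,
    "sim_total_insn": 21449742,
    "sim_num_branches": 1647342,
    "sim_cycle": 12474933,
    "jr_non_ras_hits": 0,
    "jr_non_ras_seen": 0,
}

derived = {
    "sim_IPC": ratio(run["sim_num_insn"], run["sim_cycle"]),
    "sim_CPI": ratio(run["sim_cycle"], run["sim_num_insn"]),
    "sim_exec_BW": ratio(run["sim_total_insn"], run["sim_cycle"]),
    "sim_IPB": ratio(run["sim_num_insn"], run["sim_num_branches"]),
    "bpred_jr_non_ras_rate": ratio(run["jr_non_ras_hits"], run["jr_non_ras_seen"]),
}

for name, value in derived.items():
    shown = "n/a" if value is None else format(value, ".4f")
    print(f"{name:<24} {shown}")
```

Rounded to four places, the first four results agree with the reported sim_IPC, sim_CPI, sim_exec_BW and sim_IPB. The last entry comes out as "n/a" instead of an error string, which is easier to handle when the numbers are post-processed.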

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 24 3 -redir:sim tempOutput3 alphaBlend 

sim: simulation started @ Thu Dec 15 12:11:12 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         24 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860859 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  35440243 # total simulation time in cycles
sim_IPC                      0.7861 # instructions per cycle
sim_CPI                      1.2721 # cycles per instruction
sim_exec_BW                  0.7861 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 141725913 # cumulative IFQ occupancy
IFQ_fcount                 35431238 # cumulative IFQ full count
ifq_occupancy                3.9990 # avg IFQ occupancy (insn's)
ifq_rate                     0.7861 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.0869 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9997 # fraction of time (cycle's) IFQ was full
RUU_count                 566904451 # cumulative RUU occupancy
RUU_fcount                 35430596 # cumulative RUU full count
ruu_occupancy               15.9961 # avg RUU occupancy (insn's)
ruu_rate                     0.7861 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.3477 # avg RUU occupant latency (cycle's)
ruu_full                     0.9997 # fraction of time (cycle's) RUU was full
LSQ_count                 172347094 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8630 # avg LSQ occupancy (insn's)
lsq_rate                     0.7861 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.1860 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  775760373 # total number of slip cycles
avg_sim_slip                27.8453 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6071212 # total number of hits
dl1.misses                  2582186 # total number of misses
dl1.replacements            2581674 # total number of replacements
dl1.writebacks               958427 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2984 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2983 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3540824 # total number of accesses
ul2.hits                    3472444 # total number of hits
ul2.misses                    68380 # total number of misses
ul2.replacements              67356 # total number of replacements
ul2.writebacks                22676 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0193 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0190 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0064 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017752 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 12 3 -redir:sim tempOutput3 matrix 

sim: simulation started @ Thu Dec 15 12:11:35 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         12 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
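
The -mem:lat and -mem:width settings above can be turned into a per-block transfer time. The sketch below assumes the simple chunked model in which the first bus-width chunk costs <first_chunk> cycles and every later chunk costs <inter_chunk> cycles. The function name is hypothetical, and the -mem:minBurstLength / -mem:maxBurstLength options are not modeled, since this log does not say how they change the transfer.

```python
def mem_access_latency(block_bytes, bus_width, first_chunk, inter_chunk):
    """Cycles to move one block over the memory bus (assumed chunked model)."""
    # Number of bus-width chunks needed to cover the block (ceiling division).
    chunks = (block_bytes + bus_width - 1) // bus_width
    # First chunk pays the full latency, the remaining chunks pay the inter-chunk cost.
    return first_chunk + inter_chunk * (chunks - 1)


# 64-byte cache block, 16-byte bus, "-mem:lat 12 3" as in this run.
print(mem_access_latency(64, 16, 12, 3))
# Same block with "-mem:lat 96 3" for comparison.
print(mem_access_latency(64, 16, 96, 3))
```

Under these assumptions a 64-byte block needs four chunks, so the two settings differ only in the cost of the first chunk.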

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
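
The range syntax above can be checked with a small parser. This is a minimal sketch, not the simulator's own parsing code: the function name and the returned dictionary layout are assumptions made for illustration. Values are kept as strings because the help text allows program symbols such as main.

```python
import re

# {{@|#}<start>}:{{@|#|+}<end>} with both sides optional.
_RANGE_RE = re.compile(r"^(?:([@#])?([^:@#+]+))?:(?:([@#+])?([^:@#+]+))?$")

# Meaning of each prefix; no prefix means an instruction count.
_KINDS = {"@": "address", "#": "cycle", "+": "relative", None: "instruction"}


def parse_ptrace_range(text):
    """Split a pipetrace range into (kind, value) pairs for start and end."""
    match = _RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a pipetrace range: {text!r}")
    start_mod, start_val, end_mod, end_val = match.groups()
    return {
        "start": None if start_val is None else (_KINDS[start_mod], start_val),
        "end": None if end_val is None else (_KINDS[end_mod], end_val),
    }


for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
    print(example, parse_ptrace_range(example))
```

A missing side is reported as None, which matches the rule that an omitted end leaves the range open.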

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
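
The sample predictors above can be generated from W (and N where needed). Note that the sample rows list their values as N, W, M, X, while the -bpred:2lev option line shows <l1size> <l2size> <hist_size> <xor>. The sketch below assumes N = l1size, M = l2size and W = hist_size, and uses named fields so the ordering cannot be confused. All function names are hypothetical.

```python
def gag(w):
    """Global history, one shared table of 2^W counters."""
    return {"N": 1, "W": w, "M": 2 ** w, "X": 0}


def pag(n, w):
    """Per-address history (N shift registers), shared table of 2^W counters."""
    return {"N": n, "W": w, "M": 2 ** w, "X": 0}


def pap(n, w):
    """Per-address history with M == 2^(N+W) counters, as in the help text."""
    return {"N": n, "W": w, "M": 2 ** (n + w), "X": 0}


def gshare(w):
    """Like GAg, but history is XORed with the branch address."""
    return {"N": 1, "W": w, "M": 2 ** w, "X": 1}


def to_option(cfg):
    """Format in the assumed <l1size> <l2size> <hist_size> <xor> order."""
    return "-bpred:2lev {N} {M} {W} {X}".format(**cfg)


print(to_option(gshare(10)))
print(to_option(gag(8)))
```

With this mapping, the default shown in the option dump (1 1024 8 0) has M larger than 2^W, which corresponds to the GAp row.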

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
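
The <config> string can be split on colons, and the cache capacity follows as nsets * bsize * assoc. The sketch below is an illustration only: the function name and the size_bytes field are assumptions, not part of the simulator.

```python
# Replacement policy letters as listed in the help text.
_REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


def parse_cache_config(config):
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl> and derive the capacity."""
    name, nsets, bsize, assoc, repl = config.split(":")
    if repl not in _REPL:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    nsets, bsize, assoc = int(nsets), int(bsize), int(assoc)
    return {
        "name": name,
        "nsets": nsets,
        "bsize": bsize,
        "assoc": assoc,
        "repl": _REPL[repl],
        # Total capacity in bytes: sets x block size x ways.
        "size_bytes": nsets * bsize * assoc,
    }


# The caches configured for the runs in this log.
for cfg in ["dl1:256:64:2:l", "il1:256:64:2:l", "ul2:256:64:4:l"]:
    parsed = parse_cache_config(cfg)
    print(parsed["name"], parsed["size_bytes"] // 1024, "KB", parsed["repl"])
```

For a TLB config the same product gives the reach of the TLB, since <bsize> is then the page size.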

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148954 # total number of instructions executed
sim_total_refs              4034191 # total number of loads and stores executed
sim_total_loads             3020632 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8358593 # total simulation time in cycles
sim_IPC                      1.5681 # instructions per cycle
sim_CPI                      0.6377 # cycles per instruction
sim_exec_BW                  1.5731 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31332431 # cumulative IFQ occupancy
IFQ_fcount                  7685252 # cumulative IFQ full count
ifq_occupancy                3.7485 # avg IFQ occupancy (insn's)
ifq_rate                     1.5731 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3829 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9194 # fraction of time (cycle's) IFQ was full
RUU_count                 129673165 # cumulative RUU occupancy
RUU_fcount                  7004275 # cumulative RUU full count
ruu_occupancy               15.5138 # avg RUU occupancy (insn's)
ruu_rate                     1.5731 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.8619 # avg RUU occupant latency (cycle's)
ruu_full                     0.8380 # fraction of time (cycle's) RUU was full
LSQ_count                  39466146 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7216 # avg LSQ occupancy (insn's)
lsq_rate                     1.5731 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0015 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  186182010 # total number of slip cycles
avg_sim_slip                14.2046 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189450 # total number of accesses
il1.hits                   13189272 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013172 # total number of accesses
dl1.hits                    3667130 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     344575 # total number of hits
ul2.misses                     2277 # total number of misses
ul2.replacements               1253 # total number of replacements
ul2.writebacks                  512 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189450 # total number of accesses
itlb.hits                  13189444 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357296 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
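
The derived figures in a statistics block can be recomputed from the raw counters, which is a quick consistency check when comparing runs. The sketch below parses "name value # description" lines; the function name and the regular expression are assumptions for illustration. Lines whose value is not a plain number (hex addresses, sizes such as 96k, or "<error: divide by zero>") are skipped.

```python
import re

# Statistic name, a plain decimal value, then the "# description" comment.
_STAT_RE = re.compile(r"^([A-Za-z_][\w.]*)\s+(-?\d+(?:\.\d+)?)\s+#")


def parse_stats(text):
    """Return {stat name: float value} for every plain numeric stat line."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_RE.match(line)
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


# A few lines copied from the matrix run above.
sample = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   8358593 # total simulation time in cycles
dl1.accesses                4013172 # total number of accesses
dl1.misses                   346042 # total number of misses
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate
"""

stats = parse_stats(sample)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
print(f"IPC {ipc:.4f}  dl1 miss rate {dl1_miss_rate:.4f}")
```

The two printed values should agree with the sim_IPC and dl1.miss_rate lines reported by the simulator for the same run.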

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 12 3 -redir:sim tempOutput3 sort 

sim: simulation started @ Thu Dec 15 12:11:44 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         12 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264599 # total number of instructions executed
sim_total_refs              4824020 # total number of loads and stores executed
sim_total_loads             2865542 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196951 # total number of branches executed
sim_cycle                   6261117 # total simulation time in cycles
sim_IPC                      1.8498 # instructions per cycle
sim_CPI                      0.5406 # cycles per instruction
sim_exec_BW                  1.9589 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18403779 # cumulative IFQ occupancy
IFQ_fcount                  3772795 # cumulative IFQ full count
ifq_occupancy                2.9394 # avg IFQ occupancy (insn's)
ifq_rate                     1.9589 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5006 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6026 # fraction of time (cycle's) IFQ was full
RUU_count                  75860969 # cumulative RUU occupancy
RUU_fcount                  3170962 # cumulative RUU full count
ruu_occupancy               12.1162 # avg RUU occupancy (insn's)
ruu_rate                     1.9589 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.1854 # avg RUU occupant latency (cycle's)
ruu_full                     0.5065 # fraction of time (cycle's) RUU was full
LSQ_count                  32116203 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1295 # avg LSQ occupancy (insn's)
lsq_rate                     1.9589 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6186 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122019619 # total number of slip cycles
avg_sim_slip                10.5357 # the average slip between issue and retirement
bpred_bimod.lookups         3257678 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441951 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435453 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820268 # total number of accesses
il1.hits                   12820051 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497835 # total number of accesses
dl1.hits                    4480759 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      15488 # total number of hits
ul2.misses                    12164 # total number of misses
ul2.replacements              11140 # total number of replacements
ul2.writebacks                 7571 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4399 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.4029 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.2738 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820268 # total number of accesses
itlb.hits                  12820261 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918238 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
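
The derived rates in the listing above can be cross-checked against the raw counters. The following is a minimal sketch, assuming every statistic line has the shape `name value # description`; `parse_stats` is a hypothetical helper written for this illustration and is not part of SimpleScalar. Non-numeric values (hex addresses, `292k`, `<error: divide by zero>`) are kept as strings.

```python
def parse_stats(text):
    """Map each stat name to its value (float when numeric, else the raw string)."""
    stats = {}
    for line in text.splitlines():
        # Everything before the first '#' is "name value"; the rest is a comment.
        body = line.split("#", 1)[0].split()
        if len(body) < 2:
            continue  # blank lines, commented-out options, banner text
        name, raw = body[0], " ".join(body[1:])
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw  # e.g. "0x00400000" or "<error: divide by zero>"
    return stats


# Counters copied from the run above.
sample = """\
dl1.accesses                4497835 # total number of accesses
dl1.misses                    17076 # total number of misses
ul2.accesses                  27652 # total number of accesses
ul2.misses                    12164 # total number of misses
"""

stats = parse_stats(sample)
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
ul2_miss_rate = stats["ul2.misses"] / stats["ul2.accesses"]
print(f"dl1.miss_rate {dl1_miss_rate:.4f}")
print(f"ul2.miss_rate {ul2_miss_rate:.4f}")
```

The two printed rates should agree with the `dl1.miss_rate` and `ul2.miss_rate` lines reported by the simulator, to the four decimal places it prints.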

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 12 3 -redir:sim tempOutput3 fft 

sim: simulation started @ Thu Dec 15 12:11:52 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         12 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
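
The cache and TLB options above use the simulator's `<name>:<nsets>:<bsize>:<assoc>:<repl>` config string. As a rough sketch of how to read one, the capacity is the product of the three numeric fields; `cache_capacity_bytes` is a hypothetical helper for illustration only, not a SimpleScalar function.

```python
def cache_capacity_bytes(config):
    """Capacity in bytes of a '<name>:<nsets>:<bsize>:<assoc>:<repl>' config string."""
    name, nsets, bsize, assoc, repl = config.split(":")
    # sets * block size * ways gives the total data capacity.
    return int(nsets) * int(bsize) * int(assoc)


# Config strings taken from the option listing of this run.
for cfg in ("dl1:256:64:2:l", "il1:256:64:2:l", "ul2:256:64:4:l"):
    print(cfg, cache_capacity_bytes(cfg) // 1024, "KB")
```

For this run that works out to 32 KB for each L1 cache and 64 KB for the unified L2. Applied to a TLB string such as `itlb:16:4096:4:l`, the same product gives the amount of memory the TLB can map (its reach) rather than a storage size.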

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13374819 # total number of instructions executed
sim_total_refs              6748384 # total number of loads and stores executed
sim_total_loads             3824341 # total number of loads executed
sim_total_stores       2924043.0000 # total number of stores executed
sim_total_branches           390627 # total number of branches executed
sim_cycle                   8665030 # total simulation time in cycles
sim_IPC                      1.5373 # instructions per cycle
sim_CPI                      0.6505 # cycles per instruction
sim_exec_BW                  1.5435 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  33855743 # cumulative IFQ occupancy
IFQ_fcount                  8314198 # cumulative IFQ full count
ifq_occupancy                3.9072 # avg IFQ occupancy (insn's)
ifq_rate                     1.5435 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.5313 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9595 # fraction of time (cycle's) IFQ was full
RUU_count                 136319062 # cumulative RUU occupancy
RUU_fcount                  8164430 # cumulative RUU full count
ruu_occupancy               15.7321 # avg RUU occupancy (insn's)
ruu_rate                     1.5435 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.1922 # avg RUU occupant latency (cycle's)
ruu_full                     0.9422 # fraction of time (cycle's) RUU was full
LSQ_count                  70522205 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1387 # avg LSQ occupancy (insn's)
lsq_rate                     1.5435 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.2728 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  226557546 # total number of slip cycles
avg_sim_slip                17.0077 # the average slip between issue and retirement
bpred_bimod.lookups          390936 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89876 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400268 # total number of accesses
il1.hits                   13399522 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171874 # total number of accesses
dl1.hits                    5866374 # total number of hits
dl1.misses                   305500 # total number of misses
dl1.replacements             304988 # total number of replacements
dl1.writebacks               156828 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0495 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 463074 # total number of accesses
ul2.hits                     366978 # total number of hits
ul2.misses                    96096 # total number of misses
ul2.replacements              95072 # total number of replacements
ul2.writebacks                73071 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2075 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2053 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1578 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400268 # total number of accesses
itlb.hits                  13400249 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736789 # total number of hits
dtlb.misses                    4209 # total number of misses
dtlb.replacements              4081 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767076 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
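
The headline throughput figures for the fft run can be recomputed from its raw counters. This is a small sketch, assuming the usual definitions (IPC = committed instructions / cycles, occupancy = cumulative count / cycles, latency = cumulative count / instructions executed); the formulas are inferred from the stat descriptions, not taken from the simulator source.

```python
# Raw counters copied from the fft run above.
sim_num_insn = 13320908      # instructions committed
sim_total_insn = 13374819    # instructions executed (committed + mis-speculated)
sim_num_branches = 387182    # branches committed
sim_cycle = 8665030          # simulated cycles
RUU_count = 136319062        # cumulative RUU occupancy

ipc = sim_num_insn / sim_cycle
cpi = sim_cycle / sim_num_insn
exec_bw = sim_total_insn / sim_cycle
ipb = sim_num_insn / sim_num_branches
ruu_occupancy = RUU_count / sim_cycle
ruu_latency = RUU_count / sim_total_insn

for label, value in [
    ("sim_IPC", ipc),
    ("sim_CPI", cpi),
    ("sim_exec_BW", exec_bw),
    ("sim_IPB", ipb),
    ("ruu_occupancy", ruu_occupancy),
    ("ruu_latency", ruu_latency),
]:
    print(f"{label:<16}{value:>12.4f}")
```

Under these assumptions, average occupancy equals dispatch rate times average latency (`ruu_occupancy == exec_bw * ruu_latency`), which is a quick consistency check on any of the queue statistics.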

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 12 3 -redir:sim tempOutput3 filter 

sim: simulation started @ Thu Dec 15 12:12:03 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         12 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449699 # total number of instructions executed
sim_total_refs              6588960 # total number of loads and stores executed
sim_total_loads             4938904 # total number of loads executed
sim_total_stores       1650056.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12447376 # total simulation time in cycles
sim_IPC                      1.7176 # instructions per cycle
sim_CPI                      0.5822 # cycles per instruction
sim_exec_BW                  1.7232 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instructions per branch
IFQ_count                  48681354 # cumulative IFQ occupancy
IFQ_fcount                 11551165 # cumulative IFQ full count
ifq_occupancy                3.9110 # avg IFQ occupancy (insn's)
ifq_rate                     1.7232 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2696 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9280 # fraction of time (cycle's) IFQ was full
RUU_count                 197868284 # cumulative RUU occupancy
RUU_fcount                 12331035 # cumulative RUU full count
ruu_occupancy               15.8964 # avg RUU occupancy (insn's)
ruu_rate                     1.7232 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2248 # avg RUU occupant latency (cycle's)
ruu_full                     0.9907 # fraction of time (cycle's) RUU was full
LSQ_count                  63452863 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0977 # avg LSQ occupancy (insn's)
lsq_rate                     1.7232 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9582 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  289105026 # total number of slip cycles
avg_sim_slip                13.5227 # the average slip between issue and retirement
bpred_bimod.lookups         1654336 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           57 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482603 # total number of accesses
il1.hits                   21482424 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943840 # total number of accesses
dl1.hits                    4939894 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       2606 # total number of hits
ul2.misses                     2571 # total number of misses
ul2.replacements               1547 # total number of replacements
ul2.writebacks                  659 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4966 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2988 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1273 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482603 # total number of accesses
itlb.hits                  21482597 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884080 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl2 ul2:256:64:4:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 12 3 -redir:sim tempOutput3 alphaBlend 

sim: simulation started @ Thu Dec 15 12:12:17 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         12 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
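
The <name>:<nsets>:<bsize>:<assoc>:<repl> format described above can be decoded mechanically. The following is a minimal Python sketch, not part of SimpleScalar: the names CacheConfig and parse_cache_config are hypothetical, and the capacity formula (sets x block size x associativity) assumes <bsize> is given in bytes. It is applied to the dl1 and ul2 settings from this run's command line.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    """One parsed <name>:<nsets>:<bsize>:<assoc>:<repl> string (hypothetical helper)."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Assumption: total data capacity = sets * block size (bytes) * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Split a cache config string into its five colon-separated fields."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    if repl not in ("l", "f", "r"):  # 'l'-LRU, 'f'-FIFO, 'r'-random
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


dl1 = parse_cache_config("dl1:256:64:2:l")
ul2 = parse_cache_config("ul2:256:64:4:l")
print(dl1.capacity_bytes)  # 32768
print(ul2.capacity_bytes)  # 65536
```

Under that assumption, the dl1 and il1 settings used here each describe a 32 KB two-way cache, and ul2 a 64 KB four-way cache.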



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860763 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  34909913 # total simulation time in cycles
sim_IPC                      0.7980 # instructions per cycle
sim_CPI                      1.2531 # cycles per instruction
sim_exec_BW                  0.7981 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 139614343 # cumulative IFQ occupancy
IFQ_fcount                 34903345 # cumulative IFQ full count
ifq_occupancy                3.9993 # avg IFQ occupancy (insn's)
ifq_rate                     0.7981 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.0111 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9998 # fraction of time (cycle's) IFQ was full
RUU_count                 558457917 # cumulative RUU occupancy
RUU_fcount                 34902728 # cumulative RUU full count
ruu_occupancy               15.9971 # avg RUU occupancy (insn's)
ruu_rate                     0.7981 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.0446 # avg RUU occupant latency (cycle's)
ruu_full                     0.9998 # fraction of time (cycle's) RUU was full
LSQ_count                 169818712 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8645 # avg LSQ occupancy (insn's)
lsq_rate                     0.7981 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.0953 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  764785505 # total number of slip cycles
avg_sim_slip                27.4513 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860786 # total number of accesses
il1.hits                   27860575 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6074905 # total number of hits
dl1.misses                  2578493 # total number of misses
dl1.replacements            2577981 # total number of replacements
dl1.writebacks               958398 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2980 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2979 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3537102 # total number of accesses
ul2.hits                    3468751 # total number of hits
ul2.misses                    68351 # total number of misses
ul2.replacements              67327 # total number of replacements
ul2.writebacks                22676 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0193 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0190 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0064 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860786 # total number of accesses
itlb.hits                  27860780 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017744 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
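
The statistics lines above follow a "name value # description" layout, so derived figures such as sim_IPC and dl1.miss_rate can be rechecked from the raw counters. The following is a minimal Python sketch, not a SimpleScalar tool: parse_stats is a hypothetical helper, and it deliberately skips option lines (names starting with "-") and values that are not plain numbers (hex addresses, "1480k", error strings). The sample text is copied from the alphaBlend run above.

```python
def parse_stats(text: str) -> dict:
    """Collect numeric 'name value # description' lines into a dict (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        body, sep, _ = line.partition("#")
        fields = body.split()
        # Keep only lines that have a comment and exactly one name and one value.
        if not sep or len(fields) != 2 or fields[0].startswith("-"):
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # e.g. 0x00400000 or 1480k
    return stats


sample = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  34909913 # total simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2578493 # total number of misses
"""

s = parse_stats(sample)
ipc = s["sim_num_insn"] / s["sim_cycle"]
dl1_miss_rate = s["dl1.misses"] / s["dl1.accesses"]
print(f"{ipc:.4f}")            # 0.7980
print(f"{dl1_miss_rate:.4f}")  # 0.2980
```

Both recomputed values agree with the sim_IPC and dl1.miss_rate lines reported for this run.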

