sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:19:30 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13149184 # total number of instructions executed
sim_total_refs              4034192 # total number of loads and stores executed
sim_total_loads             3020633 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8469856 # total simulation time in cycles
sim_IPC                      1.5475 # instructions per cycle
sim_CPI                      0.6462 # cycles per instruction
sim_exec_BW                  1.5525 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31739916 # cumulative IFQ occupancy
IFQ_fcount                  7787302 # cumulative IFQ full count
ifq_occupancy                3.7474 # avg IFQ occupancy (insn's)
ifq_rate                     1.5525 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4138 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9194 # fraction of time (cycle's) IFQ was full
RUU_count                 131300908 # cumulative RUU occupancy
RUU_fcount                  7106115 # cumulative RUU full count
ruu_occupancy               15.5021 # avg RUU occupancy (insn's)
ruu_rate                     1.5525 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9855 # avg RUU occupant latency (cycle's)
ruu_full                     0.8390 # fraction of time (cycle's) RUU was full
LSQ_count                  40011929 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7240 # avg LSQ occupancy (insn's)
lsq_rate                     1.5525 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0429 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  188354370 # total number of slip cycles
avg_sim_slip                14.3704 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189454 # total number of accesses
il1.hits                   13189276 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    3667132 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     344575 # total number of hits
ul2.misses                     2277 # total number of misses
ul2.replacements               1253 # total number of replacements
ul2.writebacks                  512 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189454 # total number of accesses
itlb.hits                  13189448 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357314 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:19:39 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12265280 # total number of instructions executed
sim_total_refs              4824018 # total number of loads and stores executed
sim_total_loads             2865540 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196951 # total number of branches executed
sim_cycle                   6651386 # total simulation time in cycles
sim_IPC                      1.7412 # instructions per cycle
sim_CPI                      0.5743 # cycles per instruction
sim_exec_BW                  1.8440 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19916614 # cumulative IFQ occupancy
IFQ_fcount                  4151004 # cumulative IFQ full count
ifq_occupancy                2.9944 # avg IFQ occupancy (insn's)
ifq_rate                     1.8440 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6238 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6241 # fraction of time (cycle's) IFQ was full
RUU_count                  81913975 # cumulative RUU occupancy
RUU_fcount                  3548996 # cumulative RUU full count
ruu_occupancy               12.3153 # avg RUU occupancy (insn's)
ruu_rate                     1.8440 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.6785 # avg RUU occupant latency (cycle's)
ruu_full                     0.5336 # fraction of time (cycle's) RUU was full
LSQ_count                  34638019 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.2076 # avg LSQ occupancy (insn's)
lsq_rate                     1.8440 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.8241 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  130587402 # total number of slip cycles
avg_sim_slip                11.2755 # the average slip between issue and retirement
bpred_bimod.lookups         3257679 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441951 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435454 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820264 # total number of accesses
il1.hits                   12820047 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497834 # total number of accesses
dl1.hits                    4480758 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      15488 # total number of hits
ul2.misses                    12164 # total number of misses
ul2.replacements              11140 # total number of replacements
ul2.writebacks                 7571 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4399 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.4029 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.2738 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820264 # total number of accesses
itlb.hits                  12820257 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918218 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:19:47 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375262 # total number of instructions executed
sim_total_refs              6748385 # total number of loads and stores executed
sim_total_loads             3824342 # total number of loads executed
sim_total_stores       2924043.0000 # total number of stores executed
sim_total_branches           390626 # total number of branches executed
sim_cycle                  11983299 # total simulation time in cycles
sim_IPC                      1.1116 # instructions per cycle
sim_CPI                      0.8996 # cycles per instruction
sim_exec_BW                  1.1162 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  46963358 # cumulative IFQ occupancy
IFQ_fcount                 11591127 # cumulative IFQ full count
ifq_occupancy                3.9191 # avg IFQ occupancy (insn's)
ifq_rate                     1.1162 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.5112 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9673 # fraction of time (cycle's) IFQ was full
RUU_count                 188771162 # cumulative RUU occupancy
RUU_fcount                 11441406 # cumulative RUU full count
ruu_occupancy               15.7529 # avg RUU occupancy (insn's)
ruu_rate                     1.1162 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 14.1135 # avg RUU occupant latency (cycle's)
ruu_full                     0.9548 # fraction of time (cycle's) RUU was full
LSQ_count                  99886534 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3355 # avg LSQ occupancy (insn's)
lsq_rate                     1.1162 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.4680 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  308373331 # total number of slip cycles
avg_sim_slip                23.1496 # the average slip between issue and retirement
bpred_bimod.lookups          390935 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89876 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400274 # total number of accesses
il1.hits                   13399528 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171929 # total number of accesses
dl1.hits                    5866428 # total number of hits
dl1.misses                   305501 # total number of misses
dl1.replacements             304989 # total number of replacements
dl1.writebacks               156829 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0495 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 463076 # total number of accesses
ul2.hits                     366978 # total number of hits
ul2.misses                    96098 # total number of misses
ul2.replacements              95074 # total number of replacements
ul2.writebacks                73071 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2075 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2053 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1578 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400274 # total number of accesses
itlb.hits                  13400255 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767100 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:19:59 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449922 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938901 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12579555 # total simulation time in cycles
sim_IPC                      1.6995 # instructions per cycle
sim_CPI                      0.5884 # cycles per instruction
sim_exec_BW                  1.7051 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49170665 # cumulative IFQ occupancy
IFQ_fcount                 11673493 # cumulative IFQ full count
ifq_occupancy                3.9088 # avg IFQ occupancy (insn's)
ifq_rate                     1.7051 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2923 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9280 # fraction of time (cycle's) IFQ was full
RUU_count                 199828081 # cumulative RUU occupancy
RUU_fcount                 12453307 # cumulative RUU full count
ruu_occupancy               15.8851 # avg RUU occupancy (insn's)
ruu_rate                     1.7051 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3160 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  64071697 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0933 # avg LSQ occupancy (insn's)
lsq_rate                     1.7051 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9870 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291683097 # total number of slip cycles
avg_sim_slip                13.6433 # the average slip between issue and retirement
bpred_bimod.lookups         1654334 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482598 # total number of accesses
il1.hits                   21482419 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943840 # total number of accesses
dl1.hits                    4939894 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       2606 # total number of hits
ul2.misses                     2571 # total number of misses
ul2.replacements               1547 # total number of replacements
ul2.writebacks                  659 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4966 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2988 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1273 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482598 # total number of accesses
itlb.hits                  21482592 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884054 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:20:13 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861219 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37472629 # total simulation time in cycles
sim_IPC                      0.7435 # instructions per cycle
sim_CPI                      1.3450 # cycles per instruction
sim_exec_BW                  0.7435 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 149817865 # cumulative IFQ occupancy
IFQ_fcount                 37454226 # cumulative IFQ full count
ifq_occupancy                3.9981 # avg IFQ occupancy (insn's)
ifq_rate                     0.7435 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3773 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 599274189 # cumulative RUU occupancy
RUU_fcount                 37453494 # cumulative RUU full count
ruu_occupancy               15.9923 # avg RUU occupancy (insn's)
ruu_rate                     0.7435 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.5093 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 182132347 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8604 # avg LSQ occupancy (insn's)
lsq_rate                     0.7435 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5371 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  817915184 # total number of slip cycles
avg_sim_slip                29.3584 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6071212 # total number of hits
dl1.misses                  2582186 # total number of misses
dl1.replacements            2581674 # total number of replacements
dl1.writebacks               958427 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2984 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2983 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3540824 # total number of accesses
ul2.hits                    3472444 # total number of hits
ul2.misses                    68380 # total number of misses
ul2.replacements              67356 # total number of replacements
ul2.writebacks                22676 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0193 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0190 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0064 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017756 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:20:37 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148931 # total number of instructions executed
sim_total_refs              4034196 # total number of loads and stores executed
sim_total_loads             3020632 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8420460 # total simulation time in cycles
sim_IPC                      1.5566 # instructions per cycle
sim_CPI                      0.6424 # cycles per instruction
sim_exec_BW                  1.5615 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31554599 # cumulative IFQ occupancy
IFQ_fcount                  7740964 # cumulative IFQ full count
ifq_occupancy                3.7474 # avg IFQ occupancy (insn's)
ifq_rate                     1.5615 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3998 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9193 # fraction of time (cycle's) IFQ was full
RUU_count                 130560871 # cumulative RUU occupancy
RUU_fcount                  7059398 # cumulative RUU full count
ruu_occupancy               15.5052 # avg RUU occupancy (insn's)
ruu_rate                     1.5615 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9294 # avg RUU occupant latency (cycle's)
ruu_full                     0.8384 # fraction of time (cycle's) RUU was full
LSQ_count                  39762650 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7221 # avg LSQ occupancy (insn's)
lsq_rate                     1.5615 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0240 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187364494 # total number of slip cycles
avg_sim_slip                14.2949 # the average slip between issue and retirement
bpred_bimod.lookups         1010976 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000660 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10190 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189437 # total number of accesses
il1.hits                   13189259 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013392 # total number of accesses
dl1.hits                    3667350 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     345692 # total number of hits
ul2.misses                     1160 # total number of misses
ul2.replacements                648 # total number of replacements
ul2.writebacks                  257 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0033 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0019 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0007 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189437 # total number of accesses
itlb.hits                  13189431 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357244 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:20:46 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12265063 # total number of instructions executed
sim_total_refs              4824026 # total number of loads and stores executed
sim_total_loads             2865548 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3197013 # total number of branches executed
sim_cycle                   6470301 # total simulation time in cycles
sim_IPC                      1.7899 # instructions per cycle
sim_CPI                      0.5587 # cycles per instruction
sim_exec_BW                  1.8956 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19208471 # cumulative IFQ occupancy
IFQ_fcount                  3973875 # cumulative IFQ full count
ifq_occupancy                2.9687 # avg IFQ occupancy (insn's)
ifq_rate                     1.8956 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5661 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6142 # fraction of time (cycle's) IFQ was full
RUU_count                  79082232 # cumulative RUU occupancy
RUU_fcount                  3370902 # cumulative RUU full count
ruu_occupancy               12.2223 # avg RUU occupancy (insn's)
ruu_rate                     1.8956 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4478 # avg RUU occupant latency (cycle's)
ruu_full                     0.5210 # fraction of time (cycle's) RUU was full
LSQ_count                  33463293 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1718 # avg LSQ occupancy (insn's)
lsq_rate                     1.8956 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7283 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  126577474 # total number of slip cycles
avg_sim_slip                10.9293 # the average slip between issue and retirement
bpred_bimod.lookups         3257741 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442015 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435454 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820386 # total number of accesses
il1.hits                   12820169 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497839 # total number of accesses
dl1.hits                    4480763 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      21515 # total number of hits
ul2.misses                     6137 # total number of misses
ul2.replacements               5625 # total number of replacements
ul2.writebacks                 3800 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2219 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2034 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1374 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820386 # total number of accesses
itlb.hits                  12820379 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918722 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:20:54 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375698 # total number of instructions executed
sim_total_refs              6748388 # total number of loads and stores executed
sim_total_loads             3824347 # total number of loads executed
sim_total_stores       2924041.0000 # total number of stores executed
sim_total_branches           390629 # total number of branches executed
sim_cycle                  11536115 # total simulation time in cycles
sim_IPC                      1.1547 # instructions per cycle
sim_CPI                      0.8660 # cycles per instruction
sim_exec_BW                  1.1595 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  45240579 # cumulative IFQ occupancy
IFQ_fcount                 11160401 # cumulative IFQ full count
ifq_occupancy                3.9216 # avg IFQ occupancy (insn's)
ifq_rate                     1.1595 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.3823 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9674 # fraction of time (cycle's) IFQ was full
RUU_count                 181891302 # cumulative RUU occupancy
RUU_fcount                 11010871 # cumulative RUU full count
ruu_occupancy               15.7671 # avg RUU occupancy (insn's)
ruu_rate                     1.1595 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 13.5986 # avg RUU occupant latency (cycle's)
ruu_full                     0.9545 # fraction of time (cycle's) RUU was full
LSQ_count                  95723588 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2977 # avg LSQ occupancy (insn's)
lsq_rate                     1.1595 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.1565 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  297329579 # total number of slip cycles
avg_sim_slip                22.3205 # the average slip between issue and retirement
bpred_bimod.lookups          390938 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400277 # total number of accesses
il1.hits                   13399531 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171900 # total number of accesses
dl1.hits                    5866728 # total number of hits
dl1.misses                   305172 # total number of misses
dl1.replacements             304660 # total number of replacements
dl1.writebacks               156825 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462743 # total number of accesses
ul2.hits                     386565 # total number of hits
ul2.misses                    76178 # total number of misses
ul2.replacements              75666 # total number of replacements
ul2.writebacks                62226 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1646 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1635 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1345 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400277 # total number of accesses
itlb.hits                  13400258 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767128 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:21:06 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449984 # total number of instructions executed
sim_total_refs              6588955 # total number of loads and stores executed
sim_total_loads             4938902 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12520608 # total simulation time in cycles
sim_IPC                      1.7075 # instructions per cycle
sim_CPI                      0.5856 # cycles per instruction
sim_exec_BW                  1.7132 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48948367 # cumulative IFQ occupancy
IFQ_fcount                 11617538 # cumulative IFQ full count
ifq_occupancy                3.9094 # avg IFQ occupancy (insn's)
ifq_rate                     1.7132 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2820 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 198924968 # cumulative RUU occupancy
RUU_fcount                 12395794 # cumulative RUU full count
ruu_occupancy               15.8878 # avg RUU occupancy (insn's)
ruu_rate                     1.7132 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2739 # avg RUU occupant latency (cycle's)
ruu_full                     0.9900 # fraction of time (cycle's) RUU was full
LSQ_count                  63789427 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0948 # avg LSQ occupancy (insn's)
lsq_rate                     1.7132 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9739 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290497633 # total number of slip cycles
avg_sim_slip                13.5878 # the average slip between issue and retirement
bpred_bimod.lookups         1654335 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           57 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482587 # total number of accesses
il1.hits                   21482408 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944348 # total number of accesses
dl1.hits                    4940402 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       3867 # total number of hits
ul2.misses                     1310 # total number of misses
ul2.replacements                798 # total number of replacements
ul2.writebacks                  344 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2530 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1541 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0664 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482587 # total number of accesses
itlb.hits                  21482581 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884014 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:21:20 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861004 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203829 # total number of loads executed
sim_total_stores       1449774.0000 # total number of stores executed
sim_total_branches           481844 # total number of branches executed
sim_cycle                  36334149 # total simulation time in cycles
sim_IPC                      0.7668 # instructions per cycle
sim_CPI                      1.3042 # cycles per instruction
sim_exec_BW                  0.7668 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 145280955 # cumulative IFQ occupancy
IFQ_fcount                 36320000 # cumulative IFQ full count
ifq_occupancy                3.9985 # avg IFQ occupancy (insn's)
ifq_rate                     0.7668 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2145 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 581126314 # cumulative RUU occupancy
RUU_fcount                 36319319 # cumulative RUU full count
ruu_occupancy               15.9939 # avg RUU occupancy (insn's)
ruu_rate                     0.7668 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.8581 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 176671429 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8624 # avg LSQ occupancy (insn's)
lsq_rate                     0.7668 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3412 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  794305002 # total number of slip cycles
avg_sim_slip                28.5109 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6076269 # total number of hits
dl1.misses                  2577129 # total number of misses
dl1.replacements            2576617 # total number of replacements
dl1.writebacks               958457 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2978 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2978 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3535797 # total number of accesses
ul2.hits                    3500931 # total number of hits
ul2.misses                    34866 # total number of misses
ul2.replacements              34354 # total number of replacements
ul2.writebacks                11770 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0099 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0097 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0033 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017752 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:21:43 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148952 # total number of instructions executed
sim_total_refs              4034200 # total number of loads and stores executed
sim_total_loads             3020636 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010957 # total number of branches executed
sim_cycle                   8393873 # total simulation time in cycles
sim_IPC                      1.5615 # instructions per cycle
sim_CPI                      0.6404 # cycles per instruction
sim_exec_BW                  1.5665 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31454449 # cumulative IFQ occupancy
IFQ_fcount                  7715913 # cumulative IFQ full count
ifq_occupancy                3.7473 # avg IFQ occupancy (insn's)
ifq_rate                     1.5665 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3922 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9192 # fraction of time (cycle's) IFQ was full
RUU_count                 130164317 # cumulative RUU occupancy
RUU_fcount                  7034132 # cumulative RUU full count
ruu_occupancy               15.5071 # avg RUU occupancy (insn's)
ruu_rate                     1.5665 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.8992 # avg RUU occupant latency (cycle's)
ruu_full                     0.8380 # fraction of time (cycle's) RUU was full
LSQ_count                  39629949 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7213 # avg LSQ occupancy (insn's)
lsq_rate                     1.5665 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0139 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  186833376 # total number of slip cycles
avg_sim_slip                14.2543 # the average slip between issue and retirement
bpred_bimod.lookups         1010980 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189458 # total number of accesses
il1.hits                   13189280 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013522 # total number of accesses
dl1.hits                    3667480 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346258 # total number of hits
ul2.misses                      594 # total number of misses
ul2.replacements                338 # total number of replacements
ul2.writebacks                  129 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0017 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0010 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0004 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189458 # total number of accesses
itlb.hits                  13189452 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013780 # total number of accesses
dtlb.hits                   4013744 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357336 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:21:52 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12265543 # total number of instructions executed
sim_total_refs              4824017 # total number of loads and stores executed
sim_total_loads             2865542 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197044 # total number of branches executed
sim_cycle                   6380758 # total simulation time in cycles
sim_IPC                      1.8151 # instructions per cycle
sim_CPI                      0.5509 # cycles per instruction
sim_exec_BW                  1.9223 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18855348 # cumulative IFQ occupancy
IFQ_fcount                  3885548 # cumulative IFQ full count
ifq_occupancy                2.9550 # avg IFQ occupancy (insn's)
ifq_rate                     1.9223 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5373 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6089 # fraction of time (cycle's) IFQ was full
RUU_count                  77669143 # cumulative RUU occupancy
RUU_fcount                  3281944 # cumulative RUU full count
ruu_occupancy               12.1724 # avg RUU occupancy (insn's)
ruu_rate                     1.9223 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3323 # avg RUU occupant latency (cycle's)
ruu_full                     0.5144 # fraction of time (cycle's) RUU was full
LSQ_count                  32878204 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1527 # avg LSQ occupancy (insn's)
lsq_rate                     1.9223 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6805 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124575281 # total number of slip cycles
avg_sim_slip                10.7564 # the average slip between issue and retirement
bpred_bimod.lookups         3257770 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984764 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442047 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435453 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820430 # total number of accesses
il1.hits                   12820213 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497839 # total number of accesses
dl1.hits                    4480763 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      24532 # total number of hits
ul2.misses                     3120 # total number of misses
ul2.replacements               2864 # total number of replacements
ul2.writebacks                 1917 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1128 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0693 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820430 # total number of accesses
itlb.hits                  12820423 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918888 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:22:00 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375845 # total number of instructions executed
sim_total_refs              6748395 # total number of loads and stores executed
sim_total_loads             3824348 # total number of loads executed
sim_total_stores       2924047.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  11876326 # total simulation time in cycles
sim_IPC                      1.1216 # instructions per cycle
sim_CPI                      0.8916 # cycles per instruction
sim_exec_BW                  1.1263 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  46634534 # cumulative IFQ occupancy
IFQ_fcount                 11508771 # cumulative IFQ full count
ifq_occupancy                3.9267 # avg IFQ occupancy (insn's)
ifq_rate                     1.1263 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.4865 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9691 # fraction of time (cycle's) IFQ was full
RUU_count                 187459312 # cumulative RUU occupancy
RUU_fcount                 11359327 # cumulative RUU full count
ruu_occupancy               15.7843 # avg RUU occupancy (insn's)
ruu_rate                     1.1263 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 14.0148 # avg RUU occupant latency (cycle's)
ruu_full                     0.9565 # fraction of time (cycle's) RUU was full
LSQ_count                  98625728 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3044 # avg LSQ occupancy (insn's)
lsq_rate                     1.1263 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.3734 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  305797224 # total number of slip cycles
avg_sim_slip                22.9562 # the average slip between issue and retirement
bpred_bimod.lookups          390940 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400289 # total number of accesses
il1.hits                   13399543 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171900 # total number of accesses
dl1.hits                    5866897 # total number of hits
dl1.misses                   305003 # total number of misses
dl1.replacements             304491 # total number of replacements
dl1.writebacks               156826 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462575 # total number of accesses
ul2.hits                     394980 # total number of hits
ul2.misses                    67595 # total number of misses
ul2.replacements              67339 # total number of replacements
ul2.writebacks                57336 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1461 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1456 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1239 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400289 # total number of accesses
itlb.hits                  13400270 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740999 # total number of accesses
dtlb.hits                   6736825 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767178 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:22:12 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449685 # total number of instructions executed
sim_total_refs              6588961 # total number of loads and stores executed
sim_total_loads             4938904 # total number of loads executed
sim_total_stores       1650057.0000 # total number of stores executed
sim_total_branches          1647450 # total number of branches executed
sim_cycle                  12490335 # total simulation time in cycles
sim_IPC                      1.7117 # instructions per cycle
sim_CPI                      0.5842 # cycles per instruction
sim_exec_BW                  1.7173 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48833162 # cumulative IFQ occupancy
IFQ_fcount                 11588541 # cumulative IFQ full count
ifq_occupancy                3.9097 # avg IFQ occupancy (insn's)
ifq_rate                     1.7173 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2766 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9278 # fraction of time (cycle's) IFQ was full
RUU_count                 198459635 # cumulative RUU occupancy
RUU_fcount                 12366109 # cumulative RUU full count
ruu_occupancy               15.8891 # avg RUU occupancy (insn's)
ruu_rate                     1.7173 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2523 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63643229 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0954 # avg LSQ occupancy (insn's)
lsq_rate                     1.7173 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9671 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  289884321 # total number of slip cycles
avg_sim_slip                13.5591 # the average slip between issue and retirement
bpred_bimod.lookups         1654339 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482601 # total number of accesses
il1.hits                   21482422 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944606 # total number of accesses
dl1.hits                    4940660 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       4505 # total number of hits
ul2.misses                      672 # total number of misses
ul2.replacements                416 # total number of replacements
ul2.writebacks                  183 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1298 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0804 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0353 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482601 # total number of accesses
itlb.hits                  21482595 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565560 # total number of accesses
dtlb.hits                   6565521 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884074 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:22:26 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861435 # total number of instructions executed
sim_total_refs              8653612 # total number of loads and stores executed
sim_total_loads             7203835 # total number of loads executed
sim_total_stores       1449777.0000 # total number of stores executed
sim_total_branches           481852 # total number of branches executed
sim_cycle                  35986382 # total simulation time in cycles
sim_IPC                      0.7742 # instructions per cycle
sim_CPI                      1.2917 # cycles per instruction
sim_exec_BW                  0.7742 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 143896254 # cumulative IFQ occupancy
IFQ_fcount                 35973757 # cumulative IFQ full count
ifq_occupancy                3.9986 # avg IFQ occupancy (insn's)
ifq_rate                     0.7742 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.1647 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 575587253 # cumulative RUU occupancy
RUU_fcount                 35973063 # cumulative RUU full count
ruu_occupancy               15.9946 # avg RUU occupancy (insn's)
ruu_rate                     0.7742 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.6589 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 173617441 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8245 # avg LSQ occupancy (insn's)
lsq_rate                     0.7742 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2315 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  785710325 # total number of slip cycles
avg_sim_slip                28.2024 # the average slip between issue and retirement
bpred_bimod.lookups          481892 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          123 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860810 # total number of accesses
il1.hits                   27860599 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653401 # total number of accesses
dl1.hits                    5938859 # total number of hits
dl1.misses                  2714542 # total number of misses
dl1.replacements            2714030 # total number of replacements
dl1.writebacks               956651 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3137 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3136 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3671404 # total number of accesses
ul2.hits                    3653299 # total number of hits
ul2.misses                    18105 # total number of misses
ul2.replacements              17849 # total number of replacements
ul2.writebacks                 6372 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0049 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0049 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0017 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860810 # total number of accesses
itlb.hits                  27860804 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653414 # total number of accesses
dtlb.hits                   8652341 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017852 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:22:49 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8398155 # total simulation time in cycles
sim_IPC                      1.5607 # instructions per cycle
sim_CPI                      0.6407 # cycles per instruction
sim_exec_BW                  1.5657 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31460566 # cumulative IFQ occupancy
IFQ_fcount                  7717445 # cumulative IFQ full count
ifq_occupancy                3.7461 # avg IFQ occupancy (insn's)
ifq_rate                     1.5657 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3926 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9189 # fraction of time (cycle's) IFQ was full
RUU_count                 130190585 # cumulative RUU occupancy
RUU_fcount                  7035573 # cumulative RUU full count
ruu_occupancy               15.5023 # avg RUU occupancy (insn's)
ruu_rate                     1.5657 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9012 # avg RUU occupant latency (cycle's)
ruu_full                     0.8378 # fraction of time (cycle's) RUU was full
LSQ_count                  39637976 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7198 # avg LSQ occupancy (insn's)
lsq_rate                     1.5657 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0145 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  186871450 # total number of slip cycles
avg_sim_slip                14.2572 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189425 # total number of accesses
il1.hits                   13189247 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013580 # total number of accesses
dl1.hits                    3667538 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346541 # total number of hits
ul2.misses                      311 # total number of misses
ul2.replacements                183 # total number of replacements
ul2.writebacks                   66 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0009 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0005 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189425 # total number of accesses
itlb.hits                  13189419 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357194 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:22:58 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12266517 # total number of instructions executed
sim_total_refs              4824020 # total number of loads and stores executed
sim_total_loads             2865549 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197057 # total number of branches executed
sim_cycle                   6400225 # total simulation time in cycles
sim_IPC                      1.8095 # instructions per cycle
sim_CPI                      0.5526 # cycles per instruction
sim_exec_BW                  1.9166 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18914621 # cumulative IFQ occupancy
IFQ_fcount                  3900342 # cumulative IFQ full count
ifq_occupancy                2.9553 # avg IFQ occupancy (insn's)
ifq_rate                     1.9166 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5420 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6094 # fraction of time (cycle's) IFQ was full
RUU_count                  77907784 # cumulative RUU occupancy
RUU_fcount                  3296250 # cumulative RUU full count
ruu_occupancy               12.1727 # avg RUU occupancy (insn's)
ruu_rate                     1.9166 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3513 # avg RUU occupant latency (cycle's)
ruu_full                     0.5150 # fraction of time (cycle's) RUU was full
LSQ_count                  32992755 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1549 # avg LSQ occupancy (insn's)
lsq_rate                     1.9166 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6897 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124913337 # total number of slip cycles
avg_sim_slip                10.7856 # the average slip between issue and retirement
bpred_bimod.lookups         3257785 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442061 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820463 # total number of accesses
il1.hits                   12820246 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497830 # total number of accesses
dl1.hits                    4480754 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      26039 # total number of hits
ul2.misses                     1613 # total number of misses
ul2.replacements               1485 # total number of replacements
ul2.writebacks                  978 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0583 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0537 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0354 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820463 # total number of accesses
itlb.hits                  12820456 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514858 # total number of accesses
dtlb.hits                   4514792 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919034 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:23:06 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate           951493.4286 # simulation speed (in insts/sec)
sim_total_insn             13377250 # total number of instructions executed
sim_total_refs              6748388 # total number of loads and stores executed
sim_total_loads             3824348 # total number of loads executed
sim_total_stores       2924040.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  15485064 # total simulation time in cycles
sim_IPC                      0.8602 # instructions per cycle
sim_CPI                      1.1625 # cycles per instruction
sim_exec_BW                  0.8639 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  61038667 # cumulative IFQ occupancy
IFQ_fcount                 15109646 # cumulative IFQ full count
ifq_occupancy                3.9418 # avg IFQ occupancy (insn's)
ifq_rate                     0.8639 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  4.5629 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9758 # fraction of time (cycle's) IFQ was full
RUU_count                 245084873 # cumulative RUU occupancy
RUU_fcount                 14960044 # cumulative RUU full count
ruu_occupancy               15.8272 # avg RUU occupancy (insn's)
ruu_rate                     0.8639 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 18.3210 # avg RUU occupant latency (cycle's)
ruu_full                     0.9661 # fraction of time (cycle's) RUU was full
LSQ_count                 130212528 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.4089 # avg LSQ occupancy (insn's)
lsq_rate                     0.8639 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  9.7339 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  395010433 # total number of slip cycles
avg_sim_slip                29.6534 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400270 # total number of accesses
il1.hits                   13399525 # total number of hits
il1.misses                      745 # total number of misses
il1.replacements                249 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171935 # total number of accesses
dl1.hits                    5867003 # total number of hits
dl1.misses                   304932 # total number of misses
dl1.replacements             304420 # total number of replacements
dl1.writebacks               156826 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462503 # total number of accesses
ul2.hits                     396143 # total number of hits
ul2.misses                    66360 # total number of misses
ul2.replacements              66232 # total number of replacements
ul2.writebacks                55963 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1435 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1432 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1210 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400270 # total number of accesses
itlb.hits                  13400251 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767102 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:23:20 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449676 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938903 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12496842 # total simulation time in cycles
sim_IPC                      1.7108 # instructions per cycle
sim_CPI                      0.5845 # cycles per instruction
sim_exec_BW                  1.7164 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48848356 # cumulative IFQ occupancy
IFQ_fcount                 11592245 # cumulative IFQ full count
ifq_occupancy                3.9089 # avg IFQ occupancy (insn's)
ifq_rate                     1.7164 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2773 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9276 # fraction of time (cycle's) IFQ was full
RUU_count                 198519275 # cumulative RUU occupancy
RUU_fcount                 12369431 # cumulative RUU full count
ruu_occupancy               15.8856 # avg RUU occupancy (insn's)
ruu_rate                     1.7164 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2551 # avg RUU occupant latency (cycle's)
ruu_full                     0.9898 # fraction of time (cycle's) RUU was full
LSQ_count                  63662003 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0942 # avg LSQ occupancy (insn's)
lsq_rate                     1.7164 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9680 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  289965255 # total number of slip cycles
avg_sim_slip                13.5629 # the average slip between issue and retirement
bpred_bimod.lookups         1654336 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482586 # total number of accesses
il1.hits                   21482407 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944732 # total number of accesses
dl1.hits                    4940786 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       4824 # total number of hits
ul2.misses                      353 # total number of misses
ul2.replacements                225 # total number of replacements
ul2.writebacks                  100 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0682 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0435 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0193 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482586 # total number of accesses
itlb.hits                  21482580 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884012 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:23:33 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27862331 # total number of instructions executed
sim_total_refs              8653604 # total number of loads and stores executed
sim_total_loads             7203832 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  36248719 # total simulation time in cycles
sim_IPC                      0.7686 # instructions per cycle
sim_CPI                      1.3011 # cycles per instruction
sim_exec_BW                  0.7686 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 144932328 # cumulative IFQ occupancy
IFQ_fcount                 36232841 # cumulative IFQ full count
ifq_occupancy                3.9983 # avg IFQ occupancy (insn's)
ifq_rate                     0.7686 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2017 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 579730435 # cumulative RUU occupancy
RUU_fcount                 36231846 # cumulative RUU full count
ruu_occupancy               15.9931 # avg RUU occupancy (insn's)
ruu_rate                     0.7686 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.8070 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 174197450 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8056 # avg LSQ occupancy (insn's)
lsq_rate                     0.7686 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2521 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  790436570 # total number of slip cycles
avg_sim_slip                28.3720 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860786 # total number of accesses
il1.hits                   27860575 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5867867 # total number of hits
dl1.misses                  2785531 # total number of misses
dl1.replacements            2785019 # total number of replacements
dl1.writebacks               955704 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3219 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3218 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3741446 # total number of accesses
ul2.hits                    3731754 # total number of hits
ul2.misses                     9692 # total number of misses
ul2.replacements               9564 # total number of replacements
ul2.writebacks                 3646 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0026 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0026 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0010 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860786 # total number of accesses
itlb.hits                  27860780 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017750 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:23:57 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8409781 # total simulation time in cycles
sim_IPC                      1.5586 # instructions per cycle
sim_CPI                      0.6416 # cycles per instruction
sim_exec_BW                  1.5635 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31495351 # cumulative IFQ occupancy
IFQ_fcount                  7726149 # cumulative IFQ full count
ifq_occupancy                3.7451 # avg IFQ occupancy (insn's)
ifq_rate                     1.5635 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3953 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9187 # fraction of time (cycle's) IFQ was full
RUU_count                 130334441 # cumulative RUU occupancy
RUU_fcount                  7044202 # cumulative RUU full count
ruu_occupancy               15.4980 # avg RUU occupancy (insn's)
ruu_rate                     1.5635 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9122 # avg RUU occupant latency (cycle's)
ruu_full                     0.8376 # fraction of time (cycle's) RUU was full
LSQ_count                  39683765 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7188 # avg LSQ occupancy (insn's)
lsq_rate                     1.5635 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0180 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187061095 # total number of slip cycles
avg_sim_slip                14.2717 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189425 # total number of accesses
il1.hits                   13189247 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013611 # total number of accesses
dl1.hits                    3667569 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346687 # total number of hits
ul2.misses                      165 # total number of misses
ul2.replacements                101 # total number of replacements
ul2.writebacks                   34 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0005 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0003 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189425 # total number of accesses
itlb.hits                  13189419 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357194 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:24:05 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12266693 # total number of instructions executed
sim_total_refs              4824020 # total number of loads and stores executed
sim_total_loads             2865549 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197065 # total number of branches executed
sim_cycle                   6420950 # total simulation time in cycles
sim_IPC                      1.8037 # instructions per cycle
sim_CPI                      0.5544 # cycles per instruction
sim_exec_BW                  1.9104 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18979143 # cumulative IFQ occupancy
IFQ_fcount                  3916463 # cumulative IFQ full count
ifq_occupancy                2.9558 # avg IFQ occupancy (insn's)
ifq_rate                     1.9104 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5472 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6100 # fraction of time (cycle's) IFQ was full
RUU_count                  78166775 # cumulative RUU occupancy
RUU_fcount                  3312196 # cumulative RUU full count
ruu_occupancy               12.1737 # avg RUU occupancy (insn's)
ruu_rate                     1.9104 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3723 # avg RUU occupant latency (cycle's)
ruu_full                     0.5158 # fraction of time (cycle's) RUU was full
LSQ_count                  33136044 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1606 # avg LSQ occupancy (insn's)
lsq_rate                     1.9104 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7013 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  125284297 # total number of slip cycles
avg_sim_slip                10.8176 # the average slip between issue and retirement
bpred_bimod.lookups         3257793 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442067 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820477 # total number of accesses
il1.hits                   12820260 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497830 # total number of accesses
dl1.hits                    4480754 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      26806 # total number of hits
ul2.misses                      846 # total number of misses
ul2.replacements                782 # total number of replacements
ul2.writebacks                  506 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0306 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0283 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0183 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820477 # total number of accesses
itlb.hits                  12820470 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514858 # total number of accesses
dtlb.hits                   4514792 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919090 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:24:13 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 17 # total simulation time in seconds
sim_inst_rate           783582.8235 # simulation speed (in insts/sec)
sim_total_insn             13376544 # total number of instructions executed
sim_total_refs              6748391 # total number of loads and stores executed
sim_total_loads             3824351 # total number of loads executed
sim_total_stores       2924040.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  22930906 # total simulation time in cycles
sim_IPC                      0.5809 # instructions per cycle
sim_CPI                      1.7214 # cycles per instruction
sim_exec_BW                  0.5833 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  90780658 # cumulative IFQ occupancy
IFQ_fcount                 22545051 # cumulative IFQ full count
ifq_occupancy                3.9589 # avg IFQ occupancy (insn's)
ifq_rate                     0.5833 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  6.7866 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9832 # fraction of time (cycle's) IFQ was full
RUU_count                 364064930 # cumulative RUU occupancy
RUU_fcount                 22395690 # cumulative RUU full count
ruu_occupancy               15.8766 # avg RUU occupancy (insn's)
ruu_rate                     0.5833 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 27.2167 # avg RUU occupant latency (cycle's)
ruu_full                     0.9767 # fraction of time (cycle's) RUU was full
LSQ_count                 195558433 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.5282 # avg LSQ occupancy (insn's)
lsq_rate                     0.5833 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 14.6195 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  579328925 # total number of slip cycles
avg_sim_slip                43.4902 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400279 # total number of accesses
il1.hits                   13399534 # total number of hits
il1.misses                      745 # total number of misses
il1.replacements                249 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171957 # total number of accesses
dl1.hits                    5867042 # total number of hits
dl1.misses                   304915 # total number of misses
dl1.replacements             304403 # total number of replacements
dl1.writebacks               156830 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462490 # total number of accesses
ul2.hits                     396653 # total number of hits
ul2.misses                    65837 # total number of misses
ul2.replacements              65773 # total number of replacements
ul2.writebacks                55776 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1424 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1422 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1206 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400279 # total number of accesses
itlb.hits                  13400260 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767144 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:24:30 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449664 # total number of instructions executed
sim_total_refs              6588951 # total number of loads and stores executed
sim_total_loads             4938902 # total number of loads executed
sim_total_stores       1650049.0000 # total number of stores executed
sim_total_branches          1647445 # total number of branches executed
sim_cycle                  12504304 # total simulation time in cycles
sim_IPC                      1.7098 # instructions per cycle
sim_CPI                      0.5849 # cycles per instruction
sim_exec_BW                  1.7154 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48866473 # cumulative IFQ occupancy
IFQ_fcount                 11596727 # cumulative IFQ full count
ifq_occupancy                3.9080 # avg IFQ occupancy (insn's)
ifq_rate                     1.7154 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2782 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9274 # fraction of time (cycle's) IFQ was full
RUU_count                 198591852 # cumulative RUU occupancy
RUU_fcount                 12373733 # cumulative RUU full count
ruu_occupancy               15.8819 # avg RUU occupancy (insn's)
ruu_rate                     1.7154 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2585 # avg RUU occupant latency (cycle's)
ruu_full                     0.9896 # fraction of time (cycle's) RUU was full
LSQ_count                  63689803 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0934 # avg LSQ occupancy (insn's)
lsq_rate                     1.7154 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9693 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290065701 # total number of slip cycles
avg_sim_slip                13.5676 # the average slip between issue and retirement
bpred_bimod.lookups         1654333 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           77 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482573 # total number of accesses
il1.hits                   21482394 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944795 # total number of accesses
dl1.hits                    4940849 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       4984 # total number of hits
ul2.misses                      193 # total number of misses
ul2.replacements                129 # total number of replacements
ul2.writebacks                   55 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0373 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0249 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0106 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482573 # total number of accesses
itlb.hits                  21482567 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565557 # total number of accesses
dtlb.hits                   6565518 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178883958 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:24:43 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27862477 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203831 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481846 # total number of branches executed
sim_cycle                  36501187 # total simulation time in cycles
sim_IPC                      0.7633 # instructions per cycle
sim_CPI                      1.3102 # cycles per instruction
sim_exec_BW                  0.7633 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 145928566 # cumulative IFQ occupancy
IFQ_fcount                 36481892 # cumulative IFQ full count
ifq_occupancy                3.9979 # avg IFQ occupancy (insn's)
ifq_rate                     0.7633 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2375 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 583715243 # cumulative RUU occupancy
RUU_fcount                 36480868 # cumulative RUU full count
ruu_occupancy               15.9917 # avg RUU occupancy (insn's)
ruu_rate                     0.7633 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.9499 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 175107428 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7973 # avg LSQ occupancy (insn's)
lsq_rate                     0.7633 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.2847 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  795331129 # total number of slip cycles
avg_sim_slip                28.5477 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860784 # total number of accesses
il1.hits                   27860573 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5833046 # total number of hits
dl1.misses                  2820352 # total number of misses
dl1.replacements            2819840 # total number of replacements
dl1.writebacks               955247 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3259 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3259 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3775810 # total number of accesses
ul2.hits                    3770326 # total number of hits
ul2.misses                     5484 # total number of misses
ul2.replacements               5420 # total number of replacements
ul2.writebacks                 2278 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0015 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0014 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0006 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860784 # total number of accesses
itlb.hits                  27860778 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017740 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:25:06 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8414448 # total simulation time in cycles
sim_IPC                      1.5577 # instructions per cycle
sim_CPI                      0.6420 # cycles per instruction
sim_exec_BW                  1.5627 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31504259 # cumulative IFQ occupancy
IFQ_fcount                  7728373 # cumulative IFQ full count
ifq_occupancy                3.7441 # avg IFQ occupancy (insn's)
ifq_rate                     1.5627 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3960 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9185 # fraction of time (cycle's) IFQ was full
RUU_count                 130369556 # cumulative RUU occupancy
RUU_fcount                  7046403 # cumulative RUU full count
ruu_occupancy               15.4935 # avg RUU occupancy (insn's)
ruu_rate                     1.5627 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9148 # avg RUU occupant latency (cycle's)
ruu_full                     0.8374 # fraction of time (cycle's) RUU was full
LSQ_count                  39695698 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7176 # avg LSQ occupancy (insn's)
lsq_rate                     1.5627 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0189 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187108133 # total number of slip cycles
avg_sim_slip                14.2753 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189424 # total number of accesses
il1.hits                   13189246 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013627 # total number of accesses
dl1.hits                    3667585 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346764 # total number of hits
ul2.misses                       88 # total number of misses
ul2.replacements                 56 # total number of replacements
ul2.writebacks                   18 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0003 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0002 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189424 # total number of accesses
itlb.hits                  13189418 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357190 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:25:15 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12268553 # total number of instructions executed
sim_total_refs              4824025 # total number of loads and stores executed
sim_total_loads             2865550 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197074 # total number of branches executed
sim_cycle                   6446920 # total simulation time in cycles
sim_IPC                      1.7964 # instructions per cycle
sim_CPI                      0.5567 # cycles per instruction
sim_exec_BW                  1.9030 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19060426 # cumulative IFQ occupancy
IFQ_fcount                  3936774 # cumulative IFQ full count
ifq_occupancy                2.9565 # avg IFQ occupancy (insn's)
ifq_rate                     1.9030 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5536 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6106 # fraction of time (cycle's) IFQ was full
RUU_count                  78486290 # cumulative RUU occupancy
RUU_fcount                  3331973 # cumulative RUU full count
ruu_occupancy               12.1742 # avg RUU occupancy (insn's)
ruu_rate                     1.9030 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3974 # avg RUU occupant latency (cycle's)
ruu_full                     0.5168 # fraction of time (cycle's) RUU was full
LSQ_count                  33305260 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1661 # avg LSQ occupancy (insn's)
lsq_rate                     1.9030 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7147 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  125687026 # total number of slip cycles
avg_sim_slip                10.8524 # the average slip between issue and retirement
bpred_bimod.lookups         3257802 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442075 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820501 # total number of accesses
il1.hits                   12820284 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497831 # total number of accesses
dl1.hits                    4480755 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      27193 # total number of hits
ul2.misses                      459 # total number of misses
ul2.replacements                427 # total number of replacements
ul2.writebacks                  265 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0166 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0154 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0096 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820501 # total number of accesses
itlb.hits                  12820494 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919188 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:25:22 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate           555037.8333 # simulation speed (in insts/sec)
sim_total_insn             13378374 # total number of instructions executed
sim_total_refs              6748394 # total number of loads and stores executed
sim_total_loads             3824351 # total number of loads executed
sim_total_stores       2924043.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  39210740 # total simulation time in cycles
sim_IPC                      0.3397 # instructions per cycle
sim_CPI                      2.9435 # cycles per instruction
sim_exec_BW                  0.3412 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                 155817331 # cumulative IFQ occupancy
IFQ_fcount                 38804531 # cumulative IFQ full count
ifq_occupancy                3.9738 # avg IFQ occupancy (insn's)
ifq_rate                     0.3412 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                 11.6470 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9896 # fraction of time (cycle's) IFQ was full
RUU_count                 624265902 # cumulative RUU occupancy
RUU_fcount                 38654314 # cumulative RUU full count
ruu_occupancy               15.9208 # avg RUU occupancy (insn's)
ruu_rate                     0.3412 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 46.6623 # avg RUU occupant latency (cycle's)
ruu_full                     0.9858 # fraction of time (cycle's) RUU was full
LSQ_count                 338678489 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.6374 # avg LSQ occupancy (insn's)
lsq_rate                     0.3412 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 25.3154 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  982634022 # total number of slip cycles
avg_sim_slip                73.7663 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400286 # total number of accesses
il1.hits                   13399540 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6172114 # total number of accesses
dl1.hits                    5867188 # total number of hits
dl1.misses                   304926 # total number of misses
dl1.replacements             304414 # total number of replacements
dl1.writebacks               156830 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462502 # total number of accesses
ul2.hits                     394305 # total number of hits
ul2.misses                    68197 # total number of misses
ul2.replacements              68165 # total number of replacements
ul2.writebacks                57465 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1475 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1474 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1242 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400286 # total number of accesses
itlb.hits                  13400267 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767172 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:25:46 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449676 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938903 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12509922 # total simulation time in cycles
sim_IPC                      1.7090 # instructions per cycle
sim_CPI                      0.5851 # cycles per instruction
sim_exec_BW                  1.7146 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48879260 # cumulative IFQ occupancy
IFQ_fcount                 11599899 # cumulative IFQ full count
ifq_occupancy                3.9072 # avg IFQ occupancy (insn's)
ifq_rate                     1.7146 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2788 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9273 # fraction of time (cycle's) IFQ was full
RUU_count                 198637168 # cumulative RUU occupancy
RUU_fcount                 12376809 # cumulative RUU full count
ruu_occupancy               15.8784 # avg RUU occupancy (insn's)
ruu_rate                     1.7146 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2606 # avg RUU occupant latency (cycle's)
ruu_full                     0.9894 # fraction of time (cycle's) RUU was full
LSQ_count                  63705072 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0924 # avg LSQ occupancy (insn's)
lsq_rate                     1.7146 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9700 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290126217 # total number of slip cycles
avg_sim_slip                13.5705 # the average slip between issue and retirement
bpred_bimod.lookups         1654336 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482585 # total number of accesses
il1.hits                   21482406 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944828 # total number of accesses
dl1.hits                    4940882 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       5072 # total number of hits
ul2.misses                      105 # total number of misses
ul2.replacements                 73 # total number of replacements
ul2.writebacks                   31 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0203 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0141 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0060 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482585 # total number of accesses
itlb.hits                  21482579 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884008 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:25:59 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860701 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203831 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481846 # total number of branches executed
sim_cycle                  36860475 # total simulation time in cycles
sim_IPC                      0.7558 # instructions per cycle
sim_CPI                      1.3231 # cycles per instruction
sim_exec_BW                  0.7558 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 147352326 # cumulative IFQ occupancy
IFQ_fcount                 36837832 # cumulative IFQ full count
ifq_occupancy                3.9976 # avg IFQ occupancy (insn's)
ifq_rate                     0.7558 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2889 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 589410785 # cumulative RUU occupancy
RUU_fcount                 36837264 # cumulative RUU full count
ruu_occupancy               15.9903 # avg RUU occupancy (insn's)
ruu_rate                     0.7558 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.1556 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 176819242 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7970 # avg LSQ occupancy (insn's)
lsq_rate                     0.7558 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3465 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  802738929 # total number of slip cycles
avg_sim_slip                28.8136 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860784 # total number of accesses
il1.hits                   27860573 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5817932 # total number of hits
dl1.misses                  2835466 # total number of misses
dl1.replacements            2834954 # total number of replacements
dl1.writebacks               955047 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3277 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3276 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3790724 # total number of accesses
ul2.hits                    3787350 # total number of hits
ul2.misses                     3374 # total number of misses
ul2.replacements               3342 # total number of replacements
ul2.writebacks                 1593 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0009 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0009 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0004 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860784 # total number of accesses
itlb.hits                  27860778 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017740 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:26:22 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8416933 # total simulation time in cycles
sim_IPC                      1.5572 # instructions per cycle
sim_CPI                      0.6422 # cycles per instruction
sim_exec_BW                  1.5622 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31506512 # cumulative IFQ occupancy
IFQ_fcount                  7728935 # cumulative IFQ full count
ifq_occupancy                3.7432 # avg IFQ occupancy (insn's)
ifq_rate                     1.5622 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3961 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9183 # fraction of time (cycle's) IFQ was full
RUU_count                 130388615 # cumulative RUU occupancy
RUU_fcount                  7046961 # cumulative RUU full count
ruu_occupancy               15.4912 # avg RUU occupancy (insn's)
ruu_rate                     1.5622 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9163 # avg RUU occupant latency (cycle's)
ruu_full                     0.8372 # fraction of time (cycle's) RUU was full
LSQ_count                  39702240 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7169 # avg LSQ occupancy (insn's)
lsq_rate                     1.5622 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0194 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187133739 # total number of slip cycles
avg_sim_slip                14.2773 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189424 # total number of accesses
il1.hits                   13189246 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013634 # total number of accesses
dl1.hits                    3667592 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346805 # total number of hits
ul2.misses                       47 # total number of misses
ul2.replacements                 31 # total number of replacements
ul2.writebacks                   10 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0001 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0001 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189424 # total number of accesses
itlb.hits                  13189418 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357190 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:26:31 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12286609 # total number of instructions executed
sim_total_refs              4824025 # total number of loads and stores executed
sim_total_loads             2865550 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197077 # total number of branches executed
sim_cycle                   6528898 # total simulation time in cycles
sim_IPC                      1.7739 # instructions per cycle
sim_CPI                      0.5637 # cycles per instruction
sim_exec_BW                  1.8819 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19358668 # cumulative IFQ occupancy
IFQ_fcount                  4011331 # cumulative IFQ full count
ifq_occupancy                2.9651 # avg IFQ occupancy (insn's)
ifq_rate                     1.8819 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5756 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6144 # fraction of time (cycle's) IFQ was full
RUU_count                  79653541 # cumulative RUU occupancy
RUU_fcount                  3401993 # cumulative RUU full count
ruu_occupancy               12.2002 # avg RUU occupancy (insn's)
ruu_rate                     1.8819 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4830 # avg RUU occupant latency (cycle's)
ruu_full                     0.5211 # fraction of time (cycle's) RUU was full
LSQ_count                  33938067 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1981 # avg LSQ occupancy (insn's)
lsq_rate                     1.8819 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7622 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  126725945 # total number of slip cycles
avg_sim_slip                10.9421 # the average slip between issue and retirement
bpred_bimod.lookups         3257805 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442078 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820505 # total number of accesses
il1.hits                   12820288 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497831 # total number of accesses
dl1.hits                    4480755 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      27337 # total number of hits
ul2.misses                      315 # total number of misses
ul2.replacements                299 # total number of replacements
ul2.writebacks                  174 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0114 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0108 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0063 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820505 # total number of accesses
itlb.hits                  12820498 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919204 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:26:39 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 39 # total simulation time in seconds
sim_inst_rate           341561.7436 # simulation speed (in insts/sec)
sim_total_insn             13381777 # total number of instructions executed
sim_total_refs              6748283 # total number of loads and stores executed
sim_total_loads             3824239 # total number of loads executed
sim_total_stores       2924044.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  79435805 # total simulation time in cycles
sim_IPC                      0.1677 # instructions per cycle
sim_CPI                      5.9632 # cycles per instruction
sim_exec_BW                  0.1685 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                 316598371 # cumulative IFQ occupancy
IFQ_fcount                 78999273 # cumulative IFQ full count
ifq_occupancy                3.9856 # avg IFQ occupancy (insn's)
ifq_rate                     0.1685 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                 23.6589 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9945 # fraction of time (cycle's) IFQ was full
RUU_count                1267485759 # cumulative RUU occupancy
RUU_fcount                 78844342 # cumulative RUU full count
ruu_occupancy               15.9561 # avg RUU occupancy (insn's)
ruu_rate                     0.1685 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 94.7173 # avg RUU occupant latency (cycle's)
ruu_full                     0.9926 # fraction of time (cycle's) RUU was full
LSQ_count                 688953619 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.6731 # avg LSQ occupancy (insn's)
lsq_rate                     0.1685 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 51.4845 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                 1976089445 # total number of slip cycles
avg_sim_slip               148.3450 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400044 # total number of accesses
il1.hits                   13399298 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6172430 # total number of accesses
dl1.hits                    5867632 # total number of hits
dl1.misses                   304798 # total number of misses
dl1.replacements             304286 # total number of replacements
dl1.writebacks               156804 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462348 # total number of accesses
ul2.hits                     392839 # total number of hits
ul2.misses                    69509 # total number of misses
ul2.replacements              69493 # total number of replacements
ul2.writebacks                55871 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1503 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1503 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1208 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400044 # total number of accesses
itlb.hits                  13400025 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156765868 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:27:18 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449676 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938903 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12521350 # total simulation time in cycles
sim_IPC                      1.7074 # instructions per cycle
sim_CPI                      0.5857 # cycles per instruction
sim_exec_BW                  1.7130 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48902826 # cumulative IFQ occupancy
IFQ_fcount                 11605779 # cumulative IFQ full count
ifq_occupancy                3.9056 # avg IFQ occupancy (insn's)
ifq_rate                     1.7130 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2799 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9269 # fraction of time (cycle's) IFQ was full
RUU_count                 198732834 # cumulative RUU occupancy
RUU_fcount                 12382639 # cumulative RUU full count
ruu_occupancy               15.8715 # avg RUU occupancy (insn's)
ruu_rate                     1.7130 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2651 # avg RUU occupant latency (cycle's)
ruu_full                     0.9889 # fraction of time (cycle's) RUU was full
LSQ_count                  63740827 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0906 # avg LSQ occupancy (insn's)
lsq_rate                     1.7130 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9716 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290257638 # total number of slip cycles
avg_sim_slip                13.5766 # the average slip between issue and retirement
bpred_bimod.lookups         1654336 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482585 # total number of accesses
il1.hits                   21482406 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944844 # total number of accesses
dl1.hits                    4940898 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       5119 # total number of hits
ul2.misses                       58 # total number of misses
ul2.replacements                 42 # total number of replacements
ul2.writebacks                   16 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0112 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0081 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0031 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482585 # total number of accesses
itlb.hits                  21482579 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884008 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 32 -mem:lat 72 6 -mem:minBurstLength 2 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:27:32 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 6 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 32 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            2 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860703 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203831 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481846 # total number of branches executed
sim_cycle                  37510235 # total simulation time in cycles
sim_IPC                      0.7427 # instructions per cycle
sim_CPI                      1.3464 # cycles per instruction
sim_exec_BW                  0.7427 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 149943590 # cumulative IFQ occupancy
IFQ_fcount                 37485649 # cumulative IFQ full count
ifq_occupancy                3.9974 # avg IFQ occupancy (insn's)
ifq_rate                     0.7427 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3819 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9993 # fraction of time (cycle's) IFQ was full
RUU_count                 599776942 # cumulative RUU occupancy
RUU_fcount                 37485092 # cumulative RUU full count
ruu_occupancy               15.9897 # avg RUU occupancy (insn's)
ruu_rate                     0.7427 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.5277 # avg RUU occupant latency (cycle's)
ruu_full                     0.9993 # fraction of time (cycle's) RUU was full
LSQ_count                 180125275 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8020 # avg LSQ occupancy (insn's)
lsq_rate                     0.7427 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4652 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  816386382 # total number of slip cycles
avg_sim_slip                29.3035 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5809162 # total number of hits
dl1.misses                  2844236 # total number of misses
dl1.replacements            2843724 # total number of replacements
dl1.writebacks               954932 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3287 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3286 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3799379 # total number of accesses
ul2.hits                    3797063 # total number of hits
ul2.misses                     2316 # total number of misses
ul2.replacements               2300 # total number of replacements
ul2.writebacks                 1251 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0006 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0006 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0003 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017756 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:27:55 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13149148 # total number of instructions executed
sim_total_refs              4034192 # total number of loads and stores executed
sim_total_loads             3020633 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8452162 # total simulation time in cycles
sim_IPC                      1.5507 # instructions per cycle
sim_CPI                      0.6449 # cycles per instruction
sim_exec_BW                  1.5557 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31675440 # cumulative IFQ occupancy
IFQ_fcount                  7771183 # cumulative IFQ full count
ifq_occupancy                3.7476 # avg IFQ occupancy (insn's)
ifq_rate                     1.5557 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4089 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9194 # fraction of time (cycle's) IFQ was full
RUU_count                 131042293 # cumulative RUU occupancy
RUU_fcount                  7090005 # cumulative RUU full count
ruu_occupancy               15.5040 # avg RUU occupancy (insn's)
ruu_rate                     1.5557 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9658 # avg RUU occupant latency (cycle's)
ruu_full                     0.8388 # fraction of time (cycle's) RUU was full
LSQ_count                  39925511 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7237 # avg LSQ occupancy (insn's)
lsq_rate                     1.5557 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0364 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  188009535 # total number of slip cycles
avg_sim_slip                14.3441 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189454 # total number of accesses
il1.hits                   13189276 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    3667132 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     344575 # total number of hits
ul2.misses                     2277 # total number of misses
ul2.replacements               1253 # total number of replacements
ul2.writebacks                  512 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189454 # total number of accesses
itlb.hits                  13189448 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357314 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:28:04 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12265172 # total number of instructions executed
sim_total_refs              4824018 # total number of loads and stores executed
sim_total_loads             2865540 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196951 # total number of branches executed
sim_cycle                   6589691 # total simulation time in cycles
sim_IPC                      1.7575 # instructions per cycle
sim_CPI                      0.5690 # cycles per instruction
sim_exec_BW                  1.8613 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19677502 # cumulative IFQ occupancy
IFQ_fcount                  4091226 # cumulative IFQ full count
ifq_occupancy                2.9861 # avg IFQ occupancy (insn's)
ifq_rate                     1.8613 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6043 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6209 # fraction of time (cycle's) IFQ was full
RUU_count                  80957239 # cumulative RUU occupancy
RUU_fcount                  3489245 # cumulative RUU full count
ruu_occupancy               12.2854 # avg RUU occupancy (insn's)
ruu_rate                     1.8613 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.6006 # avg RUU occupant latency (cycle's)
ruu_full                     0.5295 # fraction of time (cycle's) RUU was full
LSQ_count                  34239589 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1959 # avg LSQ occupancy (insn's)
lsq_rate                     1.8613 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7916 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  129233352 # total number of slip cycles
avg_sim_slip                11.1586 # the average slip between issue and retirement
bpred_bimod.lookups         3257679 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441951 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435454 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820264 # total number of accesses
il1.hits                   12820047 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497834 # total number of accesses
dl1.hits                    4480758 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      15488 # total number of hits
ul2.misses                    12164 # total number of misses
ul2.replacements              11140 # total number of replacements
ul2.writebacks                 7571 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4399 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.4029 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.2738 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820264 # total number of accesses
itlb.hits                  12820257 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918218 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:28:12 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375190 # total number of instructions executed
sim_total_refs              6748385 # total number of loads and stores executed
sim_total_loads             3824342 # total number of loads executed
sim_total_stores       2924043.0000 # total number of stores executed
sim_total_branches           390626 # total number of branches executed
sim_cycle                  11457347 # total simulation time in cycles
sim_IPC                      1.1627 # instructions per cycle
sim_CPI                      0.8601 # cycles per instruction
sim_exec_BW                  1.1674 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  44885797 # cumulative IFQ occupancy
IFQ_fcount                 11071737 # cumulative IFQ full count
ifq_occupancy                3.9176 # avg IFQ occupancy (insn's)
ifq_rate                     1.1674 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.3559 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9663 # fraction of time (cycle's) IFQ was full
RUU_count                 180457361 # cumulative RUU occupancy
RUU_fcount                 10922009 # cumulative RUU full count
ruu_occupancy               15.7504 # avg RUU occupancy (insn's)
ruu_rate                     1.1674 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 13.4919 # avg RUU occupant latency (cycle's)
ruu_full                     0.9533 # fraction of time (cycle's) RUU was full
LSQ_count                  95232426 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3119 # avg LSQ occupancy (insn's)
lsq_rate                     1.1674 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.1201 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  295405539 # total number of slip cycles
avg_sim_slip                22.1761 # the average slip between issue and retirement
bpred_bimod.lookups          390935 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89876 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400274 # total number of accesses
il1.hits                   13399528 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171921 # total number of accesses
dl1.hits                    5866420 # total number of hits
dl1.misses                   305501 # total number of misses
dl1.replacements             304989 # total number of replacements
dl1.writebacks               156829 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0495 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 463076 # total number of accesses
ul2.hits                     366978 # total number of hits
ul2.misses                    96098 # total number of misses
ul2.replacements              95074 # total number of replacements
ul2.writebacks                73071 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2075 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2053 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1578 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400274 # total number of accesses
itlb.hits                  13400255 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767100 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:28:24 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449886 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938901 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12558630 # total simulation time in cycles
sim_IPC                      1.7024 # instructions per cycle
sim_CPI                      0.5874 # cycles per instruction
sim_exec_BW                  1.7080 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49093229 # cumulative IFQ occupancy
IFQ_fcount                 11654134 # cumulative IFQ full count
ifq_occupancy                3.9091 # avg IFQ occupancy (insn's)
ifq_rate                     1.7080 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2887 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9280 # fraction of time (cycle's) IFQ was full
RUU_count                 199517914 # cumulative RUU occupancy
RUU_fcount                 12433957 # cumulative RUU full count
ruu_occupancy               15.8869 # avg RUU occupancy (insn's)
ruu_rate                     1.7080 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3016 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63973759 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0940 # avg LSQ occupancy (insn's)
lsq_rate                     1.7080 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9825 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291275082 # total number of slip cycles
avg_sim_slip                13.6242 # the average slip between issue and retirement
bpred_bimod.lookups         1654334 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482598 # total number of accesses
il1.hits                   21482419 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943840 # total number of accesses
dl1.hits                    4939894 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       2606 # total number of hits
ul2.misses                     2571 # total number of misses
ul2.replacements               1547 # total number of replacements
ul2.writebacks                  659 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4966 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2988 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1273 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482598 # total number of accesses
itlb.hits                  21482592 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884054 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:28:38 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861147 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37064650 # total simulation time in cycles
sim_IPC                      0.7517 # instructions per cycle
sim_CPI                      1.3304 # cycles per instruction
sim_exec_BW                  0.7517 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 148193473 # cumulative IFQ occupancy
IFQ_fcount                 37048128 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7517 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3190 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 592776234 # cumulative RUU occupancy
RUU_fcount                 37047414 # cumulative RUU full count
ruu_occupancy               15.9930 # avg RUU occupancy (insn's)
ruu_rate                     0.7517 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.2761 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 180169294 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8609 # avg LSQ occupancy (insn's)
lsq_rate                     0.7517 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4667 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  809454212 # total number of slip cycles
avg_sim_slip                29.0547 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6071212 # total number of hits
dl1.misses                  2582186 # total number of misses
dl1.replacements            2581674 # total number of replacements
dl1.writebacks               958427 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2984 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2983 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3540824 # total number of accesses
ul2.hits                    3472444 # total number of hits
ul2.misses                    68380 # total number of misses
ul2.replacements              67356 # total number of replacements
ul2.writebacks                22676 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0193 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0190 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0064 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017756 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:29:01 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148931 # total number of instructions executed
sim_total_refs              4034196 # total number of loads and stores executed
sim_total_loads             3020632 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8411532 # total simulation time in cycles
sim_IPC                      1.5582 # instructions per cycle
sim_CPI                      0.6418 # cycles per instruction
sim_exec_BW                  1.5632 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31522631 # cumulative IFQ occupancy
IFQ_fcount                  7732972 # cumulative IFQ full count
ifq_occupancy                3.7475 # avg IFQ occupancy (insn's)
ifq_rate                     1.5632 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3974 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9193 # fraction of time (cycle's) IFQ was full
RUU_count                 130432684 # cumulative RUU occupancy
RUU_fcount                  7051406 # cumulative RUU full count
ruu_occupancy               15.5064 # avg RUU occupancy (insn's)
ruu_rate                     1.5632 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9196 # avg RUU occupant latency (cycle's)
ruu_full                     0.8383 # fraction of time (cycle's) RUU was full
LSQ_count                  39719756 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7221 # avg LSQ occupancy (insn's)
lsq_rate                     1.5632 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0208 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187193656 # total number of slip cycles
avg_sim_slip                14.2818 # the average slip between issue and retirement
bpred_bimod.lookups         1010976 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000660 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10190 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189437 # total number of accesses
il1.hits                   13189259 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013392 # total number of accesses
dl1.hits                    3667350 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     345692 # total number of hits
ul2.misses                     1160 # total number of misses
ul2.replacements                648 # total number of replacements
ul2.writebacks                  257 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0033 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0019 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0007 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189437 # total number of accesses
itlb.hits                  13189431 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357244 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:29:10 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12265027 # total number of instructions executed
sim_total_refs              4824026 # total number of loads and stores executed
sim_total_loads             2865548 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3197013 # total number of branches executed
sim_cycle                   6439017 # total simulation time in cycles
sim_IPC                      1.7986 # instructions per cycle
sim_CPI                      0.5560 # cycles per instruction
sim_exec_BW                  1.9048 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19087907 # cumulative IFQ occupancy
IFQ_fcount                  3943734 # cumulative IFQ full count
ifq_occupancy                2.9644 # avg IFQ occupancy (insn's)
ifq_rate                     1.9048 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5563 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6125 # fraction of time (cycle's) IFQ was full
RUU_count                  78599418 # cumulative RUU occupancy
RUU_fcount                  3340770 # cumulative RUU full count
ruu_occupancy               12.2067 # avg RUU occupancy (insn's)
ruu_rate                     1.9048 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4084 # avg RUU occupant latency (cycle's)
ruu_full                     0.5188 # fraction of time (cycle's) RUU was full
LSQ_count                  33261666 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1656 # avg LSQ occupancy (insn's)
lsq_rate                     1.9048 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7119 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  125894284 # total number of slip cycles
avg_sim_slip                10.8703 # the average slip between issue and retirement
bpred_bimod.lookups         3257741 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442015 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435454 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820386 # total number of accesses
il1.hits                   12820169 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497839 # total number of accesses
dl1.hits                    4480763 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      21515 # total number of hits
ul2.misses                     6137 # total number of misses
ul2.replacements               5625 # total number of replacements
ul2.writebacks                 3800 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2219 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2034 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1374 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820386 # total number of accesses
itlb.hits                  12820379 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918722 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:29:17 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375590 # total number of instructions executed
sim_total_refs              6748388 # total number of loads and stores executed
sim_total_loads             3824347 # total number of loads executed
sim_total_stores       2924041.0000 # total number of stores executed
sim_total_branches           390629 # total number of branches executed
sim_cycle                  11143274 # total simulation time in cycles
sim_IPC                      1.1954 # instructions per cycle
sim_CPI                      0.8365 # cycles per instruction
sim_exec_BW                  1.2003 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  43683813 # cumulative IFQ occupancy
IFQ_fcount                 10771214 # cumulative IFQ full count
ifq_occupancy                3.9202 # avg IFQ occupancy (insn's)
ifq_rate                     1.2003 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.2659 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9666 # fraction of time (cycle's) IFQ was full
RUU_count                 175659972 # cumulative RUU occupancy
RUU_fcount                 10621675 # cumulative RUU full count
ruu_occupancy               15.7638 # avg RUU occupancy (insn's)
ruu_rate                     1.2003 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 13.1329 # avg RUU occupant latency (cycle's)
ruu_full                     0.9532 # fraction of time (cycle's) RUU was full
LSQ_count                  92270308 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2804 # avg LSQ occupancy (insn's)
lsq_rate                     1.2003 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.8984 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  287645176 # total number of slip cycles
avg_sim_slip                21.5935 # the average slip between issue and retirement
bpred_bimod.lookups          390938 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400277 # total number of accesses
il1.hits                   13399531 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171900 # total number of accesses
dl1.hits                    5866728 # total number of hits
dl1.misses                   305172 # total number of misses
dl1.replacements             304660 # total number of replacements
dl1.writebacks               156825 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462743 # total number of accesses
ul2.hits                     386565 # total number of hits
ul2.misses                    76178 # total number of misses
ul2.replacements              75666 # total number of replacements
ul2.writebacks                62226 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1646 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1635 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1345 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400277 # total number of accesses
itlb.hits                  13400258 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767128 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:29:29 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449948 # total number of instructions executed
sim_total_refs              6588955 # total number of loads and stores executed
sim_total_loads             4938902 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12509880 # total simulation time in cycles
sim_IPC                      1.7090 # instructions per cycle
sim_CPI                      0.5851 # cycles per instruction
sim_exec_BW                  1.7146 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48909127 # cumulative IFQ occupancy
IFQ_fcount                 11607728 # cumulative IFQ full count
ifq_occupancy                3.9096 # avg IFQ occupancy (insn's)
ifq_rate                     1.7146 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2802 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 198767792 # cumulative RUU occupancy
RUU_fcount                 12385993 # cumulative RUU full count
ruu_occupancy               15.8889 # avg RUU occupancy (insn's)
ruu_rate                     1.7146 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2666 # avg RUU occupant latency (cycle's)
ruu_full                     0.9901 # fraction of time (cycle's) RUU was full
LSQ_count                  63739693 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0951 # avg LSQ occupancy (insn's)
lsq_rate                     1.7146 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9716 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290290813 # total number of slip cycles
avg_sim_slip                13.5782 # the average slip between issue and retirement
bpred_bimod.lookups         1654335 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           57 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482587 # total number of accesses
il1.hits                   21482408 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944348 # total number of accesses
dl1.hits                    4940402 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       3867 # total number of hits
ul2.misses                     1310 # total number of misses
ul2.replacements                798 # total number of replacements
ul2.writebacks                  344 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2530 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1541 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0664 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482587 # total number of accesses
itlb.hits                  21482581 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884014 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:29:43 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860968 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203829 # total number of loads executed
sim_total_stores       1449774.0000 # total number of stores executed
sim_total_branches           481844 # total number of branches executed
sim_cycle                  36126960 # total simulation time in cycles
sim_IPC                      0.7712 # instructions per cycle
sim_CPI                      1.2967 # cycles per instruction
sim_exec_BW                  0.7712 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 144456591 # cumulative IFQ occupancy
IFQ_fcount                 36113909 # cumulative IFQ full count
ifq_occupancy                3.9986 # avg IFQ occupancy (insn's)
ifq_rate                     0.7712 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.1849 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 577828534 # cumulative RUU occupancy
RUU_fcount                 36113237 # cumulative RUU full count
ruu_occupancy               15.9944 # avg RUU occupancy (insn's)
ruu_rate                     0.7712 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.7397 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 175674670 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8627 # avg LSQ occupancy (insn's)
lsq_rate                     0.7712 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3054 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  790010670 # total number of slip cycles
avg_sim_slip                28.3568 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6076269 # total number of hits
dl1.misses                  2577129 # total number of misses
dl1.replacements            2576617 # total number of replacements
dl1.writebacks               958457 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2978 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2978 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3535797 # total number of accesses
ul2.hits                    3500931 # total number of hits
ul2.misses                    34866 # total number of misses
ul2.replacements              34354 # total number of replacements
ul2.writebacks                11770 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0099 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0097 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0033 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017752 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:30:06 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148952 # total number of instructions executed
sim_total_refs              4034200 # total number of loads and stores executed
sim_total_loads             3020636 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010957 # total number of branches executed
sim_cycle                   8417009 # total simulation time in cycles
sim_IPC                      1.5572 # instructions per cycle
sim_CPI                      0.6422 # cycles per instruction
sim_exec_BW                  1.5622 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31534897 # cumulative IFQ occupancy
IFQ_fcount                  7736025 # cumulative IFQ full count
ifq_occupancy                3.7466 # avg IFQ occupancy (insn's)
ifq_rate                     1.5622 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3983 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9191 # fraction of time (cycle's) IFQ was full
RUU_count                 130488605 # cumulative RUU occupancy
RUU_fcount                  7054244 # cumulative RUU full count
ruu_occupancy               15.5030 # avg RUU occupancy (insn's)
ruu_rate                     1.5622 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9239 # avg RUU occupant latency (cycle's)
ruu_full                     0.8381 # fraction of time (cycle's) RUU was full
LSQ_count                  39739101 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7213 # avg LSQ occupancy (insn's)
lsq_rate                     1.5622 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0222 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187264896 # total number of slip cycles
avg_sim_slip                14.2873 # the average slip between issue and retirement
bpred_bimod.lookups         1010980 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189458 # total number of accesses
il1.hits                   13189280 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013522 # total number of accesses
dl1.hits                    3667480 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346258 # total number of hits
ul2.misses                      594 # total number of misses
ul2.replacements                338 # total number of replacements
ul2.writebacks                  129 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0017 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0010 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0004 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189458 # total number of accesses
itlb.hits                  13189452 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013780 # total number of accesses
dtlb.hits                   4013744 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357336 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:30:15 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12265927 # total number of instructions executed
sim_total_refs              4824017 # total number of loads and stores executed
sim_total_loads             2865542 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197044 # total number of branches executed
sim_cycle                   6466102 # total simulation time in cycles
sim_IPC                      1.7911 # instructions per cycle
sim_CPI                      0.5583 # cycles per instruction
sim_exec_BW                  1.8970 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19180596 # cumulative IFQ occupancy
IFQ_fcount                  3966860 # cumulative IFQ full count
ifq_occupancy                2.9663 # avg IFQ occupancy (insn's)
ifq_rate                     1.8970 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5637 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6135 # fraction of time (cycle's) IFQ was full
RUU_count                  78972199 # cumulative RUU occupancy
RUU_fcount                  3363160 # cumulative RUU full count
ruu_occupancy               12.2133 # avg RUU occupancy (insn's)
ruu_rate                     1.8970 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4383 # avg RUU occupant latency (cycle's)
ruu_full                     0.5201 # fraction of time (cycle's) RUU was full
LSQ_count                  33424732 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1692 # avg LSQ occupancy (insn's)
lsq_rate                     1.8970 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7250 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  126418001 # total number of slip cycles
avg_sim_slip                10.9155 # the average slip between issue and retirement
bpred_bimod.lookups         3257770 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984764 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442047 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435453 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820430 # total number of accesses
il1.hits                   12820213 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497839 # total number of accesses
dl1.hits                    4480763 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      24532 # total number of hits
ul2.misses                     3120 # total number of misses
ul2.replacements               2864 # total number of replacements
ul2.writebacks                 1917 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1128 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0693 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820430 # total number of accesses
itlb.hits                  12820423 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918888 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:30:23 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1024685.2308 # simulation speed (in insts/sec)
sim_total_insn             13376421 # total number of instructions executed
sim_total_refs              6748395 # total number of loads and stores executed
sim_total_loads             3824348 # total number of loads executed
sim_total_stores       2924047.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  13643878 # total simulation time in cycles
sim_IPC                      0.9763 # instructions per cycle
sim_CPI                      1.0242 # cycles per instruction
sim_exec_BW                  0.9804 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  53660822 # cumulative IFQ occupancy
IFQ_fcount                 13265283 # cumulative IFQ full count
ifq_occupancy                3.9330 # avg IFQ occupancy (insn's)
ifq_rate                     0.9804 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  4.0116 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9723 # fraction of time (cycle's) IFQ was full
RUU_count                 215576224 # cumulative RUU occupancy
RUU_fcount                 13115839 # cumulative RUU full count
ruu_occupancy               15.8002 # avg RUU occupancy (insn's)
ruu_rate                     0.9804 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 16.1161 # avg RUU occupant latency (cycle's)
ruu_full                     0.9613 # fraction of time (cycle's) RUU was full
LSQ_count                 114129044 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3649 # avg LSQ occupancy (insn's)
lsq_rate                     0.9804 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  8.5321 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  349415388 # total number of slip cycles
avg_sim_slip                26.2306 # the average slip between issue and retirement
bpred_bimod.lookups          390940 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400289 # total number of accesses
il1.hits                   13399543 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171900 # total number of accesses
dl1.hits                    5866897 # total number of hits
dl1.misses                   305003 # total number of misses
dl1.replacements             304491 # total number of replacements
dl1.writebacks               156826 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462575 # total number of accesses
ul2.hits                     394980 # total number of hits
ul2.misses                    67595 # total number of misses
ul2.replacements              67339 # total number of replacements
ul2.writebacks                57336 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1461 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1456 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1239 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400289 # total number of accesses
itlb.hits                  13400270 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740999 # total number of accesses
dtlb.hits                   6736825 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767178 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:30:36 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449685 # total number of instructions executed
sim_total_refs              6588961 # total number of loads and stores executed
sim_total_loads             4938904 # total number of loads executed
sim_total_stores       1650057.0000 # total number of stores executed
sim_total_branches          1647450 # total number of branches executed
sim_cycle                  12519471 # total simulation time in cycles
sim_IPC                      1.7077 # instructions per cycle
sim_CPI                      0.5856 # cycles per instruction
sim_exec_BW                  1.7133 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48937610 # cumulative IFQ occupancy
IFQ_fcount                 11614653 # cumulative IFQ full count
ifq_occupancy                3.9089 # avg IFQ occupancy (insn's)
ifq_rate                     1.7133 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2815 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9277 # fraction of time (cycle's) IFQ was full
RUU_count                 198879203 # cumulative RUU occupancy
RUU_fcount                 12392221 # cumulative RUU full count
ruu_occupancy               15.8856 # avg RUU occupancy (insn's)
ruu_rate                     1.7133 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2719 # avg RUU occupant latency (cycle's)
ruu_full                     0.9898 # fraction of time (cycle's) RUU was full
LSQ_count                  63775997 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0941 # avg LSQ occupancy (insn's)
lsq_rate                     1.7133 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9733 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290435361 # total number of slip cycles
avg_sim_slip                13.5849 # the average slip between issue and retirement
bpred_bimod.lookups         1654339 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482601 # total number of accesses
il1.hits                   21482422 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944606 # total number of accesses
dl1.hits                    4940660 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       4505 # total number of hits
ul2.misses                      672 # total number of misses
ul2.replacements                416 # total number of replacements
ul2.writebacks                  183 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1298 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0804 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0353 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482601 # total number of accesses
itlb.hits                  21482595 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565560 # total number of accesses
dtlb.hits                   6565521 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884074 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:30:50 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861819 # total number of instructions executed
sim_total_refs              8653612 # total number of loads and stores executed
sim_total_loads             7203835 # total number of loads executed
sim_total_stores       1449777.0000 # total number of stores executed
sim_total_branches           481852 # total number of branches executed
sim_cycle                  36556766 # total simulation time in cycles
sim_IPC                      0.7621 # instructions per cycle
sim_CPI                      1.3122 # cycles per instruction
sim_exec_BW                  0.7622 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 146162958 # cumulative IFQ occupancy
IFQ_fcount                 36540397 # cumulative IFQ full count
ifq_occupancy                3.9982 # avg IFQ occupancy (insn's)
ifq_rate                     0.7622 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2460 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 584654837 # cumulative RUU occupancy
RUU_fcount                 36539655 # cumulative RUU full count
ruu_occupancy               15.9931 # avg RUU occupancy (insn's)
ruu_rate                     0.7622 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.9841 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 176363233 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8244 # avg LSQ occupancy (insn's)
lsq_rate                     0.7622 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3299 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  797521877 # total number of slip cycles
avg_sim_slip                28.6264 # the average slip between issue and retirement
bpred_bimod.lookups          481892 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          123 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860810 # total number of accesses
il1.hits                   27860599 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653401 # total number of accesses
dl1.hits                    5938859 # total number of hits
dl1.misses                  2714542 # total number of misses
dl1.replacements            2714030 # total number of replacements
dl1.writebacks               956651 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3137 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3136 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3671404 # total number of accesses
ul2.hits                    3653299 # total number of hits
ul2.misses                    18105 # total number of misses
ul2.replacements              17849 # total number of replacements
ul2.writebacks                 6372 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0049 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0049 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0017 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860810 # total number of accesses
itlb.hits                  27860804 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653414 # total number of accesses
dtlb.hits                   8652341 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017852 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:31:14 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8421579 # total simulation time in cycles
sim_IPC                      1.5564 # instructions per cycle
sim_CPI                      0.6425 # cycles per instruction
sim_exec_BW                  1.5613 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31538134 # cumulative IFQ occupancy
IFQ_fcount                  7736837 # cumulative IFQ full count
ifq_occupancy                3.7449 # avg IFQ occupancy (insn's)
ifq_rate                     1.5613 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3985 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9187 # fraction of time (cycle's) IFQ was full
RUU_count                 130503353 # cumulative RUU occupancy
RUU_fcount                  7054965 # cumulative RUU full count
ruu_occupancy               15.4963 # avg RUU occupancy (insn's)
ruu_rate                     1.5613 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9250 # avg RUU occupant latency (cycle's)
ruu_full                     0.8377 # fraction of time (cycle's) RUU was full
LSQ_count                  39743096 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7192 # avg LSQ occupancy (insn's)
lsq_rate                     1.5613 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0225 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187289338 # total number of slip cycles
avg_sim_slip                14.2891 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189425 # total number of accesses
il1.hits                   13189247 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013580 # total number of accesses
dl1.hits                    3667538 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346541 # total number of hits
ul2.misses                      311 # total number of misses
ul2.replacements                183 # total number of replacements
ul2.writebacks                   66 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0009 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0005 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189425 # total number of accesses
itlb.hits                  13189419 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357194 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:31:23 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12267285 # total number of instructions executed
sim_total_refs              4824020 # total number of loads and stores executed
sim_total_loads             2865549 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197057 # total number of branches executed
sim_cycle                   6489505 # total simulation time in cycles
sim_IPC                      1.7847 # instructions per cycle
sim_CPI                      0.5603 # cycles per instruction
sim_exec_BW                  1.8903 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19248317 # cumulative IFQ occupancy
IFQ_fcount                  3983766 # cumulative IFQ full count
ifq_occupancy                2.9661 # avg IFQ occupancy (insn's)
ifq_rate                     1.8903 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5691 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6139 # fraction of time (cycle's) IFQ was full
RUU_count                  79245064 # cumulative RUU occupancy
RUU_fcount                  3379482 # cumulative RUU full count
ruu_occupancy               12.2113 # avg RUU occupancy (insn's)
ruu_rate                     1.8903 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4599 # avg RUU occupant latency (cycle's)
ruu_full                     0.5208 # fraction of time (cycle's) RUU was full
LSQ_count                  33560019 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1714 # avg LSQ occupancy (insn's)
lsq_rate                     1.8903 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7357 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  126804729 # total number of slip cycles
avg_sim_slip                10.9489 # the average slip between issue and retirement
bpred_bimod.lookups         3257785 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442061 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820463 # total number of accesses
il1.hits                   12820246 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497830 # total number of accesses
dl1.hits                    4480754 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      26039 # total number of hits
ul2.misses                     1613 # total number of misses
ul2.replacements               1485 # total number of replacements
ul2.writebacks                  978 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0583 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0537 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0354 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820463 # total number of accesses
itlb.hits                  12820456 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514858 # total number of accesses
dtlb.hits                   4514792 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919034 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:31:31 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 15 # total simulation time in seconds
sim_inst_rate           888060.5333 # simulation speed (in insts/sec)
sim_total_insn             13378402 # total number of instructions executed
sim_total_refs              6748388 # total number of loads and stores executed
sim_total_loads             3824348 # total number of loads executed
sim_total_stores       2924040.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  18758664 # total simulation time in cycles
sim_IPC                      0.7101 # instructions per cycle
sim_CPI                      1.4082 # cycles per instruction
sim_exec_BW                  0.7132 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  74077867 # cumulative IFQ occupancy
IFQ_fcount                 18369326 # cumulative IFQ full count
ifq_occupancy                3.9490 # avg IFQ occupancy (insn's)
ifq_rate                     0.7132 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5371 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9792 # fraction of time (cycle's) IFQ was full
RUU_count                 297256361 # cumulative RUU occupancy
RUU_fcount                 18219628 # cumulative RUU full count
ruu_occupancy               15.8464 # avg RUU occupancy (insn's)
ruu_rate                     0.7132 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.2191 # avg RUU occupant latency (cycle's)
ruu_full                     0.9713 # fraction of time (cycle's) RUU was full
LSQ_count                 158892866 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.4704 # avg LSQ occupancy (insn's)
lsq_rate                     0.7132 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 11.8768 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  475860723 # total number of slip cycles
avg_sim_slip                35.7228 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400270 # total number of accesses
il1.hits                   13399525 # total number of hits
il1.misses                      745 # total number of misses
il1.replacements                249 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171935 # total number of accesses
dl1.hits                    5867003 # total number of hits
dl1.misses                   304932 # total number of misses
dl1.replacements             304420 # total number of replacements
dl1.writebacks               156826 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462503 # total number of accesses
ul2.hits                     396143 # total number of hits
ul2.misses                    66360 # total number of misses
ul2.replacements              66232 # total number of replacements
ul2.writebacks                55963 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1435 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1432 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1210 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400270 # total number of accesses
itlb.hits                  13400251 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767102 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:31:46 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449676 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938903 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12527082 # total simulation time in cycles
sim_IPC                      1.7066 # instructions per cycle
sim_CPI                      0.5859 # cycles per instruction
sim_exec_BW                  1.7123 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48953188 # cumulative IFQ occupancy
IFQ_fcount                 11618453 # cumulative IFQ full count
ifq_occupancy                3.9078 # avg IFQ occupancy (insn's)
ifq_rate                     1.7123 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2822 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9275 # fraction of time (cycle's) IFQ was full
RUU_count                 198941003 # cumulative RUU occupancy
RUU_fcount                 12395639 # cumulative RUU full count
ruu_occupancy               15.8809 # avg RUU occupancy (insn's)
ruu_rate                     1.7123 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2748 # avg RUU occupant latency (cycle's)
ruu_full                     0.9895 # fraction of time (cycle's) RUU was full
LSQ_count                  63795155 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0926 # avg LSQ occupancy (insn's)
lsq_rate                     1.7123 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9742 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290520135 # total number of slip cycles
avg_sim_slip                13.5889 # the average slip between issue and retirement
bpred_bimod.lookups         1654336 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482586 # total number of accesses
il1.hits                   21482407 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944732 # total number of accesses
dl1.hits                    4940786 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       4824 # total number of hits
ul2.misses                      353 # total number of misses
ul2.replacements                225 # total number of replacements
ul2.writebacks                  100 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0682 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0435 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0193 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482586 # total number of accesses
itlb.hits                  21482580 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884012 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:32:00 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27863099 # total number of instructions executed
sim_total_refs              8653604 # total number of loads and stores executed
sim_total_loads             7203832 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  36853327 # total simulation time in cycles
sim_IPC                      0.7560 # instructions per cycle
sim_CPI                      1.3228 # cycles per instruction
sim_exec_BW                  0.7561 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 147330792 # cumulative IFQ occupancy
IFQ_fcount                 36832457 # cumulative IFQ full count
ifq_occupancy                3.9978 # avg IFQ occupancy (insn's)
ifq_rate                     0.7561 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2877 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 589324387 # cumulative RUU occupancy
RUU_fcount                 36831270 # cumulative RUU full count
ruu_occupancy               15.9911 # avg RUU occupancy (insn's)
ruu_rate                     0.7561 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.1507 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 177115082 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8059 # avg LSQ occupancy (insn's)
lsq_rate                     0.7561 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3566 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  802948058 # total number of slip cycles
avg_sim_slip                28.8211 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860786 # total number of accesses
il1.hits                   27860575 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5867867 # total number of hits
dl1.misses                  2785531 # total number of misses
dl1.replacements            2785019 # total number of replacements
dl1.writebacks               955704 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3219 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3218 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3741446 # total number of accesses
ul2.hits                    3731754 # total number of hits
ul2.misses                     9692 # total number of misses
ul2.replacements               9564 # total number of replacements
ul2.writebacks                 3646 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0026 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0026 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0010 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860786 # total number of accesses
itlb.hits                  27860780 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017750 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:32:23 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8437429 # total simulation time in cycles
sim_IPC                      1.5534 # instructions per cycle
sim_CPI                      0.6437 # cycles per instruction
sim_exec_BW                  1.5584 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31585207 # cumulative IFQ occupancy
IFQ_fcount                  7748613 # cumulative IFQ full count
ifq_occupancy                3.7435 # avg IFQ occupancy (insn's)
ifq_rate                     1.5584 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4021 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9184 # fraction of time (cycle's) IFQ was full
RUU_count                 130698665 # cumulative RUU occupancy
RUU_fcount                  7066666 # cumulative RUU full count
ruu_occupancy               15.4903 # avg RUU occupancy (insn's)
ruu_rate                     1.5584 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9399 # avg RUU occupant latency (cycle's)
ruu_full                     0.8375 # fraction of time (cycle's) RUU was full
LSQ_count                  39805109 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7177 # avg LSQ occupancy (insn's)
lsq_rate                     1.5584 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0273 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187546663 # total number of slip cycles
avg_sim_slip                14.3088 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189425 # total number of accesses
il1.hits                   13189247 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013611 # total number of accesses
dl1.hits                    3667569 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346687 # total number of hits
ul2.misses                      165 # total number of misses
ul2.replacements                101 # total number of replacements
ul2.writebacks                   34 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0005 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0003 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189425 # total number of accesses
itlb.hits                  13189419 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357194 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:32:31 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1286834.7778 # simulation speed (in insts/sec)
sim_total_insn             12267461 # total number of instructions executed
sim_total_refs              4824020 # total number of loads and stores executed
sim_total_loads             2865549 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197065 # total number of branches executed
sim_cycle                   6516758 # total simulation time in cycles
sim_IPC                      1.7772 # instructions per cycle
sim_CPI                      0.5627 # cycles per instruction
sim_exec_BW                  1.8824 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19331655 # cumulative IFQ occupancy
IFQ_fcount                  4004591 # cumulative IFQ full count
ifq_occupancy                2.9665 # avg IFQ occupancy (insn's)
ifq_rate                     1.8824 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5758 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6145 # fraction of time (cycle's) IFQ was full
RUU_count                  79579703 # cumulative RUU occupancy
RUU_fcount                  3400132 # cumulative RUU full count
ruu_occupancy               12.2115 # avg RUU occupancy (insn's)
ruu_rate                     1.8824 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.4871 # avg RUU occupant latency (cycle's)
ruu_full                     0.5218 # fraction of time (cycle's) RUU was full
LSQ_count                  33749868 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1789 # avg LSQ occupancy (insn's)
lsq_rate                     1.8824 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7512 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  127284745 # total number of slip cycles
avg_sim_slip                10.9903 # the average slip between issue and retirement
bpred_bimod.lookups         3257793 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442067 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820477 # total number of accesses
il1.hits                   12820260 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497830 # total number of accesses
dl1.hits                    4480754 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      26806 # total number of hits
ul2.misses                      846 # total number of misses
ul2.replacements                782 # total number of replacements
ul2.writebacks                  506 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0306 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0283 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0183 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820477 # total number of accesses
itlb.hits                  12820470 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514858 # total number of accesses
dtlb.hits                   4514792 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919090 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:32:40 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 20 # total simulation time in seconds
sim_inst_rate           666045.4000 # simulation speed (in insts/sec)
sim_total_insn             13377312 # total number of instructions executed
sim_total_refs              6748391 # total number of loads and stores executed
sim_total_loads             3824351 # total number of loads executed
sim_total_stores       2924040.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  29339098 # total simulation time in cycles
sim_IPC                      0.4540 # instructions per cycle
sim_CPI                      2.2025 # cycles per instruction
sim_exec_BW                  0.4560 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                 116341810 # cumulative IFQ occupancy
IFQ_fcount                 28935195 # cumulative IFQ full count
ifq_occupancy                3.9654 # avg IFQ occupancy (insn's)
ifq_rate                     0.4560 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  8.6969 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9862 # fraction of time (cycle's) IFQ was full
RUU_count                 466328738 # cumulative RUU occupancy
RUU_fcount                 28785834 # cumulative RUU full count
ruu_occupancy               15.8944 # avg RUU occupancy (insn's)
ruu_rate                     0.4560 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 34.8597 # avg RUU occupant latency (cycle's)
ruu_full                     0.9811 # fraction of time (cycle's) RUU was full
LSQ_count                 251747887 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.5806 # avg LSQ occupancy (insn's)
lsq_rate                     0.4560 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 18.8190 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  737777387 # total number of slip cycles
avg_sim_slip                55.3849 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400279 # total number of accesses
il1.hits                   13399534 # total number of hits
il1.misses                      745 # total number of misses
il1.replacements                249 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171957 # total number of accesses
dl1.hits                    5867042 # total number of hits
dl1.misses                   304915 # total number of misses
dl1.replacements             304403 # total number of replacements
dl1.writebacks               156830 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462490 # total number of accesses
ul2.hits                     396653 # total number of hits
ul2.misses                    65837 # total number of misses
ul2.replacements              65773 # total number of replacements
ul2.writebacks                55776 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1424 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1422 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1206 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400279 # total number of accesses
itlb.hits                  13400260 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767144 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:33:00 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449664 # total number of instructions executed
sim_total_refs              6588951 # total number of loads and stores executed
sim_total_loads             4938902 # total number of loads executed
sim_total_stores       1650049.0000 # total number of stores executed
sim_total_branches          1647445 # total number of branches executed
sim_cycle                  12536944 # total simulation time in cycles
sim_IPC                      1.7053 # instructions per cycle
sim_CPI                      0.5864 # cycles per instruction
sim_exec_BW                  1.7109 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48976297 # cumulative IFQ occupancy
IFQ_fcount                 11624183 # cumulative IFQ full count
ifq_occupancy                3.9066 # avg IFQ occupancy (insn's)
ifq_rate                     1.7109 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2833 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9272 # fraction of time (cycle's) IFQ was full
RUU_count                 199034220 # cumulative RUU occupancy
RUU_fcount                 12401189 # cumulative RUU full count
ruu_occupancy               15.8758 # avg RUU occupancy (insn's)
ruu_rate                     1.7109 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2791 # avg RUU occupant latency (cycle's)
ruu_full                     0.9892 # fraction of time (cycle's) RUU was full
LSQ_count                  63831691 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0915 # avg LSQ occupancy (insn's)
lsq_rate                     1.7109 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9759 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290649957 # total number of slip cycles
avg_sim_slip                13.5950 # the average slip between issue and retirement
bpred_bimod.lookups         1654333 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           77 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482573 # total number of accesses
il1.hits                   21482394 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944795 # total number of accesses
dl1.hits                    4940849 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       4984 # total number of hits
ul2.misses                      193 # total number of misses
ul2.replacements                129 # total number of replacements
ul2.writebacks                   55 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0373 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0249 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0106 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482573 # total number of accesses
itlb.hits                  21482567 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565557 # total number of accesses
dtlb.hits                   6565518 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178883958 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:33:13 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27863245 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203831 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481846 # total number of branches executed
sim_cycle                  37172803 # total simulation time in cycles
sim_IPC                      0.7495 # instructions per cycle
sim_CPI                      1.3343 # cycles per instruction
sim_exec_BW                  0.7496 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 148589686 # cumulative IFQ occupancy
IFQ_fcount                 37147172 # cumulative IFQ full count
ifq_occupancy                3.9973 # avg IFQ occupancy (insn's)
ifq_rate                     0.7496 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3328 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9993 # fraction of time (cycle's) IFQ was full
RUU_count                 594360107 # cumulative RUU occupancy
RUU_fcount                 37145956 # cumulative RUU full count
ruu_occupancy               15.9891 # avg RUU occupancy (insn's)
ruu_rate                     0.7496 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.3313 # avg RUU occupant latency (cycle's)
ruu_full                     0.9993 # fraction of time (cycle's) RUU was full
LSQ_count                 178367588 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7983 # avg LSQ occupancy (insn's)
lsq_rate                     0.7496 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4015 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  809235961 # total number of slip cycles
avg_sim_slip                29.0468 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860784 # total number of accesses
il1.hits                   27860573 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5833046 # total number of hits
dl1.misses                  2820352 # total number of misses
dl1.replacements            2819840 # total number of replacements
dl1.writebacks               955247 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3259 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3259 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3775810 # total number of accesses
ul2.hits                    3770326 # total number of hits
ul2.misses                     5484 # total number of misses
ul2.replacements               5420 # total number of replacements
ul2.writebacks                 2278 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0015 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0014 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0006 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860784 # total number of accesses
itlb.hits                  27860778 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017740 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:33:37 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8443632 # total simulation time in cycles
sim_IPC                      1.5523 # instructions per cycle
sim_CPI                      0.6442 # cycles per instruction
sim_exec_BW                  1.5573 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31596419 # cumulative IFQ occupancy
IFQ_fcount                  7751413 # cumulative IFQ full count
ifq_occupancy                3.7420 # avg IFQ occupancy (insn's)
ifq_rate                     1.5573 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4030 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9180 # fraction of time (cycle's) IFQ was full
RUU_count                 130742420 # cumulative RUU occupancy
RUU_fcount                  7069443 # cumulative RUU full count
ruu_occupancy               15.4841 # avg RUU occupancy (insn's)
ruu_rate                     1.5573 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9432 # avg RUU occupant latency (cycle's)
ruu_full                     0.8373 # fraction of time (cycle's) RUU was full
LSQ_count                  39820114 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7160 # avg LSQ occupancy (insn's)
lsq_rate                     1.5573 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0284 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187605413 # total number of slip cycles
avg_sim_slip                14.3132 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189424 # total number of accesses
il1.hits                   13189246 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013627 # total number of accesses
dl1.hits                    3667585 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346764 # total number of hits
ul2.misses                       88 # total number of misses
ul2.replacements                 56 # total number of replacements
ul2.writebacks                   18 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0003 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0002 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189424 # total number of accesses
itlb.hits                  13189418 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357190 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:33:45 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12270089 # total number of instructions executed
sim_total_refs              4824025 # total number of loads and stores executed
sim_total_loads             2865550 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197074 # total number of branches executed
sim_cycle                   6552520 # total simulation time in cycles
sim_IPC                      1.7675 # instructions per cycle
sim_CPI                      0.5658 # cycles per instruction
sim_exec_BW                  1.8726 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19442890 # cumulative IFQ occupancy
IFQ_fcount                  4032390 # cumulative IFQ full count
ifq_occupancy                2.9672 # avg IFQ occupancy (insn's)
ifq_rate                     1.8726 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5846 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6154 # fraction of time (cycle's) IFQ was full
RUU_count                  80016530 # cumulative RUU occupancy
RUU_fcount                  3427205 # cumulative RUU full count
ruu_occupancy               12.2116 # avg RUU occupancy (insn's)
ruu_rate                     1.8726 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.5213 # avg RUU occupant latency (cycle's)
ruu_full                     0.5230 # fraction of time (cycle's) RUU was full
LSQ_count                  33983020 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.1863 # avg LSQ occupancy (insn's)
lsq_rate                     1.8726 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.7696 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  127832434 # total number of slip cycles
avg_sim_slip                11.0376 # the average slip between issue and retirement
bpred_bimod.lookups         3257802 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442075 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820501 # total number of accesses
il1.hits                   12820284 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497831 # total number of accesses
dl1.hits                    4480755 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      27193 # total number of hits
ul2.misses                      459 # total number of misses
ul2.replacements                427 # total number of replacements
ul2.writebacks                  265 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0166 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0154 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0096 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820501 # total number of accesses
itlb.hits                  12820494 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919188 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:33:53 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 28 # total simulation time in seconds
sim_inst_rate           475746.7143 # simulation speed (in insts/sec)
sim_total_insn             13379910 # total number of instructions executed
sim_total_refs              6748394 # total number of loads and stores executed
sim_total_loads             3824351 # total number of loads executed
sim_total_stores       2924043.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  52485620 # total simulation time in cycles
sim_IPC                      0.2538 # instructions per cycle
sim_CPI                      3.9401 # cycles per instruction
sim_exec_BW                  0.2549 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                 208810867 # cumulative IFQ occupancy
IFQ_fcount                 52052915 # cumulative IFQ full count
ifq_occupancy                3.9784 # avg IFQ occupancy (insn's)
ifq_rate                     0.2549 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                 15.6063 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9918 # fraction of time (cycle's) IFQ was full
RUU_count                 836281902 # cumulative RUU occupancy
RUU_fcount                 51902314 # cumulative RUU full count
ruu_occupancy               15.9335 # avg RUU occupancy (insn's)
ruu_rate                     0.2549 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 62.5028 # avg RUU occupant latency (cycle's)
ruu_full                     0.9889 # fraction of time (cycle's) RUU was full
LSQ_count                 455231455 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.6735 # avg LSQ occupancy (insn's)
lsq_rate                     0.2549 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 34.0235 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                 1311191468 # total number of slip cycles
avg_sim_slip                98.4311 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400286 # total number of accesses
il1.hits                   13399540 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6172114 # total number of accesses
dl1.hits                    5867188 # total number of hits
dl1.misses                   304926 # total number of misses
dl1.replacements             304414 # total number of replacements
dl1.writebacks               156830 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462502 # total number of accesses
ul2.hits                     394305 # total number of hits
ul2.misses                    68197 # total number of misses
ul2.replacements              68165 # total number of replacements
ul2.writebacks                57465 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1475 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1474 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1242 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400286 # total number of accesses
itlb.hits                  13400267 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767172 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:34:21 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449676 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938903 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12544482 # total simulation time in cycles
sim_IPC                      1.7043 # instructions per cycle
sim_CPI                      0.5868 # cycles per instruction
sim_exec_BW                  1.7099 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48992924 # cumulative IFQ occupancy
IFQ_fcount                 11628315 # cumulative IFQ full count
ifq_occupancy                3.9055 # avg IFQ occupancy (insn's)
ifq_rate                     1.7099 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2841 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9270 # fraction of time (cycle's) IFQ was full
RUU_count                 199092592 # cumulative RUU occupancy
RUU_fcount                 12405225 # cumulative RUU full count
ruu_occupancy               15.8709 # avg RUU occupancy (insn's)
ruu_rate                     1.7099 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2818 # avg RUU occupant latency (cycle's)
ruu_full                     0.9889 # fraction of time (cycle's) RUU was full
LSQ_count                  63851376 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0900 # avg LSQ occupancy (insn's)
lsq_rate                     1.7099 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9768 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290727945 # total number of slip cycles
avg_sim_slip                13.5986 # the average slip between issue and retirement
bpred_bimod.lookups         1654336 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482585 # total number of accesses
il1.hits                   21482406 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944828 # total number of accesses
dl1.hits                    4940882 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       5072 # total number of hits
ul2.misses                      105 # total number of misses
ul2.replacements                 73 # total number of replacements
ul2.writebacks                   31 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0203 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0141 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0060 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482585 # total number of accesses
itlb.hits                  21482579 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884008 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:34:35 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860701 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203831 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481846 # total number of branches executed
sim_cycle                  37664955 # total simulation time in cycles
sim_IPC                      0.7397 # instructions per cycle
sim_CPI                      1.3520 # cycles per instruction
sim_exec_BW                  0.7397 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 150539526 # cumulative IFQ occupancy
IFQ_fcount                 37634632 # cumulative IFQ full count
ifq_occupancy                3.9968 # avg IFQ occupancy (insn's)
ifq_rate                     0.7397 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4033 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 602159969 # cumulative RUU occupancy
RUU_fcount                 37634064 # cumulative RUU full count
ruu_occupancy               15.9873 # avg RUU occupancy (insn's)
ruu_rate                     0.7397 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.6132 # avg RUU occupant latency (cycle's)
ruu_full                     0.9992 # fraction of time (cycle's) RUU was full
LSQ_count                 180776746 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7996 # avg LSQ occupancy (insn's)
lsq_rate                     0.7397 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.4886 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  819445617 # total number of slip cycles
avg_sim_slip                29.4133 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860784 # total number of accesses
il1.hits                   27860573 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5817932 # total number of hits
dl1.misses                  2835466 # total number of misses
dl1.replacements            2834954 # total number of replacements
dl1.writebacks               955047 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3277 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3276 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3790724 # total number of accesses
ul2.hits                    3787350 # total number of hits
ul2.misses                     3374 # total number of misses
ul2.replacements               3342 # total number of replacements
ul2.writebacks                 1593 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0009 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0009 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0004 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860784 # total number of accesses
itlb.hits                  27860778 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017740 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:34:58 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8447282 # total simulation time in cycles
sim_IPC                      1.5516 # instructions per cycle
sim_CPI                      0.6445 # cycles per instruction
sim_exec_BW                  1.5566 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31600261 # cumulative IFQ occupancy
IFQ_fcount                  7752372 # cumulative IFQ full count
ifq_occupancy                3.7409 # avg IFQ occupancy (insn's)
ifq_rate                     1.5566 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4033 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9177 # fraction of time (cycle's) IFQ was full
RUU_count                 130772375 # cumulative RUU occupancy
RUU_fcount                  7070400 # cumulative RUU full count
ruu_occupancy               15.4810 # avg RUU occupancy (insn's)
ruu_rate                     1.5566 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9455 # avg RUU occupant latency (cycle's)
ruu_full                     0.8370 # fraction of time (cycle's) RUU was full
LSQ_count                  39829853 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7151 # avg LSQ occupancy (insn's)
lsq_rate                     1.5566 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0291 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  187645112 # total number of slip cycles
avg_sim_slip                14.3163 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189424 # total number of accesses
il1.hits                   13189246 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013634 # total number of accesses
dl1.hits                    3667592 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346805 # total number of hits
ul2.misses                       47 # total number of misses
ul2.replacements                 31 # total number of replacements
ul2.writebacks                   10 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0001 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0001 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189424 # total number of accesses
itlb.hits                  13189418 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357190 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:35:07 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12295826 # total number of instructions executed
sim_total_refs              4824025 # total number of loads and stores executed
sim_total_loads             2865550 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197078 # total number of branches executed
sim_cycle                   6670591 # total simulation time in cycles
sim_IPC                      1.7362 # instructions per cycle
sim_CPI                      0.5760 # cycles per instruction
sim_exec_BW                  1.8433 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  19873214 # cumulative IFQ occupancy
IFQ_fcount                  4139967 # cumulative IFQ full count
ifq_occupancy                2.9792 # avg IFQ occupancy (insn's)
ifq_rate                     1.8433 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6163 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6206 # fraction of time (cycle's) IFQ was full
RUU_count                  81701239 # cumulative RUU occupancy
RUU_fcount                  3528325 # cumulative RUU full count
ruu_occupancy               12.2480 # avg RUU occupancy (insn's)
ruu_rate                     1.8433 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.6446 # avg RUU occupant latency (cycle's)
ruu_full                     0.5289 # fraction of time (cycle's) RUU was full
LSQ_count                  34902494 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.2323 # avg LSQ occupancy (insn's)
lsq_rate                     1.8433 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.8386 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  129353307 # total number of slip cycles
avg_sim_slip                11.1689 # the average slip between issue and retirement
bpred_bimod.lookups         3257805 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442078 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820505 # total number of accesses
il1.hits                   12820288 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497831 # total number of accesses
dl1.hits                    4480755 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      27337 # total number of hits
ul2.misses                      315 # total number of misses
ul2.replacements                299 # total number of replacements
ul2.writebacks                  174 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0114 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0108 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0063 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820505 # total number of accesses
itlb.hits                  12820498 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919204 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:35:14 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 52 # total simulation time in seconds
sim_inst_rate           256171.3077 # simulation speed (in insts/sec)
sim_total_insn             13384849 # total number of instructions executed
sim_total_refs              6748283 # total number of loads and stores executed
sim_total_loads             3824239 # total number of loads executed
sim_total_stores       2924044.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                 109693077 # total simulation time in cycles
sim_IPC                      0.1214 # instructions per cycle
sim_CPI                      8.2347 # cycles per instruction
sim_exec_BW                  0.1220 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                 437470787 # cumulative IFQ occupancy
IFQ_fcount                109217377 # cumulative IFQ full count
ifq_occupancy                3.9881 # avg IFQ occupancy (insn's)
ifq_rate                     0.1220 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                 32.6840 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9957 # fraction of time (cycle's) IFQ was full
RUU_count                1751064879 # cumulative RUU occupancy
RUU_fcount                109061679 # cumulative RUU full count
ruu_occupancy               15.9633 # avg RUU occupancy (insn's)
ruu_rate                     0.1220 # avg RUU dispatch rate (insn/cycle)
ruu_latency                130.8244 # avg RUU occupant latency (cycle's)
ruu_full                     0.9942 # fraction of time (cycle's) RUU was full
LSQ_count                 953385534 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.6914 # avg LSQ occupancy (insn's)
lsq_rate                     0.1220 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 71.2287 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                 2724072064 # total number of slip cycles
avg_sim_slip               204.4960 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400044 # total number of accesses
il1.hits                   13399298 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6172430 # total number of accesses
dl1.hits                    5867632 # total number of hits
dl1.misses                   304798 # total number of misses
dl1.replacements             304286 # total number of replacements
dl1.writebacks               156804 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462348 # total number of accesses
ul2.hits                     392839 # total number of hits
ul2.misses                    69509 # total number of misses
ul2.replacements              69493 # total number of replacements
ul2.writebacks                55871 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1503 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1503 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1208 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400044 # total number of accesses
itlb.hits                  13400025 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156765868 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:36:06 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449676 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938903 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12560911 # total simulation time in cycles
sim_IPC                      1.7020 # instructions per cycle
sim_CPI                      0.5875 # cycles per instruction
sim_exec_BW                  1.7077 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49027278 # cumulative IFQ occupancy
IFQ_fcount                 11636892 # cumulative IFQ full count
ifq_occupancy                3.9032 # avg IFQ occupancy (insn's)
ifq_rate                     1.7077 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2857 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9264 # fraction of time (cycle's) IFQ was full
RUU_count                 199232439 # cumulative RUU occupancy
RUU_fcount                 12413753 # cumulative RUU full count
ruu_occupancy               15.8613 # avg RUU occupancy (insn's)
ruu_rate                     1.7077 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2884 # avg RUU occupant latency (cycle's)
ruu_full                     0.9883 # fraction of time (cycle's) RUU was full
LSQ_count                  63902956 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0874 # avg LSQ occupancy (insn's)
lsq_rate                     1.7077 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9792 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290919372 # total number of slip cycles
avg_sim_slip                13.6076 # the average slip between issue and retirement
bpred_bimod.lookups         1654336 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482585 # total number of accesses
il1.hits                   21482406 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944844 # total number of accesses
dl1.hits                    4940898 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       5119 # total number of hits
ul2.misses                       58 # total number of misses
ul2.replacements                 42 # total number of replacements
ul2.writebacks                   16 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0112 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0081 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0031 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482585 # total number of accesses
itlb.hits                  21482579 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884008 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 16 -mem:lat 60 3 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:36:20 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         60 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860703 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203831 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481846 # total number of branches executed
sim_cycle                  38579291 # total simulation time in cycles
sim_IPC                      0.7221 # instructions per cycle
sim_CPI                      1.3848 # cycles per instruction
sim_exec_BW                  0.7222 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 154186022 # cumulative IFQ occupancy
IFQ_fcount                 38546257 # cumulative IFQ full count
ifq_occupancy                3.9966 # avg IFQ occupancy (insn's)
ifq_rate                     0.7222 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5342 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9991 # fraction of time (cycle's) IFQ was full
RUU_count                 616747438 # cumulative RUU occupancy
RUU_fcount                 38545700 # cumulative RUU full count
ruu_occupancy               15.9865 # avg RUU occupancy (insn's)
ruu_rate                     0.7222 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.1368 # avg RUU occupant latency (cycle's)
ruu_full                     0.9991 # fraction of time (cycle's) RUU was full
LSQ_count                 185469787 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8075 # avg LSQ occupancy (insn's)
lsq_rate                     0.7222 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6570 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  838678350 # total number of slip cycles
avg_sim_slip                30.1036 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5809162 # total number of hits
dl1.misses                  2844236 # total number of misses
dl1.replacements            2843724 # total number of replacements
dl1.writebacks               954932 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3287 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3286 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3799379 # total number of accesses
ul2.hits                    3797063 # total number of hits
ul2.misses                     2316 # total number of misses
ul2.replacements               2300 # total number of replacements
ul2.writebacks                 1251 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0006 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0006 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0003 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017756 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:36:43 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13149204 # total number of instructions executed
sim_total_refs              4034192 # total number of loads and stores executed
sim_total_loads             3020633 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8479686 # total simulation time in cycles
sim_IPC                      1.5457 # instructions per cycle
sim_CPI                      0.6470 # cycles per instruction
sim_exec_BW                  1.5507 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31775736 # cumulative IFQ occupancy
IFQ_fcount                  7796257 # cumulative IFQ full count
ifq_occupancy                3.7473 # avg IFQ occupancy (insn's)
ifq_rate                     1.5507 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4166 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9194 # fraction of time (cycle's) IFQ was full
RUU_count                 131444583 # cumulative RUU occupancy
RUU_fcount                  7115065 # cumulative RUU full count
ruu_occupancy               15.5011 # avg RUU occupancy (insn's)
ruu_rate                     1.5507 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9964 # avg RUU occupant latency (cycle's)
ruu_full                     0.8391 # fraction of time (cycle's) RUU was full
LSQ_count                  40059939 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7242 # avg LSQ occupancy (insn's)
lsq_rate                     1.5507 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0466 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  188545945 # total number of slip cycles
avg_sim_slip                14.3850 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189454 # total number of accesses
il1.hits                   13189276 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    3667132 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     344575 # total number of hits
ul2.misses                     2277 # total number of misses
ul2.replacements               1253 # total number of replacements
ul2.writebacks                  512 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0066 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0015 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189454 # total number of accesses
itlb.hits                  13189448 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357314 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:36:52 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12265340 # total number of instructions executed
sim_total_refs              4824018 # total number of loads and stores executed
sim_total_loads             2865540 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196951 # total number of branches executed
sim_cycle                   6685661 # total simulation time in cycles
sim_IPC                      1.7323 # instructions per cycle
sim_CPI                      0.5773 # cycles per instruction
sim_exec_BW                  1.8346 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  20049454 # cumulative IFQ occupancy
IFQ_fcount                  4184214 # cumulative IFQ full count
ifq_occupancy                2.9989 # avg IFQ occupancy (insn's)
ifq_rate                     1.8346 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6346 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6258 # fraction of time (cycle's) IFQ was full
RUU_count                  82445495 # cumulative RUU occupancy
RUU_fcount                  3582191 # cumulative RUU full count
ruu_occupancy               12.3317 # avg RUU occupancy (insn's)
ruu_rate                     1.8346 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.7218 # avg RUU occupant latency (cycle's)
ruu_full                     0.5358 # fraction of time (cycle's) RUU was full
LSQ_count                  34859369 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.2140 # avg LSQ occupancy (insn's)
lsq_rate                     1.8346 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.8421 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  131339652 # total number of slip cycles
avg_sim_slip                11.3405 # the average slip between issue and retirement
bpred_bimod.lookups         3257679 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441951 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435454 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820264 # total number of accesses
il1.hits                   12820047 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497834 # total number of accesses
dl1.hits                    4480758 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      15488 # total number of hits
ul2.misses                    12164 # total number of misses
ul2.replacements              11140 # total number of replacements
ul2.writebacks                 7571 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4399 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.4029 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.2738 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820264 # total number of accesses
itlb.hits                  12820257 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918218 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:37:00 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375302 # total number of instructions executed
sim_total_refs              6748385 # total number of loads and stores executed
sim_total_loads             3824342 # total number of loads executed
sim_total_stores       2924043.0000 # total number of stores executed
sim_total_branches           390626 # total number of branches executed
sim_cycle                  12275551 # total simulation time in cycles
sim_IPC                      1.0852 # instructions per cycle
sim_CPI                      0.9215 # cycles per instruction
sim_exec_BW                  1.0896 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  48117786 # cumulative IFQ occupancy
IFQ_fcount                 11879734 # cumulative IFQ full count
ifq_occupancy                3.9198 # avg IFQ occupancy (insn's)
ifq_rate                     1.0896 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  3.5975 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9678 # fraction of time (cycle's) IFQ was full
RUU_count                 193390829 # cumulative RUU occupancy
RUU_fcount                 11730003 # cumulative RUU full count
ruu_occupancy               15.7541 # avg RUU occupancy (insn's)
ruu_rate                     1.0896 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 14.4588 # avg RUU occupant latency (cycle's)
ruu_full                     0.9556 # fraction of time (cycle's) RUU was full
LSQ_count                 102472779 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.3477 # avg LSQ occupancy (insn's)
lsq_rate                     1.0896 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.6613 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  315579178 # total number of slip cycles
avg_sim_slip                23.6905 # the average slip between issue and retirement
bpred_bimod.lookups          390935 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89876 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400274 # total number of accesses
il1.hits                   13399528 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171929 # total number of accesses
dl1.hits                    5866428 # total number of hits
dl1.misses                   305501 # total number of misses
dl1.replacements             304989 # total number of replacements
dl1.writebacks               156829 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0495 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 463076 # total number of accesses
ul2.hits                     366978 # total number of hits
ul2.misses                    96098 # total number of misses
ul2.replacements              95074 # total number of replacements
ul2.writebacks                73071 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2075 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2053 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1578 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400274 # total number of accesses
itlb.hits                  13400255 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767100 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:37:12 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449942 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938901 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12591180 # total simulation time in cycles
sim_IPC                      1.6980 # instructions per cycle
sim_CPI                      0.5889 # cycles per instruction
sim_exec_BW                  1.7036 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49213685 # cumulative IFQ occupancy
IFQ_fcount                 11684248 # cumulative IFQ full count
ifq_occupancy                3.9086 # avg IFQ occupancy (insn's)
ifq_rate                     1.7036 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2944 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9280 # fraction of time (cycle's) IFQ was full
RUU_count                 200000396 # cumulative RUU occupancy
RUU_fcount                 12464057 # cumulative RUU full count
ruu_occupancy               15.8842 # avg RUU occupancy (insn's)
ruu_rate                     1.7036 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3241 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64126107 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0929 # avg LSQ occupancy (insn's)
lsq_rate                     1.7036 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9896 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291909772 # total number of slip cycles
avg_sim_slip                13.6539 # the average slip between issue and retirement
bpred_bimod.lookups         1654334 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482598 # total number of accesses
il1.hits                   21482419 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943840 # total number of accesses
dl1.hits                    4939894 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       2606 # total number of hits
ul2.misses                     2571 # total number of misses
ul2.replacements               1547 # total number of replacements
ul2.writebacks                  659 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.4966 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2988 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1273 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482598 # total number of accesses
itlb.hits                  21482592 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884054 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:256:64:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:37:25 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:256:64:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861259 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37699284 # total simulation time in cycles
sim_IPC                      0.7390 # instructions per cycle
sim_CPI                      1.3532 # cycles per instruction
sim_exec_BW                  0.7390 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 150720305 # cumulative IFQ occupancy
IFQ_fcount                 37679836 # cumulative IFQ full count
ifq_occupancy                3.9980 # avg IFQ occupancy (insn's)
ifq_rate                     0.7390 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4097 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 602884164 # cumulative RUU occupancy
RUU_fcount                 37679094 # cumulative RUU full count
ruu_occupancy               15.9919 # avg RUU occupancy (insn's)
ruu_rate                     0.7390 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.6388 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 183222932 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8601 # avg LSQ occupancy (insn's)
lsq_rate                     0.7390 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5763 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  822615724 # total number of slip cycles
avg_sim_slip                29.5271 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6071212 # total number of hits
dl1.misses                  2582186 # total number of misses
dl1.replacements            2581674 # total number of replacements
dl1.writebacks               958427 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2984 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2983 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3540824 # total number of accesses
ul2.hits                    3472444 # total number of hits
ul2.misses                    68380 # total number of misses
ul2.replacements              67356 # total number of replacements
ul2.writebacks                22676 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0193 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0190 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0064 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017756 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:37:49 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148931 # total number of instructions executed
sim_total_refs              4034196 # total number of loads and stores executed
sim_total_loads             3020632 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8495852 # total simulation time in cycles
sim_IPC                      1.5428 # instructions per cycle
sim_CPI                      0.6482 # cycles per instruction
sim_exec_BW                  1.5477 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31824551 # cumulative IFQ occupancy
IFQ_fcount                  7808452 # cumulative IFQ full count
ifq_occupancy                3.7459 # avg IFQ occupancy (insn's)
ifq_rate                     1.5477 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4203 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9191 # fraction of time (cycle's) IFQ was full
RUU_count                 131643339 # cumulative RUU occupancy
RUU_fcount                  7126886 # cumulative RUU full count
ruu_occupancy               15.4950 # avg RUU occupancy (insn's)
ruu_rate                     1.5477 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.0117 # avg RUU occupant latency (cycle's)
ruu_full                     0.8389 # fraction of time (cycle's) RUU was full
LSQ_count                  40124866 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7229 # avg LSQ occupancy (insn's)
lsq_rate                     1.5477 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0516 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  188807126 # total number of slip cycles
avg_sim_slip                14.4049 # the average slip between issue and retirement
bpred_bimod.lookups         1010976 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000660 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10190 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189437 # total number of accesses
il1.hits                   13189259 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013392 # total number of accesses
dl1.hits                    3667350 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     345692 # total number of hits
ul2.misses                     1160 # total number of misses
ul2.replacements                648 # total number of replacements
ul2.writebacks                  257 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0033 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0019 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0007 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189437 # total number of accesses
itlb.hits                  13189431 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357244 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:37:57 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12265369 # total number of instructions executed
sim_total_refs              4824028 # total number of loads and stores executed
sim_total_loads             2865548 # total number of loads executed
sim_total_stores       1958480.0000 # total number of stores executed
sim_total_branches          3197013 # total number of branches executed
sim_cycle                   6734539 # total simulation time in cycles
sim_IPC                      1.7197 # instructions per cycle
sim_CPI                      0.5815 # cycles per instruction
sim_exec_BW                  1.8213 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  20226815 # cumulative IFQ occupancy
IFQ_fcount                  4228461 # cumulative IFQ full count
ifq_occupancy                3.0034 # avg IFQ occupancy (insn's)
ifq_rate                     1.8213 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6491 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6279 # fraction of time (cycle's) IFQ was full
RUU_count                  83160304 # cumulative RUU occupancy
RUU_fcount                  3625412 # cumulative RUU full count
ruu_occupancy               12.3483 # avg RUU occupancy (insn's)
ruu_rate                     1.8213 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.7801 # avg RUU occupant latency (cycle's)
ruu_full                     0.5383 # fraction of time (cycle's) RUU was full
LSQ_count                  35166787 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.2219 # avg LSQ occupancy (insn's)
lsq_rate                     1.8213 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.8672 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  132348476 # total number of slip cycles
avg_sim_slip                11.4276 # the average slip between issue and retirement
bpred_bimod.lookups         3257741 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442015 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435454 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820388 # total number of accesses
il1.hits                   12820171 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497839 # total number of accesses
dl1.hits                    4480763 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      21515 # total number of hits
ul2.misses                     6137 # total number of misses
ul2.replacements               5625 # total number of replacements
ul2.writebacks                 3800 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2219 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.2034 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1374 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820388 # total number of accesses
itlb.hits                  12820381 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918730 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:38:05 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1024685.2308 # simulation speed (in insts/sec)
sim_total_insn             13376610 # total number of instructions executed
sim_total_refs              6748388 # total number of loads and stores executed
sim_total_loads             3824347 # total number of loads executed
sim_total_stores       2924041.0000 # total number of stores executed
sim_total_branches           390629 # total number of branches executed
sim_cycle                  14865268 # total simulation time in cycles
sim_IPC                      0.8961 # instructions per cycle
sim_CPI                      1.1159 # cycles per instruction
sim_exec_BW                  0.8999 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  58433919 # cumulative IFQ occupancy
IFQ_fcount                 14458698 # cumulative IFQ full count
ifq_occupancy                3.9309 # avg IFQ occupancy (insn's)
ifq_rate                     0.8999 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  4.3684 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9726 # fraction of time (cycle's) IFQ was full
RUU_count                 234700686 # cumulative RUU occupancy
RUU_fcount                 14309244 # cumulative RUU full count
ruu_occupancy               15.7885 # avg RUU occupancy (insn's)
ruu_rate                     0.8999 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 17.5456 # avg RUU occupant latency (cycle's)
ruu_full                     0.9626 # fraction of time (cycle's) RUU was full
LSQ_count                 125003190 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.4091 # avg LSQ occupancy (insn's)
lsq_rate                     0.8999 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  9.3449 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  379416817 # total number of slip cycles
avg_sim_slip                28.4828 # the average slip between issue and retirement
bpred_bimod.lookups          390938 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400277 # total number of accesses
il1.hits                   13399531 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171900 # total number of accesses
dl1.hits                    5866728 # total number of hits
dl1.misses                   305172 # total number of misses
dl1.replacements             304660 # total number of replacements
dl1.writebacks               156825 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0494 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462743 # total number of accesses
ul2.hits                     386565 # total number of hits
ul2.misses                    76178 # total number of misses
ul2.replacements              75666 # total number of replacements
ul2.writebacks                62226 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1646 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1635 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1345 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400277 # total number of accesses
itlb.hits                  13400258 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767128 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:38:18 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450288 # total number of instructions executed
sim_total_refs              6588955 # total number of loads and stores executed
sim_total_loads             4938902 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12611200 # total simulation time in cycles
sim_IPC                      1.6953 # instructions per cycle
sim_CPI                      0.5899 # cycles per instruction
sim_exec_BW                  1.7009 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49279727 # cumulative IFQ occupancy
IFQ_fcount                 11700378 # cumulative IFQ full count
ifq_occupancy                3.9076 # avg IFQ occupancy (insn's)
ifq_rate                     1.7009 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2974 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9278 # fraction of time (cycle's) IFQ was full
RUU_count                 200252232 # cumulative RUU occupancy
RUU_fcount                 12478558 # cumulative RUU full count
ruu_occupancy               15.8789 # avg RUU occupancy (insn's)
ruu_rate                     1.7009 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3356 # avg RUU occupant latency (cycle's)
ruu_full                     0.9895 # fraction of time (cycle's) RUU was full
LSQ_count                  64209403 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0915 # avg LSQ occupancy (insn's)
lsq_rate                     1.7009 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9934 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292244113 # total number of slip cycles
avg_sim_slip                13.6695 # the average slip between issue and retirement
bpred_bimod.lookups         1654335 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           57 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482587 # total number of accesses
il1.hits                   21482408 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944348 # total number of accesses
dl1.hits                    4940402 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       3867 # total number of hits
ul2.misses                     1310 # total number of misses
ul2.replacements                798 # total number of replacements
ul2.writebacks                  344 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2530 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1541 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0664 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482587 # total number of accesses
itlb.hits                  21482581 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884014 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:128:128:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:38:32 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:128:128:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27861308 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203829 # total number of loads executed
sim_total_stores       1449774.0000 # total number of stores executed
sim_total_branches           481844 # total number of branches executed
sim_cycle                  38083745 # total simulation time in cycles
sim_IPC                      0.7315 # instructions per cycle
sim_CPI                      1.3670 # cycles per instruction
sim_exec_BW                  0.7316 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 152242251 # cumulative IFQ occupancy
IFQ_fcount                 38060324 # cumulative IFQ full count
ifq_occupancy                3.9976 # avg IFQ occupancy (insn's)
ifq_rate                     0.7316 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.4643 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 608974234 # cumulative RUU occupancy
RUU_fcount                 38059567 # cumulative RUU full count
ruu_occupancy               15.9904 # avg RUU occupancy (insn's)
ruu_rate                     0.7316 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.8573 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 185088505 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8600 # avg LSQ occupancy (insn's)
lsq_rate                     0.7316 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6432 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  830568250 # total number of slip cycles
avg_sim_slip                29.8125 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6076269 # total number of hits
dl1.misses                  2577129 # total number of misses
dl1.replacements            2576617 # total number of replacements
dl1.writebacks               958457 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2978 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2978 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1108 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3535797 # total number of accesses
ul2.hits                    3500931 # total number of hits
ul2.misses                    34866 # total number of misses
ul2.replacements              34354 # total number of replacements
ul2.writebacks                11770 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0099 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0097 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0033 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017752 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:38:55 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148952 # total number of instructions executed
sim_total_refs              4034200 # total number of loads and stores executed
sim_total_loads             3020636 # total number of loads executed
sim_total_stores       1013564.0000 # total number of stores executed
sim_total_branches          1010957 # total number of branches executed
sim_cycle                   8498949 # total simulation time in cycles
sim_IPC                      1.5422 # instructions per cycle
sim_CPI                      0.6484 # cycles per instruction
sim_exec_BW                  1.5471 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31819817 # cumulative IFQ occupancy
IFQ_fcount                  7807255 # cumulative IFQ full count
ifq_occupancy                3.7440 # avg IFQ occupancy (insn's)
ifq_rate                     1.5471 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4200 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9186 # fraction of time (cycle's) IFQ was full
RUU_count                 131637125 # cumulative RUU occupancy
RUU_fcount                  7125474 # cumulative RUU full count
ruu_occupancy               15.4886 # avg RUU occupancy (insn's)
ruu_rate                     1.5471 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.0112 # avg RUU occupant latency (cycle's)
ruu_full                     0.8384 # fraction of time (cycle's) RUU was full
LSQ_count                  40125681 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7213 # avg LSQ occupancy (insn's)
lsq_rate                     1.5471 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0516 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  188793196 # total number of slip cycles
avg_sim_slip                14.4039 # the average slip between issue and retirement
bpred_bimod.lookups         1010980 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189458 # total number of accesses
il1.hits                   13189280 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013522 # total number of accesses
dl1.hits                    3667480 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346258 # total number of hits
ul2.misses                      594 # total number of misses
ul2.replacements                338 # total number of replacements
ul2.writebacks                  129 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0017 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0010 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0004 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189458 # total number of accesses
itlb.hits                  13189452 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013780 # total number of accesses
dtlb.hits                   4013744 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357336 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:39:04 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12267285 # total number of instructions executed
sim_total_refs              4824015 # total number of loads and stores executed
sim_total_loads             2865542 # total number of loads executed
sim_total_stores       1958473.0000 # total number of stores executed
sim_total_branches          3197044 # total number of branches executed
sim_cycle                   6768606 # total simulation time in cycles
sim_IPC                      1.7111 # instructions per cycle
sim_CPI                      0.5844 # cycles per instruction
sim_exec_BW                  1.8124 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  20333492 # cumulative IFQ occupancy
IFQ_fcount                  4255084 # cumulative IFQ full count
ifq_occupancy                3.0041 # avg IFQ occupancy (insn's)
ifq_rate                     1.8124 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6575 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6286 # fraction of time (cycle's) IFQ was full
RUU_count                  83591095 # cumulative RUU occupancy
RUU_fcount                  3651044 # cumulative RUU full count
ruu_occupancy               12.3498 # avg RUU occupancy (insn's)
ruu_rate                     1.8124 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.8141 # avg RUU occupant latency (cycle's)
ruu_full                     0.5394 # fraction of time (cycle's) RUU was full
LSQ_count                  35363290 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.2246 # avg LSQ occupancy (insn's)
lsq_rate                     1.8124 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.8827 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  132951153 # total number of slip cycles
avg_sim_slip                11.4796 # the average slip between issue and retirement
bpred_bimod.lookups         3257770 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984764 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442047 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435453 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820428 # total number of accesses
il1.hits                   12820211 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497839 # total number of accesses
dl1.hits                    4480763 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      24532 # total number of hits
ul2.misses                     3120 # total number of misses
ul2.replacements               2864 # total number of replacements
ul2.writebacks                 1917 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1128 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1036 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0693 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820428 # total number of accesses
itlb.hits                  12820421 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85918880 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:39:11 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 16 # total simulation time in seconds
sim_inst_rate           832556.7500 # simulation speed (in insts/sec)
sim_total_insn             13378461 # total number of instructions executed
sim_total_refs              6748395 # total number of loads and stores executed
sim_total_loads             3824348 # total number of loads executed
sim_total_stores       2924047.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  19904404 # total simulation time in cycles
sim_IPC                      0.6692 # instructions per cycle
sim_CPI                      1.4942 # cycles per instruction
sim_exec_BW                  0.6721 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  78547376 # cumulative IFQ occupancy
IFQ_fcount                 19486709 # cumulative IFQ full count
ifq_occupancy                3.9462 # avg IFQ occupancy (insn's)
ifq_rate                     0.6721 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.8712 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9790 # fraction of time (cycle's) IFQ was full
RUU_count                 315164090 # cumulative RUU occupancy
RUU_fcount                 19337265 # cumulative RUU full count
ruu_occupancy               15.8339 # avg RUU occupancy (insn's)
ruu_rate                     0.6721 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 23.5576 # avg RUU occupant latency (cycle's)
ruu_full                     0.9715 # fraction of time (cycle's) RUU was full
LSQ_count                 169039842 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.4926 # avg LSQ occupancy (insn's)
lsq_rate                     0.6721 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 12.6352 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  503906742 # total number of slip cycles
avg_sim_slip                37.8283 # the average slip between issue and retirement
bpred_bimod.lookups          390940 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90459 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400289 # total number of accesses
il1.hits                   13399543 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171900 # total number of accesses
dl1.hits                    5866897 # total number of hits
dl1.misses                   305003 # total number of misses
dl1.replacements             304491 # total number of replacements
dl1.writebacks               156826 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462575 # total number of accesses
ul2.hits                     394980 # total number of hits
ul2.misses                    67595 # total number of misses
ul2.replacements              67339 # total number of replacements
ul2.writebacks                57336 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1461 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1456 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1239 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400289 # total number of accesses
itlb.hits                  13400270 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740999 # total number of accesses
dtlb.hits                   6736825 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767178 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:39:27 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449685 # total number of instructions executed
sim_total_refs              6588961 # total number of loads and stores executed
sim_total_loads             4938904 # total number of loads executed
sim_total_stores       1650057.0000 # total number of stores executed
sim_total_branches          1647450 # total number of branches executed
sim_cycle                  12622661 # total simulation time in cycles
sim_IPC                      1.6937 # instructions per cycle
sim_CPI                      0.5904 # cycles per instruction
sim_exec_BW                  1.6993 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49307530 # cumulative IFQ occupancy
IFQ_fcount                 11707133 # cumulative IFQ full count
ifq_occupancy                3.9063 # avg IFQ occupancy (insn's)
ifq_rate                     1.6993 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2988 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9275 # fraction of time (cycle's) IFQ was full
RUU_count                 200365173 # cumulative RUU occupancy
RUU_fcount                 12484701 # cumulative RUU full count
ruu_occupancy               15.8734 # avg RUU occupancy (insn's)
ruu_rate                     1.6993 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3412 # avg RUU occupant latency (cycle's)
ruu_full                     0.9891 # fraction of time (cycle's) RUU was full
LSQ_count                  64246217 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0898 # avg LSQ occupancy (insn's)
lsq_rate                     1.6993 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9952 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292386961 # total number of slip cycles
avg_sim_slip                13.6762 # the average slip between issue and retirement
bpred_bimod.lookups         1654339 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           80 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482601 # total number of accesses
il1.hits                   21482422 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944606 # total number of accesses
dl1.hits                    4940660 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       4505 # total number of hits
ul2.misses                      672 # total number of misses
ul2.replacements                416 # total number of replacements
ul2.writebacks                  183 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1298 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0804 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0353 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482601 # total number of accesses
itlb.hits                  21482595 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565560 # total number of accesses
dtlb.hits                   6565521 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884074 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:64:256:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:39:40 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:64:256:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27863179 # total number of instructions executed
sim_total_refs              8653612 # total number of loads and stores executed
sim_total_loads             7203835 # total number of loads executed
sim_total_stores       1449777.0000 # total number of stores executed
sim_total_branches           481852 # total number of branches executed
sim_cycle                  38576876 # total simulation time in cycles
sim_IPC                      0.7222 # instructions per cycle
sim_CPI                      1.3847 # cycles per instruction
sim_exec_BW                  0.7223 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 154190868 # cumulative IFQ occupancy
IFQ_fcount                 38547247 # cumulative IFQ full count
ifq_occupancy                3.9970 # avg IFQ occupancy (insn's)
ifq_rate                     0.7223 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5339 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9992 # fraction of time (cycle's) IFQ was full
RUU_count                 616769197 # cumulative RUU occupancy
RUU_fcount                 38546335 # cumulative RUU full count
ruu_occupancy               15.9881 # avg RUU occupancy (insn's)
ruu_rate                     0.7223 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.1356 # avg RUU occupant latency (cycle's)
ruu_full                     0.9992 # fraction of time (cycle's) RUU was full
LSQ_count                 186087913 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8238 # avg LSQ occupancy (insn's)
lsq_rate                     0.7223 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.6786 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  839354457 # total number of slip cycles
avg_sim_slip                30.1279 # the average slip between issue and retirement
bpred_bimod.lookups          481892 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          123 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860810 # total number of accesses
il1.hits                   27860599 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653401 # total number of accesses
dl1.hits                    5938859 # total number of hits
dl1.misses                  2714542 # total number of misses
dl1.replacements            2714030 # total number of replacements
dl1.writebacks               956651 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3137 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3136 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3671404 # total number of accesses
ul2.hits                    3653299 # total number of hits
ul2.misses                    18105 # total number of misses
ul2.replacements              17849 # total number of replacements
ul2.writebacks                 6372 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0049 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0049 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0017 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860810 # total number of accesses
itlb.hits                  27860804 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653414 # total number of accesses
dtlb.hits                   8652341 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017852 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:40:04 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8504539 # total simulation time in cycles
sim_IPC                      1.5412 # instructions per cycle
sim_CPI                      0.6488 # cycles per instruction
sim_exec_BW                  1.5461 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31812854 # cumulative IFQ occupancy
IFQ_fcount                  7805517 # cumulative IFQ full count
ifq_occupancy                3.7407 # avg IFQ occupancy (insn's)
ifq_rate                     1.5461 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4194 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9178 # fraction of time (cycle's) IFQ was full
RUU_count                 131611073 # cumulative RUU occupancy
RUU_fcount                  7123645 # cumulative RUU full count
ruu_occupancy               15.4754 # avg RUU occupancy (insn's)
ruu_rate                     1.5461 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.0093 # avg RUU occupant latency (cycle's)
ruu_full                     0.8376 # fraction of time (cycle's) RUU was full
LSQ_count                  40115396 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7169 # avg LSQ occupancy (insn's)
lsq_rate                     1.5461 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0508 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  188769358 # total number of slip cycles
avg_sim_slip                14.4020 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189425 # total number of accesses
il1.hits                   13189247 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013580 # total number of accesses
dl1.hits                    3667538 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346541 # total number of hits
ul2.misses                      311 # total number of misses
ul2.replacements                183 # total number of replacements
ul2.writebacks                   66 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0009 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0005 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189425 # total number of accesses
itlb.hits                  13189419 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357194 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:40:13 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12270003 # total number of instructions executed
sim_total_refs              4824018 # total number of loads and stores executed
sim_total_loads             2865547 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197057 # total number of branches executed
sim_cycle                   6806279 # total simulation time in cycles
sim_IPC                      1.7016 # instructions per cycle
sim_CPI                      0.5877 # cycles per instruction
sim_exec_BW                  1.8027 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  20432453 # cumulative IFQ occupancy
IFQ_fcount                  4279800 # cumulative IFQ full count
ifq_occupancy                3.0020 # avg IFQ occupancy (insn's)
ifq_rate                     1.8027 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6652 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6288 # fraction of time (cycle's) IFQ was full
RUU_count                  83990468 # cumulative RUU occupancy
RUU_fcount                  3674836 # cumulative RUU full count
ruu_occupancy               12.3401 # avg RUU occupancy (insn's)
ruu_rate                     1.8027 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.8452 # avg RUU occupant latency (cycle's)
ruu_full                     0.5399 # fraction of time (cycle's) RUU was full
LSQ_count                  35575971 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.2269 # avg LSQ occupancy (insn's)
lsq_rate                     1.8027 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.8994 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  133519513 # total number of slip cycles
avg_sim_slip                11.5287 # the average slip between issue and retirement
bpred_bimod.lookups         3257785 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442061 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820461 # total number of accesses
il1.hits                   12820244 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497830 # total number of accesses
dl1.hits                    4480754 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      26039 # total number of hits
ul2.misses                     1613 # total number of misses
ul2.replacements               1485 # total number of replacements
ul2.writebacks                  978 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0583 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0537 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0354 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820461 # total number of accesses
itlb.hits                  12820454 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514858 # total number of accesses
dtlb.hits                   4514792 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919022 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:40:21 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 19 # total simulation time in seconds
sim_inst_rate           701100.4211 # simulation speed (in insts/sec)
sim_total_insn             13382482 # total number of instructions executed
sim_total_refs              6748388 # total number of loads and stores executed
sim_total_loads             3824348 # total number of loads executed
sim_total_stores       2924040.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  30354386 # total simulation time in cycles
sim_IPC                      0.4388 # instructions per cycle
sim_CPI                      2.2787 # cycles per instruction
sim_exec_BW                  0.4409 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                 120265255 # cumulative IFQ occupancy
IFQ_fcount                 29915748 # cumulative IFQ full count
ifq_occupancy                3.9620 # avg IFQ occupancy (insn's)
ifq_rate                     0.4409 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  8.9868 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9855 # fraction of time (cycle's) IFQ was full
RUU_count                 482057933 # cumulative RUU occupancy
RUU_fcount                 29765710 # cumulative RUU full count
ruu_occupancy               15.8810 # avg RUU occupancy (insn's)
ruu_rate                     0.4409 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 36.0216 # avg RUU occupant latency (cycle's)
ruu_full                     0.9806 # fraction of time (cycle's) RUU was full
LSQ_count                 260484912 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.5815 # avg LSQ occupancy (insn's)
lsq_rate                     0.4409 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 19.4646 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  762248901 # total number of slip cycles
avg_sim_slip                57.2220 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400270 # total number of accesses
il1.hits                   13399525 # total number of hits
il1.misses                      745 # total number of misses
il1.replacements                249 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171935 # total number of accesses
dl1.hits                    5867003 # total number of hits
dl1.misses                   304932 # total number of misses
dl1.replacements             304420 # total number of replacements
dl1.writebacks               156826 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462503 # total number of accesses
ul2.hits                     396143 # total number of hits
ul2.misses                    66360 # total number of misses
ul2.replacements              66232 # total number of replacements
ul2.writebacks                55963 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1435 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1432 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1210 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400270 # total number of accesses
itlb.hits                  13400251 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767102 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:40:40 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21449676 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938903 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12634182 # total simulation time in cycles
sim_IPC                      1.6922 # instructions per cycle
sim_CPI                      0.5910 # cycles per instruction
sim_exec_BW                  1.6977 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49324468 # cumulative IFQ occupancy
IFQ_fcount                 11711273 # cumulative IFQ full count
ifq_occupancy                3.9040 # avg IFQ occupancy (insn's)
ifq_rate                     1.6977 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2995 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9270 # fraction of time (cycle's) IFQ was full
RUU_count                 200434623 # cumulative RUU occupancy
RUU_fcount                 12488459 # cumulative RUU full count
ruu_occupancy               15.8645 # avg RUU occupancy (insn's)
ruu_rate                     1.6977 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3444 # avg RUU occupant latency (cycle's)
ruu_full                     0.9885 # fraction of time (cycle's) RUU was full
LSQ_count                  64266735 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0867 # avg LSQ occupancy (insn's)
lsq_rate                     1.6977 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9962 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292485335 # total number of slip cycles
avg_sim_slip                13.6808 # the average slip between issue and retirement
bpred_bimod.lookups         1654336 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482586 # total number of accesses
il1.hits                   21482407 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944732 # total number of accesses
dl1.hits                    4940786 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       4824 # total number of hits
ul2.misses                      353 # total number of misses
ul2.replacements                225 # total number of replacements
ul2.writebacks                  100 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0682 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0435 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0193 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482586 # total number of accesses
itlb.hits                  21482580 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884012 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:32:512:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:40:54 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:32:512:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27865819 # total number of instructions executed
sim_total_refs              8653604 # total number of loads and stores executed
sim_total_loads             7203832 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481847 # total number of branches executed
sim_cycle                  38994647 # total simulation time in cycles
sim_IPC                      0.7144 # instructions per cycle
sim_CPI                      1.3997 # cycles per instruction
sim_exec_BW                  0.7146 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 155825352 # cumulative IFQ occupancy
IFQ_fcount                 38956097 # cumulative IFQ full count
ifq_occupancy                3.9961 # avg IFQ occupancy (insn's)
ifq_rate                     0.7146 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5920 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9990 # fraction of time (cycle's) IFQ was full
RUU_count                 623302967 # cumulative RUU occupancy
RUU_fcount                 38954230 # cumulative RUU full count
ruu_occupancy               15.9843 # avg RUU occupancy (insn's)
ruu_rate                     0.7146 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.3680 # avg RUU occupant latency (cycle's)
ruu_full                     0.9990 # fraction of time (cycle's) RUU was full
LSQ_count                 187448362 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8070 # avg LSQ occupancy (insn's)
lsq_rate                     0.7146 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.7268 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  847259578 # total number of slip cycles
avg_sim_slip                30.4117 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860786 # total number of accesses
il1.hits                   27860575 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5867867 # total number of hits
dl1.misses                  2785531 # total number of misses
dl1.replacements            2785019 # total number of replacements
dl1.writebacks               955704 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3219 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3218 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3741446 # total number of accesses
ul2.hits                    3731754 # total number of hits
ul2.misses                     9692 # total number of misses
ul2.replacements               9564 # total number of replacements
ul2.writebacks                 3646 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0026 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0026 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0010 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860786 # total number of accesses
itlb.hits                  27860780 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017750 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:41:18 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8535349 # total simulation time in cycles
sim_IPC                      1.5356 # instructions per cycle
sim_CPI                      0.6512 # cycles per instruction
sim_exec_BW                  1.5405 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31903447 # cumulative IFQ occupancy
IFQ_fcount                  7828173 # cumulative IFQ full count
ifq_occupancy                3.7378 # avg IFQ occupancy (insn's)
ifq_rate                     1.5405 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4263 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9171 # fraction of time (cycle's) IFQ was full
RUU_count                 131988625 # cumulative RUU occupancy
RUU_fcount                  7146226 # cumulative RUU full count
ruu_occupancy               15.4638 # avg RUU occupancy (insn's)
ruu_rate                     1.5405 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.0380 # avg RUU occupant latency (cycle's)
ruu_full                     0.8373 # fraction of time (cycle's) RUU was full
LSQ_count                  40234869 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7139 # avg LSQ occupancy (insn's)
lsq_rate                     1.5405 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0599 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  189266383 # total number of slip cycles
avg_sim_slip                14.4400 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189425 # total number of accesses
il1.hits                   13189247 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013611 # total number of accesses
dl1.hits                    3667569 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346687 # total number of hits
ul2.misses                      165 # total number of misses
ul2.replacements                101 # total number of replacements
ul2.writebacks                   34 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0005 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0003 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189425 # total number of accesses
itlb.hits                  13189419 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357194 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:41:27 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12270181 # total number of instructions executed
sim_total_refs              4824020 # total number of loads and stores executed
sim_total_loads             2865549 # total number of loads executed
sim_total_stores       1958471.0000 # total number of stores executed
sim_total_branches          3197065 # total number of branches executed
sim_cycle                   6857294 # total simulation time in cycles
sim_IPC                      1.6889 # instructions per cycle
sim_CPI                      0.5921 # cycles per instruction
sim_exec_BW                  1.7894 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  20584999 # cumulative IFQ occupancy
IFQ_fcount                  4317927 # cumulative IFQ full count
ifq_occupancy                3.0019 # avg IFQ occupancy (insn's)
ifq_rate                     1.7894 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6776 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6297 # fraction of time (cycle's) IFQ was full
RUU_count                  84603279 # cumulative RUU occupancy
RUU_fcount                  3712788 # cumulative RUU full count
ruu_occupancy               12.3377 # avg RUU occupancy (insn's)
ruu_rate                     1.7894 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.8950 # avg RUU occupant latency (cycle's)
ruu_full                     0.5414 # fraction of time (cycle's) RUU was full
LSQ_count                  35942080 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.2414 # avg LSQ occupancy (insn's)
lsq_rate                     1.7894 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9292 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  134407373 # total number of slip cycles
avg_sim_slip                11.6053 # the average slip between issue and retirement
bpred_bimod.lookups         3257793 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442067 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820477 # total number of accesses
il1.hits                   12820260 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497830 # total number of accesses
dl1.hits                    4480754 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      26806 # total number of hits
ul2.misses                      846 # total number of misses
ul2.replacements                782 # total number of replacements
ul2.writebacks                  506 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0306 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0283 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0183 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820477 # total number of accesses
itlb.hits                  12820470 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514858 # total number of accesses
dtlb.hits                   4514792 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919090 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:41:35 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 28 # total simulation time in seconds
sim_inst_rate           475746.7143 # simulation speed (in insts/sec)
sim_total_insn             13380032 # total number of instructions executed
sim_total_refs              6748391 # total number of loads and stores executed
sim_total_loads             3824351 # total number of loads executed
sim_total_stores       2924040.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  52035322 # total simulation time in cycles
sim_IPC                      0.2560 # instructions per cycle
sim_CPI                      3.9063 # cycles per instruction
sim_exec_BW                  0.2571 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                 206873066 # cumulative IFQ occupancy
IFQ_fcount                 51567499 # cumulative IFQ full count
ifq_occupancy                3.9756 # avg IFQ occupancy (insn's)
ifq_rate                     0.2571 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                 15.4613 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9910 # fraction of time (cycle's) IFQ was full
RUU_count                 828521762 # cumulative RUU occupancy
RUU_fcount                 51418138 # cumulative RUU full count
ruu_occupancy               15.9223 # avg RUU occupancy (insn's)
ruu_rate                     0.2571 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 61.9223 # avg RUU occupant latency (cycle's)
ruu_full                     0.9881 # fraction of time (cycle's) RUU was full
LSQ_count                 450756701 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.6625 # avg LSQ occupancy (insn's)
lsq_rate                     0.2571 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 33.6888 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                 1298962225 # total number of slip cycles
avg_sim_slip                97.5130 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400279 # total number of accesses
il1.hits                   13399534 # total number of hits
il1.misses                      745 # total number of misses
il1.replacements                249 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6171957 # total number of accesses
dl1.hits                    5867042 # total number of hits
dl1.misses                   304915 # total number of misses
dl1.replacements             304403 # total number of replacements
dl1.writebacks               156830 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462490 # total number of accesses
ul2.hits                     396653 # total number of hits
ul2.misses                    65837 # total number of misses
ul2.replacements              65773 # total number of replacements
ul2.writebacks                55776 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1424 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1422 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1206 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400279 # total number of accesses
itlb.hits                  13400260 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767144 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:42:03 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449664 # total number of instructions executed
sim_total_refs              6588951 # total number of loads and stores executed
sim_total_loads             4938902 # total number of loads executed
sim_total_stores       1650049.0000 # total number of stores executed
sim_total_branches          1647445 # total number of branches executed
sim_cycle                  12652544 # total simulation time in cycles
sim_IPC                      1.6897 # instructions per cycle
sim_CPI                      0.5918 # cycles per instruction
sim_exec_BW                  1.6953 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49365257 # cumulative IFQ occupancy
IFQ_fcount                 11721423 # cumulative IFQ full count
ifq_occupancy                3.9016 # avg IFQ occupancy (insn's)
ifq_rate                     1.6953 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3014 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9264 # fraction of time (cycle's) IFQ was full
RUU_count                 200600940 # cumulative RUU occupancy
RUU_fcount                 12498429 # cumulative RUU full count
ruu_occupancy               15.8546 # avg RUU occupancy (insn's)
ruu_rate                     1.6953 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3522 # avg RUU occupant latency (cycle's)
ruu_full                     0.9878 # fraction of time (cycle's) RUU was full
LSQ_count                  64334211 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0847 # avg LSQ occupancy (insn's)
lsq_rate                     1.6953 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9993 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292719197 # total number of slip cycles
avg_sim_slip                13.6917 # the average slip between issue and retirement
bpred_bimod.lookups         1654333 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           77 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482573 # total number of accesses
il1.hits                   21482394 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944795 # total number of accesses
dl1.hits                    4940849 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       4984 # total number of hits
ul2.misses                      193 # total number of misses
ul2.replacements                129 # total number of replacements
ul2.writebacks                   55 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0373 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0249 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0106 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482573 # total number of accesses
itlb.hits                  21482567 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565557 # total number of accesses
dtlb.hits                   6565518 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178883958 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:16:1024:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:42:16 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:16:1024:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27865965 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203831 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481846 # total number of branches executed
sim_cycle                  39551443 # total simulation time in cycles
sim_IPC                      0.7044 # instructions per cycle
sim_CPI                      1.4197 # cycles per instruction
sim_exec_BW                  0.7045 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 158014486 # cumulative IFQ occupancy
IFQ_fcount                 39503372 # cumulative IFQ full count
ifq_occupancy                3.9952 # avg IFQ occupancy (insn's)
ifq_rate                     0.7045 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.6705 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9988 # fraction of time (cycle's) IFQ was full
RUU_count                 632060667 # cumulative RUU occupancy
RUU_fcount                 39501476 # cumulative RUU full count
ruu_occupancy               15.9807 # avg RUU occupancy (insn's)
ruu_rate                     0.7045 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.6822 # avg RUU occupant latency (cycle's)
ruu_full                     0.9987 # fraction of time (cycle's) RUU was full
LSQ_count                 189913988 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8017 # avg LSQ occupancy (insn's)
lsq_rate                     0.7045 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.8153 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  858482241 # total number of slip cycles
avg_sim_slip                30.8145 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860784 # total number of accesses
il1.hits                   27860573 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5833046 # total number of hits
dl1.misses                  2820352 # total number of misses
dl1.replacements            2819840 # total number of replacements
dl1.writebacks               955247 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3259 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3259 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3775810 # total number of accesses
ul2.hits                    3770326 # total number of hits
ul2.misses                     5484 # total number of misses
ul2.replacements               5420 # total number of replacements
ul2.writebacks                 2278 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0015 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0014 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0006 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860784 # total number of accesses
itlb.hits                  27860778 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017740 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:42:40 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8547453 # total simulation time in cycles
sim_IPC                      1.5335 # instructions per cycle
sim_CPI                      0.6521 # cycles per instruction
sim_exec_BW                  1.5383 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31924664 # cumulative IFQ occupancy
IFQ_fcount                  7833474 # cumulative IFQ full count
ifq_occupancy                3.7350 # avg IFQ occupancy (insn's)
ifq_rate                     1.5383 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4279 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9165 # fraction of time (cycle's) IFQ was full
RUU_count                 132070676 # cumulative RUU occupancy
RUU_fcount                  7151506 # cumulative RUU full count
ruu_occupancy               15.4515 # avg RUU occupancy (insn's)
ruu_rate                     1.5383 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.0442 # avg RUU occupant latency (cycle's)
ruu_full                     0.8367 # fraction of time (cycle's) RUU was full
LSQ_count                  40262671 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7105 # avg LSQ occupancy (insn's)
lsq_rate                     1.5383 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0621 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  189376226 # total number of slip cycles
avg_sim_slip                14.4483 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189424 # total number of accesses
il1.hits                   13189246 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013627 # total number of accesses
dl1.hits                    3667585 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346764 # total number of hits
ul2.misses                       88 # total number of misses
ul2.replacements                 56 # total number of replacements
ul2.writebacks                   18 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0003 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0002 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189424 # total number of accesses
itlb.hits                  13189418 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357190 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:42:48 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12275526 # total number of instructions executed
sim_total_refs              4824022 # total number of loads and stores executed
sim_total_loads             2865547 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197074 # total number of branches executed
sim_cycle                   6930825 # total simulation time in cycles
sim_IPC                      1.6710 # instructions per cycle
sim_CPI                      0.5984 # cycles per instruction
sim_exec_BW                  1.7711 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  20814664 # cumulative IFQ occupancy
IFQ_fcount                  4375332 # cumulative IFQ full count
ifq_occupancy                3.0032 # avg IFQ occupancy (insn's)
ifq_rate                     1.7711 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.6956 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6313 # fraction of time (cycle's) IFQ was full
RUU_count                  85505295 # cumulative RUU occupancy
RUU_fcount                  3768788 # cumulative RUU full count
ruu_occupancy               12.3370 # avg RUU occupancy (insn's)
ruu_rate                     1.7711 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.9655 # avg RUU occupant latency (cycle's)
ruu_full                     0.5438 # fraction of time (cycle's) RUU was full
LSQ_count                  36430700 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.2563 # avg LSQ occupancy (insn's)
lsq_rate                     1.7711 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9678 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  135547211 # total number of slip cycles
avg_sim_slip                11.7038 # the average slip between issue and retirement
bpred_bimod.lookups         3257802 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442075 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820498 # total number of accesses
il1.hits                   12820281 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497831 # total number of accesses
dl1.hits                    4480755 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      27193 # total number of hits
ul2.misses                      459 # total number of misses
ul2.replacements                427 # total number of replacements
ul2.writebacks                  265 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0166 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0154 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0096 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820498 # total number of accesses
itlb.hits                  12820491 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919170 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:42:56 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 47 # total simulation time in seconds
sim_inst_rate           283423.5745 # simulation speed (in insts/sec)
sim_total_insn             13385350 # total number of instructions executed
sim_total_refs              6748394 # total number of loads and stores executed
sim_total_loads             3824351 # total number of loads executed
sim_total_stores       2924043.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                  99502364 # total simulation time in cycles
sim_IPC                      0.1339 # instructions per cycle
sim_CPI                      7.4696 # cycles per instruction
sim_exec_BW                  0.1345 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                 396502483 # cumulative IFQ occupancy
IFQ_fcount                 98975819 # cumulative IFQ full count
ifq_occupancy                3.9849 # avg IFQ occupancy (insn's)
ifq_rate                     0.1345 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                 29.6221 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9947 # fraction of time (cycle's) IFQ was full
RUU_count                1587196974 # cumulative RUU occupancy
RUU_fcount                 98823859 # cumulative RUU full count
ruu_occupancy               15.9513 # avg RUU occupancy (insn's)
ruu_rate                     0.1345 # avg RUU dispatch rate (insn/cycle)
ruu_latency                118.5772 # avg RUU occupant latency (cycle's)
ruu_full                     0.9932 # fraction of time (cycle's) RUU was full
LSQ_count                 868034186 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.7238 # avg LSQ occupancy (insn's)
lsq_rate                     0.1345 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                 64.8496 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                 2474868471 # total number of slip cycles
avg_sim_slip               185.7883 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400286 # total number of accesses
il1.hits                   13399540 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6172114 # total number of accesses
dl1.hits                    5867188 # total number of hits
dl1.misses                   304926 # total number of misses
dl1.replacements             304414 # total number of replacements
dl1.writebacks               156830 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462502 # total number of accesses
ul2.hits                     394305 # total number of hits
ul2.misses                    68197 # total number of misses
ul2.replacements              68165 # total number of replacements
ul2.writebacks                57465 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1475 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1474 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1242 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400286 # total number of accesses
itlb.hits                  13400267 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156767172 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:43:43 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449676 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938903 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12667339 # total simulation time in cycles
sim_IPC                      1.6877 # instructions per cycle
sim_CPI                      0.5925 # cycles per instruction
sim_exec_BW                  1.6933 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49397312 # cumulative IFQ occupancy
IFQ_fcount                 11729412 # cumulative IFQ full count
ifq_occupancy                3.8996 # avg IFQ occupancy (insn's)
ifq_rate                     1.6933 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3029 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9260 # fraction of time (cycle's) IFQ was full
RUU_count                 200713125 # cumulative RUU occupancy
RUU_fcount                 12506323 # cumulative RUU full count
ruu_occupancy               15.8449 # avg RUU occupancy (insn's)
ruu_rate                     1.6933 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3574 # avg RUU occupant latency (cycle's)
ruu_full                     0.9873 # fraction of time (cycle's) RUU was full
LSQ_count                  64371409 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0817 # avg LSQ occupancy (insn's)
lsq_rate                     1.6933 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0010 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292868511 # total number of slip cycles
avg_sim_slip                13.6987 # the average slip between issue and retirement
bpred_bimod.lookups         1654336 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482585 # total number of accesses
il1.hits                   21482406 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944828 # total number of accesses
dl1.hits                    4940882 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       5072 # total number of hits
ul2.misses                      105 # total number of misses
ul2.replacements                 73 # total number of replacements
ul2.writebacks                   31 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0203 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0141 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0060 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482585 # total number of accesses
itlb.hits                  21482579 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884008 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:8:2048:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:43:56 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:8:2048:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27860701 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203831 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481846 # total number of branches executed
sim_cycle                  40514155 # total simulation time in cycles
sim_IPC                      0.6877 # instructions per cycle
sim_CPI                      1.4542 # cycles per instruction
sim_exec_BW                  0.6877 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 161827526 # cumulative IFQ occupancy
IFQ_fcount                 40456632 # cumulative IFQ full count
ifq_occupancy                3.9943 # avg IFQ occupancy (insn's)
ifq_rate                     0.6877 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.8085 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9986 # fraction of time (cycle's) IFQ was full
RUU_count                 647313329 # cumulative RUU occupancy
RUU_fcount                 40456064 # cumulative RUU full count
ruu_occupancy               15.9775 # avg RUU occupancy (insn's)
ruu_rate                     0.6877 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 23.2339 # avg RUU occupant latency (cycle's)
ruu_full                     0.9986 # fraction of time (cycle's) RUU was full
LSQ_count                 194792906 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8080 # avg LSQ occupancy (insn's)
lsq_rate                     0.6877 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.9917 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  878615137 # total number of slip cycles
avg_sim_slip                31.5371 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860784 # total number of accesses
il1.hits                   27860573 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5817932 # total number of hits
dl1.misses                  2835466 # total number of misses
dl1.replacements            2834954 # total number of replacements
dl1.writebacks               955047 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3277 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3276 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3790724 # total number of accesses
ul2.hits                    3787350 # total number of hits
ul2.misses                     3374 # total number of misses
ul2.replacements               3342 # total number of replacements
ul2.writebacks                 1593 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0009 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0009 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0004 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860784 # total number of accesses
itlb.hits                  27860778 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017740 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 matrix 

sim: simulation started @ Thu Dec 15 12:44:20 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148925 # total number of instructions executed
sim_total_refs              4034187 # total number of loads and stores executed
sim_total_loads             3020630 # total number of loads executed
sim_total_stores       1013557.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   8556082 # total simulation time in cycles
sim_IPC                      1.5319 # instructions per cycle
sim_CPI                      0.6528 # cycles per instruction
sim_exec_BW                  1.5368 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  31937541 # cumulative IFQ occupancy
IFQ_fcount                  7836692 # cumulative IFQ full count
ifq_occupancy                3.7327 # avg IFQ occupancy (insn's)
ifq_rate                     1.5368 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4289 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9159 # fraction of time (cycle's) IFQ was full
RUU_count                 132151415 # cumulative RUU occupancy
RUU_fcount                  7154720 # cumulative RUU full count
ruu_occupancy               15.4453 # avg RUU occupancy (insn's)
ruu_rate                     1.5368 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.0504 # avg RUU occupant latency (cycle's)
ruu_full                     0.8362 # fraction of time (cycle's) RUU was full
LSQ_count                  40286813 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.7086 # avg LSQ occupancy (insn's)
lsq_rate                     1.5368 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0639 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  189481112 # total number of slip cycles
avg_sim_slip                14.4563 # the average slip between issue and retirement
bpred_bimod.lookups         1010974 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189424 # total number of accesses
il1.hits                   13189246 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013634 # total number of accesses
dl1.hits                    3667592 # total number of hits
dl1.misses                   346042 # total number of misses
dl1.replacements             345530 # total number of replacements
dl1.writebacks                  632 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0862 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0861 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 346852 # total number of accesses
ul2.hits                     346805 # total number of hits
ul2.misses                       47 # total number of misses
ul2.replacements                 31 # total number of replacements
ul2.writebacks                   10 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0001 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0001 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189424 # total number of accesses
itlb.hits                  13189418 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013775 # total number of accesses
dtlb.hits                   4013739 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696526 # total first level page table misses
mem.ptab_accesses          77357190 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 sort 

sim: simulation started @ Thu Dec 15 12:44:29 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12328464 # total number of instructions executed
sim_total_refs              4824023 # total number of loads and stores executed
sim_total_loads             2865548 # total number of loads executed
sim_total_stores       1958475.0000 # total number of stores executed
sim_total_branches          3197078 # total number of branches executed
sim_cycle                   7190390 # total simulation time in cycles
sim_IPC                      1.6107 # instructions per cycle
sim_CPI                      0.6209 # cycles per instruction
sim_exec_BW                  1.7146 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  21767450 # cumulative IFQ occupancy
IFQ_fcount                  4613528 # cumulative IFQ full count
ifq_occupancy                3.0273 # avg IFQ occupancy (insn's)
ifq_rate                     1.7146 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.7656 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6416 # fraction of time (cycle's) IFQ was full
RUU_count                  89240117 # cumulative RUU occupancy
RUU_fcount                  3993722 # cumulative RUU full count
ruu_occupancy               12.4110 # avg RUU occupancy (insn's)
ruu_rate                     1.7146 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.2385 # avg RUU occupant latency (cycle's)
ruu_full                     0.5554 # fraction of time (cycle's) RUU was full
LSQ_count                  38524437 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.3578 # avg LSQ occupancy (insn's)
lsq_rate                     1.7146 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.1248 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  139098064 # total number of slip cycles
avg_sim_slip                12.0104 # the average slip between issue and retirement
bpred_bimod.lookups         3257805 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984762 # total number of address-predicted hits
bpred_bimod.dir_hits        2990775 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137689 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       442078 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435455 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820503 # total number of accesses
il1.hits                   12820286 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497831 # total number of accesses
dl1.hits                    4480755 # total number of hits
dl1.misses                    17076 # total number of misses
dl1.replacements              16564 # total number of replacements
dl1.writebacks                10359 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0038 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0037 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0023 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  27652 # total number of accesses
ul2.hits                      27337 # total number of hits
ul2.misses                      315 # total number of misses
ul2.replacements                299 # total number of replacements
ul2.writebacks                  174 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0114 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0108 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0063 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820503 # total number of accesses
itlb.hits                  12820496 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514859 # total number of accesses
dtlb.hits                   4514793 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85919192 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 fft 

sim: simulation started @ Thu Dec 15 12:44:37 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 93 # total simulation time in seconds
sim_inst_rate           143235.5699 # simulation speed (in insts/sec)
sim_total_insn             13395729 # total number of instructions executed
sim_total_refs              6748283 # total number of loads and stores executed
sim_total_loads             3824239 # total number of loads executed
sim_total_stores       2924044.0000 # total number of stores executed
sim_total_branches           390630 # total number of branches executed
sim_cycle                 216855819 # total simulation time in cycles
sim_IPC                      0.0614 # instructions per cycle
sim_CPI                     16.2794 # cycles per instruction
sim_exec_BW                  0.0618 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                 865566875 # cumulative IFQ occupancy
IFQ_fcount                216241399 # cumulative IFQ full count
ifq_occupancy                3.9914 # avg IFQ occupancy (insn's)
ifq_rate                     0.0618 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                 64.6151 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9972 # fraction of time (cycle's) IFQ was full
RUU_count                3463764751 # cumulative RUU occupancy
RUU_fcount                216082981 # cumulative RUU full count
ruu_occupancy               15.9727 # avg RUU occupancy (insn's)
ruu_rate                     0.0618 # avg RUU dispatch rate (insn/cycle)
ruu_latency                258.5723 # avg RUU occupant latency (cycle's)
ruu_full                     0.9964 # fraction of time (cycle's) RUU was full
LSQ_count                1889922788 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.7151 # avg LSQ occupancy (insn's)
lsq_rate                     0.0618 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                141.0840 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                 5373208550 # total number of slip cycles
avg_sim_slip               403.3665 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89877 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400044 # total number of accesses
il1.hits                   13399298 # total number of hits
il1.misses                      746 # total number of misses
il1.replacements                250 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6172430 # total number of accesses
dl1.hits                    5867632 # total number of hits
dl1.misses                   304798 # total number of misses
dl1.replacements             304286 # total number of replacements
dl1.writebacks               156804 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0494 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0493 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0254 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 462348 # total number of accesses
ul2.hits                     392839 # total number of hits
ul2.misses                    69509 # total number of misses
ul2.replacements              69493 # total number of replacements
ul2.writebacks                55871 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.1503 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.1503 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.1208 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400044 # total number of accesses
itlb.hits                  13400025 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736824 # total number of hits
dtlb.misses                    4174 # total number of misses
dtlb.replacements              4046 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156765868 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 filter 

sim: simulation started @ Thu Dec 15 12:46:10 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21449676 # total number of instructions executed
sim_total_refs              6588956 # total number of loads and stores executed
sim_total_loads             4938903 # total number of loads executed
sim_total_stores       1650053.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12702351 # total simulation time in cycles
sim_IPC                      1.6831 # instructions per cycle
sim_CPI                      0.5941 # cycles per instruction
sim_exec_BW                  1.6886 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49473358 # cumulative IFQ occupancy
IFQ_fcount                 11748412 # cumulative IFQ full count
ifq_occupancy                3.8948 # avg IFQ occupancy (insn's)
ifq_rate                     1.6886 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3065 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9249 # fraction of time (cycle's) IFQ was full
RUU_count                 201022199 # cumulative RUU occupancy
RUU_fcount                 12525273 # cumulative RUU full count
ruu_occupancy               15.8256 # avg RUU occupancy (insn's)
ruu_rate                     1.6886 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3718 # avg RUU occupant latency (cycle's)
ruu_full                     0.9861 # fraction of time (cycle's) RUU was full
LSQ_count                  64482316 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0764 # avg LSQ occupancy (insn's)
lsq_rate                     1.6886 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  3.0062 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  293288492 # total number of slip cycles
avg_sim_slip                13.7184 # the average slip between issue and retirement
bpred_bimod.lookups         1654336 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638965 # total number of address-predicted hits
bpred_bimod.dir_hits        1639057 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8285 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           58 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21482585 # total number of accesses
il1.hits                   21482406 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4944844 # total number of accesses
dl1.hits                    4940898 # total number of hits
dl1.misses                     3946 # total number of misses
dl1.replacements               3434 # total number of replacements
dl1.writebacks                 1052 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0008 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0007 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   5177 # total number of accesses
ul2.hits                       5119 # total number of hits
ul2.misses                       58 # total number of misses
ul2.replacements                 42 # total number of replacements
ul2.writebacks                   16 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0112 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0081 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0031 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21482585 # total number of accesses
itlb.hits                  21482579 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565558 # total number of accesses
dtlb.hits                   6565519 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178884008 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:256:64:2:l -cache:il1 il1:256:64:2:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:maxBurstLength 8 -cache:dl2 ul2:4:4096:4:l -mem:width 8 -mem:lat 69 2 -mem:minBurstLength 4 -redir:sim tempOutput2 alphaBlend 

sim: simulation started @ Thu Dec 15 12:46:23 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput2 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:256:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:4:4096:4:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:256:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         69 2 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                  8 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hiearchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 25 # total simulation time in seconds
sim_inst_rate          1114387.6800 # simulation speed (in insts/sec)
sim_total_insn             27860703 # total number of instructions executed
sim_total_refs              8653603 # total number of loads and stores executed
sim_total_loads             7203831 # total number of loads executed
sim_total_stores       1449772.0000 # total number of stores executed
sim_total_branches           481846 # total number of branches executed
sim_cycle                  42365531 # total simulation time in cycles
sim_IPC                      0.6576 # instructions per cycle
sim_CPI                      1.5207 # cycles per instruction
sim_exec_BW                  0.6576 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 169211302 # cumulative IFQ occupancy
IFQ_fcount                 42302577 # cumulative IFQ full count
ifq_occupancy                3.9941 # avg IFQ occupancy (insn's)
ifq_rate                     0.6576 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  6.0735 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9985 # fraction of time (cycle's) IFQ was full
RUU_count                 676851278 # cumulative RUU occupancy
RUU_fcount                 42302020 # cumulative RUU full count
ruu_occupancy               15.9765 # avg RUU occupancy (insn's)
ruu_rate                     0.6576 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 24.2941 # avg RUU occupant latency (cycle's)
ruu_full                     0.9985 # fraction of time (cycle's) RUU was full
LSQ_count                 204398267 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8246 # avg LSQ occupancy (insn's)
lsq_rate                     0.6576 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  7.3364 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  917629070 # total number of slip cycles
avg_sim_slip                32.9375 # the average slip between issue and retirement
bpred_bimod.lookups          481885 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481470 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              71 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8765 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          121 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           71 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8875 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    5809162 # total number of hits
dl1.misses                  2844236 # total number of misses
dl1.replacements            2843724 # total number of replacements
dl1.writebacks               954932 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.3287 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.3286 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1104 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3799379 # total number of accesses
ul2.hits                    3797063 # total number of hits
ul2.misses                     2316 # total number of misses
ul2.replacements               2300 # total number of replacements
ul2.writebacks                 1251 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0006 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0006 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0003 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653411 # total number of accesses
dtlb.hits                   8652338 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017756 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

