sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 96 3 -redir:sim tempOutput3 matrix 

sim: simulation started @ Thu Dec 15 11:53:04 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         96 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
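
The range grammar above can be illustrated with a small parser. This is only a sketch of the format as the help text describes it, not the simulator's own parsing code. The names `Bound` and `parse_range` are hypothetical, and the rule that a `+` end bound inherits the kind of the start bound is an assumption drawn from the `@main:+278` example.

```python
from typing import NamedTuple, Optional, Tuple


class Bound(NamedTuple):
    kind: str               # "addr" (@), "cycle" (#), or "inst" (no prefix)
    value: str              # number or program symbol, kept as text
    relative: bool = False  # True when the end bound was written with '+'


_PREFIX_KIND = {"@": "addr", "#": "cycle"}


def _parse_bound(text: str, is_end: bool) -> Optional[Bound]:
    """Parse one side of the range; an empty side means 'unbounded'."""
    if not text:
        return None
    if text[0] in _PREFIX_KIND:
        return Bound(_PREFIX_KIND[text[0]], text[1:])
    if text[0] == "+":
        if not is_end:
            raise ValueError("'+' is only valid on the end of a range")
        return Bound("inst", text[1:], relative=True)
    return Bound("inst", text)


def parse_range(spec: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    """Split '<start>:<end>' into two optional bounds."""
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError("a range must contain ':'")
    start = _parse_bound(start_text, is_end=False)
    end = _parse_bound(end_text, is_end=True)
    # Assumption: a relative end is measured in the same units as the start.
    if end is not None and end.relative and start is not None:
        end = end._replace(kind=start.kind)
    return start, end


if __name__ == "__main__":
    for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(example, "->", parse_range(example))
```

Values are kept as text because the help text allows program symbols such as `main` wherever a number may appear.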

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
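
As a rough illustration of the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, the sketch below splits a config string into its fields and derives the total capacity as nsets × bsize × assoc. The names `CacheConfig` and `parse_cache_config` are hypothetical and are not part of the simulator.

```python
from dataclasses import dataclass

# Replacement-policy codes as listed in the help text.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        """Total bytes covered: sets x block size x ways."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement code {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])


if __name__ == "__main__":
    # The two cache configs used on this run's command line.
    for spec in ("dl1:512:64:2:l", "ul2:1024:64:8:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, "
              f"{cfg.assoc}-way, {cfg.repl}")
```

For the configuration in this log, that works out to a 64 KB 2-way L1 data cache and a 512 KB 8-way unified L2.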

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6865902 # total simulation time in cycles
sim_IPC                      1.9090 # instructions per cycle
sim_CPI                      0.5238 # cycles per instruction
sim_exec_BW                  1.9151 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25559056 # cumulative IFQ occupancy
IFQ_fcount                  6264161 # cumulative IFQ full count
ifq_occupancy                3.7226 # avg IFQ occupancy (insn's)
ifq_rate                     1.9151 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9438 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 105516767 # cumulative RUU occupancy
RUU_fcount                  5700322 # cumulative RUU full count
ruu_occupancy               15.3682 # avg RUU occupancy (insn's)
ruu_rate                     1.9151 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  8.0247 # avg RUU occupant latency (cycle's)
ruu_full                     0.8302 # fraction of time (cycle's) RUU was full
LSQ_count                  32262186 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6989 # avg LSQ occupancy (insn's)
lsq_rate                     1.9151 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4536 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  154829597 # total number of slip cycles
avg_sim_slip                11.8126 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357595 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 96 3 -redir:sim tempOutput3 sort 

sim: simulation started @ Thu Dec 15 11:53:13 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         96 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264797 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6429407 # total simulation time in cycles
sim_IPC                      1.8013 # instructions per cycle
sim_CPI                      0.5551 # cycles per instruction
sim_exec_BW                  1.9076 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  19005671 # cumulative IFQ occupancy
IFQ_fcount                  3923258 # cumulative IFQ full count
ifq_occupancy                2.9561 # avg IFQ occupancy (insn's)
ifq_rate                     1.9076 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5496 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6102 # fraction of time (cycle's) IFQ was full
RUU_count                  78269315 # cumulative RUU occupancy
RUU_fcount                  3321309 # cumulative RUU full count
ruu_occupancy               12.1736 # avg RUU occupancy (insn's)
ruu_rate                     1.9076 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3816 # avg RUU occupant latency (cycle's)
ruu_full                     0.5166 # fraction of time (cycle's) RUU was full
LSQ_count                  32352597 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0320 # avg LSQ occupancy (insn's)
lsq_rate                     1.9076 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6378 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  124689008 # total number of slip cycles
avg_sim_slip                10.7662 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917920 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 96 3 -redir:sim tempOutput3 fft 

sim: simulation started @ Thu Dec 15 11:53:20 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         96 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
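
  The range grammar above can be illustrated with a small parser. This is
  only a sketch in Python, not the simulator's own code: the names `Bound`,
  `parse_bound` and `parse_ptrace_range` are hypothetical, and values are kept
  as text because program symbols such as `main` are allowed.

```python
from typing import NamedTuple, Optional, Tuple


class Bound(NamedTuple):
    kind: str   # "addr" (@), "cycle" (#), "relative" (+), or "inst" (no prefix)
    value: str  # left as text: may be a number or a program symbol


# Prefix character -> meaning, following the help text above.
_KINDS = {"@": "addr", "#": "cycle", "+": "relative", "": "inst"}


def parse_bound(text: str, allow_relative: bool) -> Optional[Bound]:
    """Parse one end of a range; an empty string means 'unspecified'."""
    if text == "":
        return None
    prefix = text[0] if text[0] in "@#+" else ""
    if prefix == "+" and not allow_relative:
        raise ValueError("'+' is only valid on the end of a range")
    value = text[len(prefix):]
    if value == "":
        raise ValueError("prefix %r has no value after it" % prefix)
    return Bound(_KINDS[prefix], value)


def parse_ptrace_range(spec: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    """Split '<start>:<end>' and classify each side."""
    start_text, sep, end_text = spec.partition(":")
    if sep != ":":
        raise ValueError("a range must contain ':'")
    return parse_bound(start_text, False), parse_bound(end_text, True)


if __name__ == "__main__":
    for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
        print(spec, "->", parse_ptrace_range(spec))
```

  How a relative end is resolved against the start (1000:+100 == 1000:1100)
  is left out of the sketch; it only classifies the two sides.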

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
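
  As a rough illustration of the table above, the following Python sketch
  names the predictor family for a given -bpred:2lev setting. It takes its
  arguments in the order shown on the option line (<l1size> <l2size>
  <hist_size> <xor>), i.e. N, M, W, X, whereas the sample rows list the values
  as N, W, M, X. The function name `classify_2lev` is hypothetical, and
  anything that does not match a sample row is reported as "other".

```python
def classify_2lev(l1size: int, l2size: int, hist_size: int, xor: int) -> str:
    """Map a 2-level predictor config onto the sample predictor names."""
    n, m, w = l1size, l2size, hist_size
    if xor:
        # gshare: one global history register, xor'ed with the address.
        return "gshare" if (n == 1 and m == 2 ** w) else "other"
    if n == 1:
        # Global history (a single shift register).
        if m == 2 ** w:
            return "GAg"
        if m > 2 ** w:
            return "GAp"
        return "other"
    # Per-address history (several shift registers).
    if m == 2 ** w:
        return "PAg"
    if m == 2 ** (n + w):
        return "PAp"
    return "other"


if __name__ == "__main__":
    # The -bpred:2lev value listed in the options of this run.
    print(classify_2lev(1, 1024, 8, 0))
```

  With the values listed for this run (1 1024 8 0), M = 1024 exceeds
  2^W = 256, so the setting falls under the GAp row. The run itself uses
  -bpred bimod, so the 2-level setting is not active here.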

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
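
  The <config> string can be unpacked and turned into a total capacity
  (nsets * bsize * assoc). The sketch below is Python written for
  illustration only; `CacheConfig` and `parse_cache_config` are hypothetical
  names, not part of the simulator.

```python
from dataclasses import dataclass

# Replacement-policy letters from the help text above.
REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes (page size for a TLB)
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        """Total bytes covered: sets x block size x ways."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError("expected 5 ':'-separated fields, got %d" % len(parts))
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL:
        raise ValueError("unknown replacement policy %r" % repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


if __name__ == "__main__":
    # The two cache configs given on this run's command line.
    for spec in ["dl1:512:64:2:l", "ul2:1024:64:8:l"]:
        cfg = parse_cache_config(spec)
        print(cfg.name, cfg.capacity_bytes // 1024, "KB", cfg.repl)
```

  For the settings on this run's command line, dl1:512:64:2:l works out to
  512 * 64 * 2 = 65536 bytes (64 KB) and ul2:1024:64:8:l to 524288 bytes
  (512 KB).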

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 12 # total simulation time in seconds
sim_inst_rate          1110075.6667 # simulation speed (in insts/sec)
sim_total_insn             13375305 # total number of instructions executed
sim_total_refs              6748323 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924023.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9800234 # total simulation time in cycles
sim_IPC                      1.3592 # instructions per cycle
sim_CPI                      0.7357 # cycles per instruction
sim_exec_BW                  1.3648 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  38162477 # cumulative IFQ occupancy
IFQ_fcount                  9389975 # cumulative IFQ full count
ifq_occupancy                3.8940 # avg IFQ occupancy (insn's)
ifq_rate                     1.3648 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.8532 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9581 # fraction of time (cycle's) IFQ was full
RUU_count                 153568081 # cumulative RUU occupancy
RUU_fcount                  9249010 # cumulative RUU full count
ruu_occupancy               15.6698 # avg RUU occupancy (insn's)
ruu_rate                     1.3648 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 11.4815 # avg RUU occupant latency (cycle's)
ruu_full                     0.9438 # fraction of time (cycle's) RUU was full
LSQ_count                  81157701 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2812 # avg LSQ occupancy (insn's)
lsq_rate                     1.3648 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.0677 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  254440659 # total number of slip cycles
avg_sim_slip                19.1008 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400097 # total number of accesses
il1.hits                   13399370 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400097 # total number of accesses
itlb.hits                  13400078 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766196 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
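
The derived figures in a statistics block (sim_IPC, the *.miss_rate lines)
are ratios of the raw counters printed alongside them. The Python sketch below
re-derives two of them from lines in the "name value # description" layout.
`parse_stats` is a hypothetical helper, and the sample lines are copied from
the fft run above.

```python
def parse_stats(lines):
    """Collect numeric 'name value # description' lines into a dict."""
    stats = {}
    for line in lines:
        body, sep, _comment = line.partition(" # ")
        fields = body.split()
        if sep and len(fields) == 2:
            try:
                stats[fields[0]] = float(fields[1])
            except ValueError:
                # Non-numeric values (hex addresses, "760k") are skipped.
                pass
    return stats


SAMPLE = [
    "sim_num_insn               13320908 # total number of instructions committed",
    "sim_cycle                   9800234 # total simulation time in cycles",
    "dl1.accesses                6169439 # total number of accesses",
    "dl1.misses                   287722 # total number of misses",
    "ld_text_base             0x00400000 # program text (code) segment base",
]

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]

if __name__ == "__main__":
    print("IPC           %.4f" % ipc)
    print("dl1 miss rate %.4f" % dl1_miss_rate)
```

Rounded to four places these come to 1.3592 and 0.0466, matching the sim_IPC
and dl1.miss_rate lines reported for this run.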

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 96 3 -redir:sim tempOutput3 filter 

sim: simulation started @ Thu Dec 15 11:53:32 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         96 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12630068 # total simulation time in cycles
sim_IPC                      1.6927 # instructions per cycle
sim_CPI                      0.5908 # cycles per instruction
sim_exec_BW                  1.6983 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49353926 # cumulative IFQ occupancy
IFQ_fcount                 11719309 # cumulative IFQ full count
ifq_occupancy                3.9077 # avg IFQ occupancy (insn's)
ifq_rate                     1.6983 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.3009 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 200553458 # cumulative RUU occupancy
RUU_fcount                 12499184 # cumulative RUU full count
ruu_occupancy               15.8790 # avg RUU occupancy (insn's)
ruu_rate                     1.6983 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3498 # avg RUU occupant latency (cycle's)
ruu_full                     0.9896 # fraction of time (cycle's) RUU was full
LSQ_count                  64288751 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0901 # avg LSQ occupancy (insn's)
lsq_rate                     1.6983 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9971 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  292624872 # total number of slip cycles
avg_sim_slip                13.6873 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886466 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 96 3 -redir:sim tempOutput3 alphaBlend 

sim: simulation started @ Thu Dec 15 11:53:46 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         96 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 25 # total simulation time in seconds
sim_inst_rate          1114387.6800 # simulation speed (in insts/sec)
sim_total_insn             27861435 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  38654010 # total simulation time in cycles
sim_IPC                      0.7207 # instructions per cycle
sim_CPI                      1.3875 # cycles per instruction
sim_exec_BW                  0.7208 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 154520825 # cumulative IFQ occupancy
IFQ_fcount                 38629966 # cumulative IFQ full count
ifq_occupancy                3.9975 # avg IFQ occupancy (insn's)
ifq_rate                     0.7208 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.5460 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9994 # fraction of time (cycle's) IFQ was full
RUU_count                 618087023 # cumulative RUU occupancy
RUU_fcount                 38629184 # cumulative RUU full count
ruu_occupancy               15.9902 # avg RUU occupancy (insn's)
ruu_rate                     0.7208 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 22.1843 # avg RUU occupant latency (cycle's)
ruu_full                     0.9994 # fraction of time (cycle's) RUU was full
LSQ_count                 187844902 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8596 # avg LSQ occupancy (insn's)
lsq_rate                     0.7208 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.7421 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  842440492 # total number of slip cycles
avg_sim_slip                30.2387 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017736 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -redir:sim tempOutput3 matrix 

sim: simulation started @ Thu Dec 15 11:54:11 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6818790 # total simulation time in cycles
sim_IPC                      1.9222 # instructions per cycle
sim_CPI                      0.5202 # cycles per instruction
sim_exec_BW                  1.9283 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instructions per branch
IFQ_count                  25387408 # cumulative IFQ occupancy
IFQ_fcount                  6221249 # cumulative IFQ full count
ifq_occupancy                3.7232 # avg IFQ occupancy (insn's)
ifq_rate                     1.9283 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9307 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104828303 # cumulative RUU occupancy
RUU_fcount                  5657410 # cumulative RUU full count
ruu_occupancy               15.3734 # avg RUU occupancy (insn's)
ruu_rate                     1.9283 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9723 # avg RUU occupant latency (cycle's)
ruu_full                     0.8297 # fraction of time (cycle's) RUU was full
LSQ_count                  32032170 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6976 # avg LSQ occupancy (insn's)
lsq_rate                     1.9283 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4361 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  153911549 # total number of slip cycles
avg_sim_slip                11.7426 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357595 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -redir:sim tempOutput3 sort 

sim: simulation started @ Thu Dec 15 11:54:19 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
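
The range grammar above can be sketched as a small parser. This is a minimal illustration, not the simulator's own code: the function names `parse_endpoint` and `parse_ptrace_range` are hypothetical, values are kept as strings because program symbols are allowed, and a `+` end is only resolved when both ends are plain decimal numbers.

```python
def parse_endpoint(text):
    """Classify one endpoint of a pipetrace range by its prefix character."""
    if text == "":
        return None  # endpoint omitted
    kinds = {"@": "address", "#": "cycle", "+": "relative"}
    if text[0] in kinds:
        return (kinds[text[0]], text[1:])
    # No prefix: the help text says this is an instruction count.
    return ("instruction", text)


def parse_ptrace_range(spec):
    """Split '<start>:<end>' and classify both ends (hypothetical helper)."""
    start_text, sep, end_text = spec.partition(":")
    if sep != ":":
        raise ValueError("range must contain ':'")
    start = parse_endpoint(start_text)
    end = parse_endpoint(end_text)
    if start is not None and start[0] == "relative":
        raise ValueError("'+' is only allowed on the end of the range")
    if end is not None and end[0] == "relative":
        # Assumption: resolve '+' only when both values are decimal numbers;
        # symbolic starts such as '@main' are left unresolved.
        if start is not None and start[1].isdigit() and end[1].isdigit():
            end = (start[0], str(int(start[1]) + int(end[1])))
    return start, end


if __name__ == "__main__":
    for spec in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278", "1000:+100"]:
        print(spec, "->", parse_ptrace_range(spec))
```

The last case mirrors the equivalence stated in the help text, 1000:+100 == 1000:1100.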

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
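
As a rough illustration of the <config> format, the sketch below splits a config string into its five fields and derives the total capacity as nsets * bsize * assoc. The `CacheConfig` class and `parse_cache_config` function are hypothetical names introduced here; they are not part of SimpleScalar.

```python
from dataclasses import dataclass

# Replacement policy codes listed in the help text.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self):
        # Total data capacity: sets x block size x ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' (hypothetical helper)."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields: " + text)
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError("unknown replacement policy: " + repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configs taken from the command line of this run.
    for text in ["dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"]:
        cfg = parse_cache_config(text)
        print(cfg.name, cfg.capacity_bytes // 1024, "KB", REPLACEMENT[cfg.repl])
```

With the configs used in this run, the L1 caches come out at 64 KB each and the unified L2 at 512 KB.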

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264701 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6364079 # total simulation time in cycles
sim_IPC                      1.8198 # instructions per cycle
sim_CPI                      0.5495 # cycles per instruction
sim_exec_BW                  1.9272 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instructions per branch
IFQ_count                  18764807 # cumulative IFQ occupancy
IFQ_fcount                  3863042 # cumulative IFQ full count
ifq_occupancy                2.9486 # avg IFQ occupancy (insn's)
ifq_rate                     1.9272 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5300 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6070 # fraction of time (cycle's) IFQ was full
RUU_count                  77305303 # cumulative RUU occupancy
RUU_fcount                  3261117 # cumulative RUU full count
ruu_occupancy               12.1471 # avg RUU occupancy (insn's)
ruu_rate                     1.9272 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.3031 # avg RUU occupant latency (cycle's)
ruu_full                     0.5124 # fraction of time (cycle's) RUU was full
LSQ_count                  32135197 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0495 # avg LSQ occupancy (insn's)
lsq_rate                     1.9272 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6201 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  123507596 # total number of slip cycles
avg_sim_slip                10.6642 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917920 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
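
The statistics lines above follow a `name value # description` layout, so the derived rates can be cross-checked against the raw counters. The sketch below is one way to do that, under the assumption that every statistics line has exactly that layout; `parse_stats` and `ratio` are hypothetical helper names, and the sample holds a few lines copied from the sort run above.

```python
import re

# name, value, then '#' and a free-text description
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#\s*(.*)$")

SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6364079 # total simulation time in cycles
sim_IPC                      1.8198 # instructions per cycle
sim_CPI                      0.5495 # cycles per instruction
dl1.accesses                4497784 # total number of accesses
dl1.misses                    11205 # total number of misses
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
ul2.accesses                  18501 # total number of accesses
ul2.misses                     4196 # total number of misses
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
"""


def parse_stats(text):
    """Return {stat name: value string} for statistics lines."""
    stats = {}
    for line in text.splitlines():
        # Skip option lines, which start with '-' or '# -'.
        if line.startswith(("-", "#")):
            continue
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = match.group(2)
    return stats


def ratio(stats, numerator, denominator):
    """Divide two counters taken from the parsed statistics."""
    return float(stats[numerator]) / float(stats[denominator])


if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    checks = [
        ("sim_IPC", "sim_num_insn", "sim_cycle"),
        ("sim_CPI", "sim_cycle", "sim_num_insn"),
        ("dl1.miss_rate", "dl1.misses", "dl1.accesses"),
        ("ul2.miss_rate", "ul2.misses", "ul2.accesses"),
    ]
    for name, num, den in checks:
        print(name, "reported", stats[name], "recomputed", round(ratio(stats, num, den), 4))
```

Values such as `0x00400000` or `292k` are kept as strings by `parse_stats`; only the counters passed to `ratio` are converted to numbers.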

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -redir:sim tempOutput3 fft 

sim: simulation started @ Thu Dec 15 11:54:27 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13375116 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   9349714 # total simulation time in cycles
sim_IPC                      1.4247 # instructions per cycle
sim_CPI                      0.7019 # cycles per instruction
sim_exec_BW                  1.4305 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instructions per branch
IFQ_count                  36428943 # cumulative IFQ occupancy
IFQ_fcount                  8956591 # cumulative IFQ full count
ifq_occupancy                3.8963 # avg IFQ occupancy (insn's)
ifq_rate                     1.4305 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.7236 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9580 # fraction of time (cycle's) IFQ was full
RUU_count                 146628252 # cumulative RUU occupancy
RUU_fcount                  8815672 # cumulative RUU full count
ruu_occupancy               15.6826 # avg RUU occupancy (insn's)
ruu_rate                     1.4305 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.9628 # avg RUU occupant latency (cycle's)
ruu_full                     0.9429 # fraction of time (cycle's) RUU was full
LSQ_count                  77027739 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.2385 # avg LSQ occupancy (insn's)
lsq_rate                     1.4305 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.7590 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  243371337 # total number of slip cycles
avg_sim_slip                18.2699 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766208 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
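
The `*_rate` lines in the statistics above are the corresponding counters divided by `accesses`. A minimal Python sketch, assuming the `name value # description` layout seen in this log, recomputes a few of them from the dl1 and ul2 counters of this run. The helper names `parse_stats` and `rate` are illustrative and not part of SimpleScalar.

```python
# Illustrative helpers (not part of SimpleScalar) for re-deriving printed rates.
SAMPLE = """\
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
ul2.accesses                 431893 # total number of accesses
ul2.misses                    32394 # total number of misses
"""


def parse_stats(text):
    """Map each statistic name to its numeric value, skipping other lines."""
    stats = {}
    for line in text.splitlines():
        # Drop the trailing "# description" and keep "name value".
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            # Non-numeric values (hex addresses, "760k", error strings) are ignored.
            continue
    return stats


def rate(stats, cache, counter):
    """Return counter/accesses for one cache, e.g. rate(stats, 'dl1', 'misses')."""
    return stats[f"{cache}.{counter}"] / stats[f"{cache}.accesses"]


stats = parse_stats(SAMPLE)
for cache, counter in [
    ("dl1", "misses"),
    ("dl1", "replacements"),
    ("dl1", "writebacks"),
    ("ul2", "misses"),
]:
    print(f"{cache}.{counter}/accesses = {rate(stats, cache, counter):.4f}")
```

Formatting to four decimals matches the precision the simulator prints, so the results can be compared directly with `dl1.miss_rate`, `dl1.repl_rate`, `dl1.wb_rate`, and `ul2.miss_rate`.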

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -redir:sim tempOutput3 filter 

sim: simulation started @ Thu Dec 15 11:54:38 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)
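
The cache and TLB options above use the `<name>:<nsets>:<bsize>:<assoc>:<repl>` format. As a rough sketch, the total capacity of each structure is `nsets * bsize * assoc`. The Python below applies that to the configurations of this run. `CacheConfig`, `parse_config`, and `capacity_bytes` are hypothetical names introduced for illustration, and for the TLB entries `bsize` is assumed to be the page size, so the "capacity" there is the address range the TLB can map.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    """One <name>:<nsets>:<bsize>:<assoc>:<repl> specification (illustrative type)."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str


def parse_config(spec: str) -> CacheConfig:
    """Split a colon-separated config string into typed fields."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg: CacheConfig) -> int:
    """Total bytes held (caches) or mapped (TLBs, assuming bsize is the page size)."""
    return cfg.nsets * cfg.bsize * cfg.assoc


# Configurations copied from the option dump of this run.
for spec in (
    "dl1:512:64:2:l",
    "il1:512:64:2:l",
    "ul2:1024:64:8:l",
    "itlb:16:4096:4:l",
    "dtlb:32:4096:4:l",
):
    cfg = parse_config(spec)
    entries = cfg.nsets * cfg.assoc
    print(f"{cfg.name}: {entries} entries, {capacity_bytes(cfg) // 1024} KiB")
```

Under these assumptions the L1 caches come out at 64 KiB each and the unified L2 at 512 KiB.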

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12576164 # total simulation time in cycles
sim_IPC                      1.7000 # instructions per cycle
sim_CPI                      0.5882 # cycles per instruction
sim_exec_BW                  1.7056 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  49155014 # cumulative IFQ occupancy
IFQ_fcount                 11669581 # cumulative IFQ full count
ifq_occupancy                3.9086 # avg IFQ occupancy (insn's)
ifq_rate                     1.7056 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2916 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 199756634 # cumulative RUU occupancy
RUU_fcount                 12449456 # cumulative RUU full count
ruu_occupancy               15.8837 # avg RUU occupancy (insn's)
ruu_rate                     1.7056 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.3126 # avg RUU occupant latency (cycle's)
ruu_full                     0.9899 # fraction of time (cycle's) RUU was full
LSQ_count                  64038863 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0921 # avg LSQ occupancy (insn's)
lsq_rate                     1.7056 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9855 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  291578304 # total number of slip cycles
avg_sim_slip                13.6384 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886466 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
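
Several derived figures in the statistics above can be cross-checked against the raw counters. The Python sketch below uses values copied from this run. It treats `sim_IPC` as committed instructions per cycle, `sim_CPI` as its reciprocal, and `sim_exec_BW` as executed instructions per cycle. The last check, that `ul2.accesses` equals `il1.misses + dl1.misses + dl1.writebacks`, is an observation from these numbers and not a documented guarantee of the simulator. The dictionary name `run` is illustrative.

```python
# Counters copied from the statistics of this run (illustrative container).
run = {
    "sim_num_insn": 21379256,
    "sim_total_insn": 21450114,
    "sim_cycle": 12576164,
    "il1.misses": 179,
    "dl1.misses": 3131,
    "dl1.writebacks": 791,
    "ul2.accesses": 4101,
}

# Committed instructions per cycle and its reciprocal.
ipc = run["sim_num_insn"] / run["sim_cycle"]
cpi = run["sim_cycle"] / run["sim_num_insn"]

# Executed (committed plus mis-speculated) instructions per cycle.
exec_bw = run["sim_total_insn"] / run["sim_cycle"]

# Requests reaching the unified L2 from both L1 caches (observed relation).
l2_requests = run["il1.misses"] + run["dl1.misses"] + run["dl1.writebacks"]

print(f"IPC     = {ipc:.4f}")
print(f"CPI     = {cpi:.4f}")
print(f"exec BW = {exec_bw:.4f}")
print(f"L1 requests to L2 = {l2_requests}, ul2.accesses = {run['ul2.accesses']}")
```

The small gap between `sim_IPC` and `sim_exec_BW` reflects the instructions executed on mis-speculated paths that were never committed.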

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 72 3 -redir:sim tempOutput3 alphaBlend 

sim: simulation started @ Thu Dec 15 11:54:52 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         72 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861243 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  37568514 # total simulation time in cycles
sim_IPC                      0.7416 # instructions per cycle
sim_CPI                      1.3485 # cycles per instruction
sim_exec_BW                  0.7416 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 150198905 # cumulative IFQ occupancy
IFQ_fcount                 37549486 # cumulative IFQ full count
ifq_occupancy                3.9980 # avg IFQ occupancy (insn's)
ifq_rate                     0.7416 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.3910 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9995 # fraction of time (cycle's) IFQ was full
RUU_count                 600798311 # cumulative RUU occupancy
RUU_fcount                 37548752 # cumulative RUU full count
ruu_occupancy               15.9921 # avg RUU occupancy (insn's)
ruu_rate                     0.7416 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 21.5639 # avg RUU occupant latency (cycle's)
ruu_full                     0.9995 # fraction of time (cycle's) RUU was full
LSQ_count                 182621278 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8610 # avg LSQ occupancy (insn's)
lsq_rate                     0.7416 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.5547 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  819928252 # total number of slip cycles
avg_sim_slip                29.4306 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017736 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 48 3 -redir:sim tempOutput3 matrix 

sim: simulation started @ Thu Dec 15 11:55:16 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         48 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r
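
  The <config> format above can be checked against the values used in this
  run. The following is a minimal Python sketch, not part of SimpleScalar:
  the names CacheConfig and parse_cache_config are hypothetical helpers
  introduced here for illustration. It splits a config string into its five
  fields and derives the total capacity as nsets * bsize * assoc. For a TLB
  config, <bsize> is the page size, so the same product gives the address
  range the TLB can map rather than a storage size.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical container for one <name>:<nsets>:<bsize>:<assoc>:<repl> string."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total capacity = sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse a cache config string; raise ValueError if it is malformed."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}: {spec!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r} in {spec!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Config strings taken from the command line of this run.
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, "
              f"{cfg.assoc}-way, {REPL_POLICIES[cfg.repl]}")
```

  Under these assumptions, dl1 and il1 each come out at 512 * 64 * 2 bytes
  and ul2 at 1024 * 64 * 8 bytes.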

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6771678 # total simulation time in cycles
sim_IPC                      1.9356 # instructions per cycle
sim_CPI                      0.5166 # cycles per instruction
sim_exec_BW                  1.9418 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25215760 # cumulative IFQ occupancy
IFQ_fcount                  6178337 # cumulative IFQ full count
ifq_occupancy                3.7237 # avg IFQ occupancy (insn's)
ifq_rate                     1.9418 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9177 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 104139839 # cumulative RUU occupancy
RUU_fcount                  5614498 # cumulative RUU full count
ruu_occupancy               15.3787 # avg RUU occupancy (insn's)
ruu_rate                     1.9418 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.9200 # avg RUU occupant latency (cycle's)
ruu_full                     0.8291 # fraction of time (cycle's) RUU was full
LSQ_count                  31802154 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6963 # avg LSQ occupancy (insn's)
lsq_rate                     1.9418 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4186 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152993501 # total number of slip cycles
avg_sim_slip                11.6725 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357595 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
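
The derived figures in the statistics block above (sim_IPC, sim_CPI, and the
per-cache miss rates) follow from the raw counters printed alongside them. The
following is a minimal Python sketch, not part of SimpleScalar, that re-derives
them from a stats dump. The helper name parse_stats and the embedded SAMPLE
text are assumptions made for illustration; the counter values are copied from
the matrix run above.

```python
# Counter lines copied from the statistics block of the matrix run above.
SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6771678 # total simulation time in cycles
dl1.accesses                4013174 # total number of accesses
dl1.misses                     5727 # total number of misses
ul2.accesses                   6397 # total number of accesses
ul2.misses                     2268 # total number of misses
"""


def parse_stats(text: str) -> dict:
    """Collect `name value # description` lines into a name -> float mapping."""
    stats = {}
    for line in text.splitlines():
        # Drop the trailing description, then expect exactly: name, value.
        body = line.split("#", 1)[0].split()
        if len(body) != 2:
            continue
        try:
            stats[body[0]] = float(body[1])
        except ValueError:
            # Non-numeric values (e.g. option strings) are skipped.
            continue
    return stats


stats = parse_stats(SAMPLE)

derived = {
    "sim_IPC": stats["sim_num_insn"] / stats["sim_cycle"],
    "sim_CPI": stats["sim_cycle"] / stats["sim_num_insn"],
    "dl1.miss_rate": stats["dl1.misses"] / stats["dl1.accesses"],
    "ul2.miss_rate": stats["ul2.misses"] / stats["ul2.accesses"],
}

if __name__ == "__main__":
    for name, value in derived.items():
        print(f"{name:<16s} {value:.4f}")
```

The recomputed values can be compared against the sim_IPC, sim_CPI,
dl1.miss_rate, and ul2.miss_rate lines reported by the simulator for the same
run.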

sim-outorder: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
Copyright (c) 1994-2003 by Todd M. Austin, Ph.D. and SimpleScalar, LLC.
All Rights Reserved. This version of SimpleScalar is licensed for academic
non-commercial use.  No portion of this work may be used by any commercial
entity, or for any commercial purpose, without the prior written permission
of SimpleScalar, LLC (info@simplescalar.com).

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 48 3 -redir:sim tempOutput3 sort 

sim: simulation started @ Thu Dec 15 11:55:24 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         48 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264605 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6298751 # total simulation time in cycles
sim_IPC                      1.8387 # instructions per cycle
sim_CPI                      0.5439 # cycles per instruction
sim_exec_BW                  1.9471 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18523943 # cumulative IFQ occupancy
IFQ_fcount                  3802826 # cumulative IFQ full count
ifq_occupancy                2.9409 # avg IFQ occupancy (insn's)
ifq_rate                     1.9471 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.5104 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6037 # fraction of time (cycle's) IFQ was full
RUU_count                  76341343 # cumulative RUU occupancy
RUU_fcount                  3200925 # cumulative RUU full count
ruu_occupancy               12.1201 # avg RUU occupancy (insn's)
ruu_rate                     1.9471 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.2245 # avg RUU occupant latency (cycle's)
ruu_full                     0.5082 # fraction of time (cycle's) RUU was full
LSQ_count                  31917829 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0673 # avg LSQ occupancy (insn's)
lsq_rate                     1.9471 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.6024 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  122326268 # total number of slip cycles
avg_sim_slip                10.5622 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820215 # total number of accesses
il1.hits                   12819998 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820215 # total number of accesses
itlb.hits                  12820208 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917920 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 48 3 -redir:sim tempOutput3 fft 

sim: simulation started @ Thu Dec 15 11:55:32 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         48 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
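
The range syntax above can be illustrated with a small parser. This is a minimal sketch, not code from the simulator: the function name `parse_ptrace_range` and the returned tuple layout are assumptions made for illustration. Values are kept as strings because program symbols (such as `main`) are allowed.

```python
def parse_ptrace_range(spec):
    """Split a pipetrace range '<start>:<end>' into two classified endpoints.

    Hypothetical helper: returns (start, end), where each endpoint is either
    None (omitted) or a (kind, value) tuple with kind in
    {'address', 'cycle', 'inst', 'relative'}.
    """
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")

    def classify(token, allow_plus):
        if token == "":
            return None                      # endpoint omitted
        prefix = token[0]
        if prefix == "@":
            return ("address", token[1:])    # address range
        if prefix == "#":
            return ("cycle", token[1:])      # cycle count range
        if prefix == "+":
            if not allow_plus:
                raise ValueError("'+' is only valid on the end of the range")
            return ("relative", token[1:])   # relative to the start value
        return ("inst", token)               # instruction count range

    return classify(start, False), classify(end, True)


if __name__ == "__main__":
    for example in ["#0:#1000", "@2000:", ":1500", ":", "@main:+278"]:
        print(example, "->", parse_ptrace_range(example))
```

Resolving a relative end (for instance 1000:+100 into 1000:1100) is left out, since it depends on whether the start is numeric or a symbol.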

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.
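
The effect of the X flag on the second-level index can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulator's actual indexing code: the function name `second_level_index`, the 3-bit address shift (assuming 8-byte instructions), and the concatenation used when X is 0 are all choices made for this example.

```python
def second_level_index(pc, history, l2size, hist_width, xor):
    """Pick a counter in the second-level table (simplified sketch).

    pc         -- branch address
    history    -- contents of the selected shift register
    l2size     -- number of second-level entries (M), assumed a power of two
    hist_width -- width of the shift register (W)
    xor        -- True for gshare-style indexing (X = 1)
    """
    hist = history & ((1 << hist_width) - 1)   # keep only W history bits
    addr = pc >> 3                             # assumption: 8-byte instructions
    if xor:
        idx = hist ^ addr                      # X = 1: xor history with address
    else:
        idx = hist | (addr << hist_width)      # X = 0: address bits above history
    return idx & (l2size - 1)                  # wrap into the table


if __name__ == "__main__":
    # GAg-like case (M == 2^W, X = 0): the index depends on history only.
    print(second_level_index(0x400140, 0xB5, 256, 8, False))
    # gshare-like case (M == 2^W, X = 1): address bits are mixed in.
    print(second_level_index(0x400148, 0xB5, 256, 8, True))
```

With M equal to 2^W and X set to 0, the mask discards every address bit, which is why that shape behaves as a purely history-indexed table.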

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
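
As an illustration of the <config> format, the sketch below splits a config string into its five fields and derives the total capacity as nsets * bsize * assoc. The names `CacheConfig`, `parse_cache_config`, and `capacity_bytes` are hypothetical helpers for this example, not part of the simulator.

```python
from collections import namedtuple

CacheConfig = namedtuple("CacheConfig", "name nsets bsize assoc repl")

# Replacement strategy codes listed in the format description.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


def parse_cache_config(spec):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 colon-separated fields: %r" % spec)
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError("unknown replacement strategy: %r" % repl)
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Total data capacity: sets * block size * ways."""
    return cfg.nsets * cfg.bsize * cfg.assoc


if __name__ == "__main__":
    # Configs taken from the command line of this run.
    for spec in ["dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"]:
        cfg = parse_cache_config(spec)
        print(cfg.name, capacity_bytes(cfg) // 1024, "KB", REPLACEMENT[cfg.repl])
```

For the caches configured in this run, this gives 64 KB for each L1 cache and 512 KB for the unified L2.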

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13374924 # total number of instructions executed
sim_total_refs              6748325 # total number of loads and stores executed
sim_total_loads             3824300 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   8899217 # total simulation time in cycles
sim_IPC                      1.4969 # instructions per cycle
sim_CPI                      0.6681 # cycles per instruction
sim_exec_BW                  1.5029 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  34695504 # cumulative IFQ occupancy
IFQ_fcount                  8523231 # cumulative IFQ full count
ifq_occupancy                3.8987 # avg IFQ occupancy (insn's)
ifq_rate                     1.5029 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.5941 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9578 # fraction of time (cycle's) IFQ was full
RUU_count                 139688835 # cumulative RUU occupancy
RUU_fcount                  8382359 # cumulative RUU full count
ruu_occupancy               15.6968 # avg RUU occupancy (insn's)
ruu_rate                     1.5029 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 10.4441 # avg RUU occupant latency (cycle's)
ruu_full                     0.9419 # fraction of time (cycle's) RUU was full
LSQ_count                  72897924 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1915 # avg LSQ occupancy (insn's)
lsq_rate                     1.5029 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.4503 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  232302585 # total number of slip cycles
avg_sim_slip                17.4389 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400100 # total number of accesses
il1.hits                   13399373 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400100 # total number of accesses
itlb.hits                  13400081 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766208 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
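
The derived rates in the statistics above (IPC, miss rates, prediction rates) are ratios of the raw counters, so they can be re-derived from the log as a sanity check. The sketch below is an assumption-laden illustration, not a SimpleScalar tool: `parse_stats` and `ratio` are hypothetical helpers, and the parser keeps only lines of the form `<name> <numeric value> # <description>`.

```python
def parse_stats(lines):
    """Collect '<name> <value> # <description>' lines into a dict of floats."""
    stats = {}
    for line in lines:
        body = line.split("#", 1)[0].split()
        if len(body) != 2 or body[0].startswith("-"):
            continue                  # not a simple statistic line
        name, value = body
        try:
            stats[name] = float(value)
        except ValueError:
            continue                  # e.g. hex addresses or sizes like 760k
    return stats


def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


# A few lines copied from the fft run above.
SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                   8899217 # total simulation time in cycles
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
dl1.accesses                6169439 # total number of accesses
dl1.misses                   287722 # total number of misses
mem.page_mem                   760k # total size of memory pages allocated
""".splitlines()

if __name__ == "__main__":
    s = parse_stats(SAMPLE)
    print("IPC            %.4f" % ratio(s["sim_num_insn"], s["sim_cycle"]))
    print("dl1 miss rate  %.4f" % ratio(s["dl1.misses"], s["dl1.accesses"]))
    print("bpred addr rate %.4f" % ratio(s["bpred_bimod.addr_hits"],
                                          s["bpred_bimod.updates"]))
```

Returning None for a zero denominator lets a caller report an undefined rate explicitly instead of failing.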

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 48 3 -redir:sim tempOutput3 filter 

sim: simulation started @ Thu Dec 15 11:55:43 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         48 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12522260 # total simulation time in cycles
sim_IPC                      1.7073 # instructions per cycle
sim_CPI                      0.5857 # cycles per instruction
sim_exec_BW                  1.7130 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48956102 # cumulative IFQ occupancy
IFQ_fcount                 11619853 # cumulative IFQ full count
ifq_occupancy                3.9095 # avg IFQ occupancy (insn's)
ifq_rate                     1.7130 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2823 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9279 # fraction of time (cycle's) IFQ was full
RUU_count                 198959810 # cumulative RUU occupancy
RUU_fcount                 12399728 # cumulative RUU full count
ruu_occupancy               15.8885 # avg RUU occupancy (insn's)
ruu_rate                     1.7130 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2755 # avg RUU occupant latency (cycle's)
ruu_full                     0.9902 # fraction of time (cycle's) RUU was full
LSQ_count                  63788975 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0940 # avg LSQ occupancy (insn's)
lsq_rate                     1.7130 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9738 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  290531736 # total number of slip cycles
avg_sim_slip                13.5894 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483054 # total number of accesses
il1.hits                   21482875 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483054 # total number of accesses
itlb.hits                  21483048 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886466 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 48 3 -redir:sim tempOutput3 alphaBlend 

sim: simulation started @ Thu Dec 15 11:55:56 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         48 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
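
  The <name>:<nsets>:<bsize>:<assoc>:<repl> format described above can be
  checked with a small script.  The following is a minimal sketch, not part
  of SimpleScalar: the names CacheConfig and parse_cache_config are
  hypothetical, and the capacity formula nsets * bsize * assoc is the usual
  set-associative cache size, assumed here rather than taken from the
  simulator source.

```python
from dataclasses import dataclass

# Replacement policy letters as listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical container for one <config> string."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def size_bytes(self) -> int:
        # Assumed capacity formula: sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Split a <name>:<nsets>:<bsize>:<assoc>:<repl> string into fields."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}: {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r} in {spec!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Cache configurations taken from the command line of this run.
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.size_bytes // 1024} KB, "
              f"{cfg.assoc}-way, {REPL_POLICIES[cfg.repl]}")
```

  Under that assumption, the dl1 and il1 settings used in these runs
  (512 sets, 64-byte blocks, 2-way) describe 64 KB caches, and the ul2
  setting (1024 sets, 64-byte blocks, 8-way) describes a 512 KB cache.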



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 24 # total simulation time in seconds
sim_inst_rate          1160820.5000 # simulation speed (in insts/sec)
sim_total_insn             27861051 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  36483018 # total simulation time in cycles
sim_IPC                      0.7636 # instructions per cycle
sim_CPI                      1.3095 # cycles per instruction
sim_exec_BW                  0.7637 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 145876985 # cumulative IFQ occupancy
IFQ_fcount                 36469006 # cumulative IFQ full count
ifq_occupancy                3.9985 # avg IFQ occupancy (insn's)
ifq_rate                     0.7637 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.2359 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9996 # fraction of time (cycle's) IFQ was full
RUU_count                 583509599 # cumulative RUU occupancy
RUU_fcount                 36468320 # cumulative RUU full count
ruu_occupancy               15.9940 # avg RUU occupancy (insn's)
ruu_rate                     0.7637 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.9436 # avg RUU occupant latency (cycle's)
ruu_full                     0.9996 # fraction of time (cycle's) RUU was full
LSQ_count                 177397654 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8625 # avg LSQ occupancy (insn's)
lsq_rate                     0.7637 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.3672 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  797416012 # total number of slip cycles
avg_sim_slip                28.6226 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860789 # total number of accesses
il1.hits                   27860578 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860789 # total number of accesses
itlb.hits                  27860783 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017736 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate
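
The derived rates in the statistics above (sim_IPC, sim_CPI, the cache
miss rates) can be recomputed from the raw counters as a sanity check.
The following is a minimal sketch, not part of SimpleScalar: the function
name parse_stats and the regular expression are hypothetical, and it
assumes each statistic is printed as "<name> <value> # <description>" for
a single run.  Non-numeric values such as hex addresses, "1480k" or
"<error: divide by zero>" are skipped.

```python
import re

# Assumed line shape: name, one value token, then "# description".
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#\s*(.*)$")


def parse_stats(text: str) -> dict:
    """Collect numeric statistics from one run's output into a dict."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue
        name, raw, _description = match.groups()
        # Skip option lines ("-seed 1 # ...") and commented-out options.
        if name.startswith(("-", "#")):
            continue
        try:
            stats[name] = float(raw)
        except ValueError:
            continue  # hex addresses, sizes with a "k" suffix, etc.
    return stats


# A few lines copied from the alphaBlend run above.
SAMPLE = """\
sim_num_insn               27859692 # total number of instructions committed
sim_cycle                  36483018 # total simulation time in cycles
dl1.accesses                8653398 # total number of accesses
dl1.misses                  2572213 # total number of misses
ld_text_base             0x00400000 # program text (code) segment base
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    ipc = stats["sim_num_insn"] / stats["sim_cycle"]
    cpi = stats["sim_cycle"] / stats["sim_num_insn"]
    dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
    print(f"IPC {ipc:.4f}  CPI {cpi:.4f}  dl1 miss rate {dl1_miss_rate:.4f}")
```

For the sample lines, the recomputed values agree with the reported
sim_IPC (0.7636), sim_CPI (1.3095) and dl1.miss_rate (0.2972) to four
decimal places.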

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 24 3 -redir:sim tempOutput3 matrix 

sim: simulation started @ Thu Dec 15 11:56:20 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         24 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1638390.5000 # simulation speed (in insts/sec)
sim_total_insn             13148990 # total number of instructions executed
sim_total_refs              4034206 # total number of loads and stores executed
sim_total_loads             3020647 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6724566 # total simulation time in cycles
sim_IPC                      1.9491 # instructions per cycle
sim_CPI                      0.5130 # cycles per instruction
sim_exec_BW                  1.9554 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  25044112 # cumulative IFQ occupancy
IFQ_fcount                  6135425 # cumulative IFQ full count
ifq_occupancy                3.7243 # avg IFQ occupancy (insn's)
ifq_rate                     1.9554 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.9046 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 103451375 # cumulative RUU occupancy
RUU_fcount                  5571586 # cumulative RUU full count
ruu_occupancy               15.3841 # avg RUU occupancy (insn's)
ruu_rate                     1.9554 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.8676 # avg RUU occupant latency (cycle's)
ruu_full                     0.8285 # fraction of time (cycle's) RUU was full
LSQ_count                  31572138 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6950 # avg LSQ occupancy (insn's)
lsq_rate                     1.9554 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.4011 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  152075453 # total number of slip cycles
avg_sim_slip                11.6025 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000567 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189524 # total number of accesses
il1.hits                   13189346 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013174 # total number of accesses
dl1.hits                    4007447 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189524 # total number of accesses
itlb.hits                  13189518 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013778 # total number of accesses
dtlb.hits                   4013742 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357595 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 24 3 -redir:sim tempOutput3 sort 

sim: simulation started @ Thu Dec 15 11:56:28 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         24 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional, if neither are specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -dtlb dtlb:128:4096:32:r
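
  As a rough illustration of the <config> format above, the following Python
  sketch splits a config string into its five fields and derives the total
  capacity as nsets * bsize * assoc.  The names `CacheConfig` and
  `parse_cache_config` are hypothetical helpers and not part of SimpleScalar;
  the capacity formula is the usual one for set-associative caches and
  assumes <bsize> is given in bytes.

```python
from dataclasses import dataclass

# Replacement policy letters as listed in the help text above.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical container for one <name>:<nsets>:<bsize>:<assoc>:<repl> string."""
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Assumption: capacity = sets * block size (bytes) * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Split a config string on ':' and validate the replacement letter."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ':'-separated fields, got {len(parts)}: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Config strings taken from the command lines in this log.
    for spec in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"):
        cfg = parse_cache_config(spec)
        print(f"{cfg.name}: {cfg.capacity_bytes // 1024} KB, "
              f"{cfg.assoc}-way, {REPL_POLICIES[cfg.repl]}")
```

  For the configuration used in these runs, this works out to 64 KB each for
  dl1 and il1 and 512 KB for ul2.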

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  8 # total simulation time in seconds
sim_inst_rate          1447689.1250 # simulation speed (in insts/sec)
sim_total_insn             12264509 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6233430 # total simulation time in cycles
sim_IPC                      1.8580 # instructions per cycle
sim_CPI                      0.5382 # cycles per instruction
sim_exec_BW                  1.9675 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18283079 # cumulative IFQ occupancy
IFQ_fcount                  3742610 # cumulative IFQ full count
ifq_occupancy                2.9331 # avg IFQ occupancy (insn's)
ifq_rate                     1.9675 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.4907 # avg IFQ occupant latency (cycle's)
ifq_full                     0.6004 # fraction of time (cycle's) IFQ was full
RUU_count                  75377392 # cumulative RUU occupancy
RUU_fcount                  3140736 # cumulative RUU full count
ruu_occupancy               12.0924 # avg RUU occupancy (insn's)
ruu_rate                     1.9675 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.1460 # avg RUU occupant latency (cycle's)
ruu_full                     0.5039 # fraction of time (cycle's) RUU was full
LSQ_count                  31700467 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0856 # avg LSQ occupancy (insn's)
lsq_rate                     1.9675 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5847 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  121144955 # total number of slip cycles
avg_sim_slip                10.4602 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820214 # total number of accesses
il1.hits                   12819997 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820214 # total number of accesses
itlb.hits                  12820207 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917916 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
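
The derived figures in a statistics block can be cross-checked against the
raw counters.  The Python sketch below is a hypothetical helper (not part of
SimpleScalar): it parses `name value # description` lines and recomputes IPC,
CPI, and the dl1/ul2 miss rates from counters copied out of the sort run
above.  The regular expression assumes the whitespace-separated layout seen
in this log; lines whose value is not a plain number (hex addresses, `96k`,
`<error: ...>`) are skipped.

```python
import re

# One statistic per line: <name> <value> # <description>
STAT_LINE = re.compile(r"^(\S+)\s+(\S+)\s+#\s+(.*)$")

# Counters copied from the sort run's statistics block.
SAMPLE = """\
sim_num_insn               11581513 # total number of instructions committed
sim_cycle                   6233430 # total simulation time in cycles
dl1.accesses                4497784 # total number of accesses
dl1.misses                    11205 # total number of misses
ul2.accesses                  18501 # total number of accesses
ul2.misses                     4196 # total number of misses
"""


def parse_stats(text: str) -> dict[str, float]:
    """Return {stat name: numeric value} for every parseable stat line."""
    stats: dict[str, float] = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if not match:
            continue
        name, value, _description = match.groups()
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # non-numeric value such as 0x00400000 or 96k
    return stats


def ratio(stats: dict[str, float], num: str, den: str) -> float:
    """Divide two counters, returning NaN instead of raising on a zero denominator."""
    return stats[num] / stats[den] if stats[den] else float("nan")


if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    print(f"IPC           {ratio(stats, 'sim_num_insn', 'sim_cycle'):.4f}")
    print(f"CPI           {ratio(stats, 'sim_cycle', 'sim_num_insn'):.4f}")
    print(f"dl1.miss_rate {ratio(stats, 'dl1.misses', 'dl1.accesses'):.4f}")
    print(f"ul2.miss_rate {ratio(stats, 'ul2.misses', 'ul2.accesses'):.4f}")
```

Rounded to four decimals, the recomputed values agree with the logged
sim_IPC, sim_CPI, dl1.miss_rate, and ul2.miss_rate for this run.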

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 24 3 -redir:sim tempOutput3 fft 

sim: simulation started @ Thu Dec 15 11:56:36 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         24 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13374738 # total number of instructions executed
sim_total_refs              6748328 # total number of loads and stores executed
sim_total_loads             3824303 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390628 # total number of branches executed
sim_cycle                   8448779 # total simulation time in cycles
sim_IPC                      1.5767 # instructions per cycle
sim_CPI                      0.6342 # cycles per instruction
sim_exec_BW                  1.5830 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  32962270 # cumulative IFQ occupancy
IFQ_fcount                  8089922 # cumulative IFQ full count
ifq_occupancy                3.9014 # avg IFQ occupancy (insn's)
ifq_rate                     1.5830 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4645 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9575 # fraction of time (cycle's) IFQ was full
RUU_count                 132750195 # cumulative RUU occupancy
RUU_fcount                  7949095 # cumulative RUU full count
ruu_occupancy               15.7124 # avg RUU occupancy (insn's)
ruu_rate                     1.5830 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.9254 # avg RUU occupant latency (cycle's)
ruu_full                     0.9409 # fraction of time (cycle's) RUU was full
LSQ_count                  68768392 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1394 # avg LSQ occupancy (insn's)
lsq_rate                     1.5830 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  5.1417 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  221234870 # total number of slip cycles
avg_sim_slip                16.6081 # the average slip between issue and retirement
bpred_bimod.lookups          390939 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400103 # total number of accesses
il1.hits                   13399376 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169439 # total number of accesses
dl1.hits                    5881717 # total number of hits
dl1.misses                   287722 # total number of misses
dl1.replacements             286698 # total number of replacements
dl1.writebacks               143444 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431893 # total number of accesses
ul2.hits                     399499 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400103 # total number of accesses
itlb.hits                  13400084 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736854 # total number of hits
dtlb.misses                    4144 # total number of misses
dtlb.replacements              4016 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766232 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 24 3 -redir:sim tempOutput3 filter 

sim: simulation started @ Thu Dec 15 11:56:47 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         24 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
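
  The range grammar above can be illustrated with a small parser. This is a
  minimal sketch in Python, not the simulator's own code: the function name
  `parse_ptrace_range`, the `KINDS` table, and the returned tuple shape are
  assumptions made for illustration. Values are kept as strings because
  program symbols (such as `main`) cannot be resolved outside the simulator,
  and a `+' end is only resolved for plain decimal, non-address starts.

```python
import re

# Prefix characters and the kind of range they select, per the help text.
KINDS = {"@": "address", "#": "cycle", "": "instruction"}

# {{@|#}<start>}:{{@|#|+}<end>}
_RANGE_RE = re.compile(r"^([@#]?)([^:]*):([@#+]?)(.*)$")


def parse_ptrace_range(text):
    """Split a pipetrace range into (kind, value) pairs for start and end.

    An omitted endpoint is returned as None. A `+' end is added to the start
    only when both values are decimal integers and the start is not an
    address; otherwise it is returned as a ("relative", value) pair.
    """
    match = _RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a pipetrace range: {text!r}")
    s_prefix, s_value, e_prefix, e_value = match.groups()

    start = (KINDS[s_prefix], s_value) if s_value else None
    if not e_value:
        end = None
    elif e_prefix == "+":
        resolvable = (
            start is not None
            and s_prefix != "@"
            and s_value.isdigit()
            and e_value.isdigit()
        )
        if resolvable:
            end = (start[0], str(int(s_value) + int(e_value)))
        else:
            end = ("relative", e_value)
    else:
        end = (KINDS[e_prefix], e_value)
    return start, end


if __name__ == "__main__":
    for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(example, "->", parse_ptrace_range(example))
```

  Under these assumptions, `1000:+100` and `1000:1100` parse to the same
  pair, matching the equivalence stated in the help text.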

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
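
  As a rough illustration of the <config> format, the sketch below splits a
  config string into its five fields and derives the total capacity as
  nsets x bsize x assoc. It is written in Python for convenience; the names
  `CacheConfig`, `parse_cache_config`, and `capacity_bytes` are hypothetical
  and do not come from the simulator source.

```python
from dataclasses import dataclass

# Replacement policy letters listed in the help text.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self):
        # Total capacity is sets x block size x ways. For a TLB config the
        # "block size" is the page size, so this is the mapped reach instead.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text):
    """Parse <name>:<nsets>:<bsize>:<assoc>:<repl> into a CacheConfig."""
    fields = text.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields: {text!r}")
    name, nsets, bsize, assoc, repl = fields
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Configs taken from the command line of this run.
    for text in ("dl1:512:64:2:l", "il1:512:64:2:l", "ul2:1024:64:8:l"):
        cfg = parse_cache_config(text)
        print(cfg.name, cfg.capacity_bytes // 1024, "KB", REPLACEMENT[cfg.repl])
```

  With the configs used in this run, the two L1 caches come out at 64 KB
  each and the unified L2 at 512 KB.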

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 13 # total simulation time in seconds
sim_inst_rate          1644558.1538 # simulation speed (in insts/sec)
sim_total_insn             21450114 # total number of instructions executed
sim_total_refs              6589108 # total number of loads and stores executed
sim_total_loads             4939053 # total number of loads executed
sim_total_stores       1650055.0000 # total number of stores executed
sim_total_branches          1647447 # total number of branches executed
sim_cycle                  12468359 # total simulation time in cycles
sim_IPC                      1.7147 # instructions per cycle
sim_CPI                      0.5832 # cycles per instruction
sim_exec_BW                  1.7204 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48757190 # cumulative IFQ occupancy
IFQ_fcount                 11570125 # cumulative IFQ full count
ifq_occupancy                3.9105 # avg IFQ occupancy (insn's)
ifq_rate                     1.7204 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2731 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9280 # fraction of time (cycle's) IFQ was full
RUU_count                 198162986 # cumulative RUU occupancy
RUU_fcount                 12350000 # cumulative RUU full count
ruu_occupancy               15.8933 # avg RUU occupancy (insn's)
ruu_rate                     1.7204 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2383 # avg RUU occupant latency (cycle's)
ruu_full                     0.9905 # fraction of time (cycle's) RUU was full
LSQ_count                  63539087 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0960 # avg LSQ occupancy (insn's)
lsq_rate                     1.7204 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9622 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  289485168 # total number of slip cycles
avg_sim_slip                13.5405 # the average slip between issue and retirement
bpred_bimod.lookups         1654486 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483053 # total number of accesses
il1.hits                   21482874 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483053 # total number of accesses
itlb.hits                  21483047 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886462 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate
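
The derived rates in the statistics block above (IPC, miss rates, prediction
rates) can be recomputed from the raw counters as a sanity check. The sketch
below is one possible way to do that in Python; `parse_stats`, `ratio`, and
the `SAMPLE` list are illustrative names, and the sample lines are copied
from the `filter` run. Non-numeric values, such as the
`<error: divide by zero>` entry, are stored as None rather than guessed.

```python
import re

# "<name> <value> # <description>"; names start with a letter, which skips
# option lines that begin with "-" or "#".
_STAT_RE = re.compile(r"^([A-Za-z][\w.]*)\s+(.*?)\s+#\s")

SAMPLE = [
    "sim_num_insn               21379256 # total number of instructions committed",
    "sim_cycle                  12468359 # total simulation time in cycles",
    "bpred_bimod.updates         1647342 # total number of updates",
    "bpred_bimod.addr_hits       1638967 # total number of address-predicted hits",
    "bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's",
    "bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen",
    "bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate",
    "ul2.accesses                   4101 # total number of accesses",
    "ul2.misses                     2489 # total number of misses",
]


def parse_stats(lines):
    """Map each statistic name to a float, or None if it is not numeric."""
    stats = {}
    for line in lines:
        match = _STAT_RE.match(line)
        if match is None:
            continue
        name, raw = match.groups()
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = None  # e.g. "<error: divide by zero>" or "120k"
    return stats


def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


if __name__ == "__main__":
    s = parse_stats(SAMPLE)
    print("IPC          ", ratio(s["sim_num_insn"], s["sim_cycle"]))
    print("ul2 miss rate", ratio(s["ul2.misses"], s["ul2.accesses"]))
    print("addr rate    ", ratio(s["bpred_bimod.addr_hits"], s["bpred_bimod.updates"]))
    print("non-RAS JR   ", ratio(s["bpred_bimod.jr_non_ras_hits.PP"],
                                 s["bpred_bimod.jr_non_ras_seen.PP"]))
```

Returning None for a zero denominator mirrors the simulator's own refusal to
report a non-RAS JR rate when no non-RAS JRs were seen.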

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 24 3 -redir:sim tempOutput3 alphaBlend 

sim: simulation started @ Thu Dec 15 11:57:00 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         24 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)


sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 23 # total simulation time in seconds
sim_inst_rate          1211290.9565 # simulation speed (in insts/sec)
sim_total_insn             27860859 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  35405001 # total simulation time in cycles
sim_IPC                      0.7869 # instructions per cycle
sim_CPI                      1.2708 # cycles per instruction
sim_exec_BW                  0.7869 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instruction per branch
IFQ_count                 141584953 # cumulative IFQ occupancy
IFQ_fcount                 35395998 # cumulative IFQ full count
ifq_occupancy                3.9990 # avg IFQ occupancy (insn's)
ifq_rate                     0.7869 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.0819 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9997 # fraction of time (cycle's) IFQ was full
RUU_count                 566340444 # cumulative RUU occupancy
RUU_fcount                 35395360 # cumulative RUU full count
ruu_occupancy               15.9961 # avg RUU occupancy (insn's)
ruu_rate                     0.7869 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.3275 # avg RUU occupant latency (cycle's)
ruu_full                     0.9997 # fraction of time (cycle's) RUU was full
LSQ_count                 172203922 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8638 # avg LSQ occupancy (insn's)
lsq_rate                     0.7869 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.1809 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  775053221 # total number of slip cycles
avg_sim_slip                27.8199 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860788 # total number of accesses
il1.hits                   27860577 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6081185 # total number of hits
dl1.misses                  2572213 # total number of misses
dl1.replacements            2571189 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2972 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2971 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3529698 # total number of accesses
ul2.hits                    3461377 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0170 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860788 # total number of accesses
itlb.hits                  27860782 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017732 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 12 3 -redir:sim tempOutput3 matrix 

sim: simulation started @ Thu Dec 15 11:57:23 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         12 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced, those that start with an `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
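
The range grammar above can be illustrated with a small Python sketch. This is not SimpleScalar's own parser; the function names parse_endpoint and parse_ptrace_range are hypothetical, and endpoints are kept as strings because the help text says program symbols may be used in place of numbers.

```python
from typing import Optional, Tuple

Endpoint = Optional[Tuple[str, str]]  # (kind, value) or None when omitted


def parse_endpoint(token: str, allow_relative: bool) -> Endpoint:
    """Classify one side of a range by its prefix character."""
    if not token:
        return None  # this end of the range was omitted
    kinds = {"@": "address", "#": "cycle"}
    if allow_relative:
        kinds["+"] = "relative"  # only the <end> side accepts '+'
    if token[0] in kinds:
        return (kinds[token[0]], token[1:])
    return ("instruction", token)  # no prefix: instruction count


def parse_ptrace_range(rng: str) -> Tuple[Endpoint, Endpoint]:
    """Split '<start>:<end>' and classify both ends."""
    start, sep, end = rng.partition(":")
    if not sep:
        raise ValueError(f"range must contain ':': {rng!r}")
    return parse_endpoint(start, False), parse_endpoint(end, True)


if __name__ == "__main__":
    for example in ("#0:#1000", "@2000:", ":1500", ":", "@main:+278"):
        print(example, "->", parse_ptrace_range(example))
```

Resolving a "relative" end against the start value, and looking up symbols such as main, is left out of this sketch.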

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
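
As a rough illustration of the <config> format above, the following Python sketch splits a config string into its five fields and derives the total capacity as nsets * bsize * assoc. The names CacheConfig and parse_cache_config are hypothetical and not part of SimpleScalar; the capacity formula is the usual one for a set-associative cache and is assumed to apply here.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass(frozen=True)
class CacheConfig:
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Assumed: capacity = sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(config: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    parts = config.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields: {config!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    # Config strings taken from the command line of this run.
    for cfg in ("dl1:512:64:2:l", "ul2:1024:64:8:l"):
        cache = parse_cache_config(cfg)
        print(cache.name, cache.capacity_bytes // 1024, "KB",
              REPL_POLICIES[cache.repl])
```

Under that assumption, the dl1 and ul2 settings used in these runs correspond to 64 KB and 512 KB caches respectively.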

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13107124 # total number of instructions committed
sim_num_refs                4013722 # total number of loads and stores committed
sim_num_loads               3000354 # total number of loads committed
sim_num_stores         1013368.0000 # total number of stores committed
sim_num_branches            1010850 # total number of branches committed
sim_elapsed_time                  9 # total simulation time in seconds
sim_inst_rate          1456347.1111 # simulation speed (in insts/sec)
sim_total_insn             13148988 # total number of instructions executed
sim_total_refs              4034205 # total number of loads and stores executed
sim_total_loads             3020646 # total number of loads executed
sim_total_stores       1013559.0000 # total number of stores executed
sim_total_branches          1010953 # total number of branches executed
sim_cycle                   6701330 # total simulation time in cycles
sim_IPC                      1.9559 # instructions per cycle
sim_CPI                      0.5113 # cycles per instruction
sim_exec_BW                  1.9621 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9664 # instruction per branch
IFQ_count                  24959110 # cumulative IFQ occupancy
IFQ_fcount                  6114168 # cumulative IFQ full count
ifq_occupancy                3.7245 # avg IFQ occupancy (insn's)
ifq_rate                     1.9621 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.8982 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9124 # fraction of time (cycle's) IFQ was full
RUU_count                 103110695 # cumulative RUU occupancy
RUU_fcount                  5550299 # cumulative RUU full count
ruu_occupancy               15.3866 # avg RUU occupancy (insn's)
ruu_rate                     1.9621 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  7.8417 # avg RUU occupant latency (cycle's)
ruu_full                     0.8282 # fraction of time (cycle's) RUU was full
LSQ_count                  31458284 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.6943 # avg LSQ occupancy (insn's)
lsq_rate                     1.9621 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.3924 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  151621167 # total number of slip cycles
avg_sim_slip                11.5678 # the average slip between issue and retirement
bpred_bimod.lookups         1010975 # total number of bpred lookups
bpred_bimod.updates         1010850 # total number of updates
bpred_bimod.addr_hits       1000566 # total number of address-predicted hits
bpred_bimod.dir_hits        1000659 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses            10191 # total number of misses
bpred_bimod.jr_hits              46 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9898 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9899 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8846 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops           55 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           46 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8846 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13189520 # total number of accesses
il1.hits                   13189342 # total number of hits
il1.misses                      178 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4013172 # total number of accesses
dl1.hits                    4007445 # total number of hits
dl1.misses                     5727 # total number of misses
dl1.replacements               4703 # total number of replacements
dl1.writebacks                  492 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0012 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0001 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   6397 # total number of accesses
ul2.hits                       4129 # total number of hits
ul2.misses                     2268 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.3545 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13189520 # total number of accesses
itlb.hits                  13189514 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4013776 # total number of accesses
dtlb.hits                   4013740 # total number of hits
dtlb.misses                      36 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23136 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                 121104 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   24 # total number of pages allocated
mem.page_mem                    96k # total size of memory pages allocated
mem.ptab_misses             7696533 # total first level page table misses
mem.ptab_accesses          77357577 # total page table accesses
mem.ptab_miss_rate           0.0995 # first level page table miss rate
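
The derived figures in a statistics block like the one above (IPC, miss rates) can be cross-checked against the raw counters. The Python sketch below is one way to do that; parse_stats and the embedded SAMPLE are illustrative only, with the sample lines copied from the matrix run above. Lines whose value is not a plain number (hex addresses, sizes such as 96k, or error markers) are simply skipped.

```python
import re

# "<name> <numeric value> # <description>"; the name must start with a letter,
# so option lines such as "-seed 1 # ..." are ignored.
STAT_LINE = re.compile(r"^([A-Za-z_][\w.]*)\s+(-?\d+(?:\.\d+)?)\s+#")


def parse_stats(text: str) -> dict:
    """Collect numeric statistics from sim-outorder style output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


SAMPLE = """\
sim_num_insn               13107124 # total number of instructions committed
sim_cycle                   6701330 # total simulation time in cycles
sim_IPC                      1.9559 # instructions per cycle
dl1.accesses                4013172 # total number of accesses
dl1.misses                     5727 # total number of misses
dl1.miss_rate                0.0014 # miss rate (i.e., misses/ref)
ld_text_base             0x00400000 # program text (code) segment base
"""

stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]

print(f"IPC recomputed {ipc:.4f}, reported {stats['sim_IPC']:.4f}")
print(f"dl1 miss rate recomputed {dl1_miss_rate:.4f}, "
      f"reported {stats['dl1.miss_rate']:.4f}")
```

The same approach could be applied to each run in the log to compare, for example, how the different -mem:lat settings affect IPC.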

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 12 3 -redir:sim tempOutput3 sort 

sim: simulation started @ Thu Dec 15 11:57:32 2005, options follow:

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         12 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               11581513 # total number of instructions committed
sim_num_refs                4482821 # total number of loads and stores committed
sim_num_loads               2602463 # total number of loads committed
sim_num_stores         1880358.0000 # total number of stores committed
sim_num_branches            3128464 # total number of branches committed
sim_elapsed_time                  7 # total simulation time in seconds
sim_inst_rate          1654501.8571 # simulation speed (in insts/sec)
sim_total_insn             12264461 # total number of instructions executed
sim_total_refs              4823977 # total number of loads and stores executed
sim_total_loads             2865499 # total number of loads executed
sim_total_stores       1958478.0000 # total number of stores executed
sim_total_branches          3196948 # total number of branches executed
sim_cycle                   6201192 # total simulation time in cycles
sim_IPC                      1.8676 # instructions per cycle
sim_CPI                      0.5354 # cycles per instruction
sim_exec_BW                  1.9778 # total instructions (mis-spec + committed) per cycle
sim_IPB                      3.7020 # instruction per branch
IFQ_count                  18164045 # cumulative IFQ occupancy
IFQ_fcount                  3712851 # cumulative IFQ full count
ifq_occupancy                2.9291 # avg IFQ occupancy (insn's)
ifq_rate                     1.9778 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  1.4810 # avg IFQ occupant latency (cycle's)
ifq_full                     0.5987 # fraction of time (cycle's) IFQ was full
RUU_count                  74901152 # cumulative RUU occupancy
RUU_fcount                  3110990 # cumulative RUU full count
ruu_occupancy               12.0785 # avg RUU occupancy (insn's)
ruu_rate                     1.9778 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  6.1072 # avg RUU occupant latency (cycle's)
ruu_full                     0.5017 # fraction of time (cycle's) RUU was full
LSQ_count                  31593102 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0947 # avg LSQ occupancy (insn's)
lsq_rate                     1.9778 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.5760 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  120561350 # total number of slip cycles
avg_sim_slip                10.4098 # the average slip between issue and retirement
bpred_bimod.lookups         3257658 # total number of bpred lookups
bpred_bimod.updates         3128464 # total number of updates
bpred_bimod.addr_hits       2984765 # total number of address-predicted hits
bpred_bimod.dir_hits        2990777 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses           137687 # total number of misses
bpred_bimod.jr_hits          427904 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen          434901 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP         1223 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP         8193 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9541 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9560 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9839 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.1493 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes       441943 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops       435443 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP       426708 # total number of RAS predictions used
bpred_bimod.ras_hits.PP       426681 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               12820213 # total number of accesses
il1.hits                   12819996 # total number of hits
il1.misses                      217 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4497784 # total number of accesses
dl1.hits                    4486579 # total number of hits
dl1.misses                    11205 # total number of misses
dl1.replacements              10181 # total number of replacements
dl1.writebacks                 7079 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0025 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0023 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0016 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                  18501 # total number of accesses
ul2.hits                      14305 # total number of hits
ul2.misses                     4196 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.2268 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              12820213 # total number of accesses
itlb.hits                  12820206 # total number of hits
itlb.misses                       7 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               4514759 # total number of accesses
dtlb.hits                   4514693 # total number of hits
dtlb.misses                      66 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  27072 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   73 # total number of pages allocated
mem.page_mem                   292k # total size of memory pages allocated
mem.ptab_misses                 589 # total first level page table misses
mem.ptab_accesses          85917912 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 12 3 -redir:sim tempOutput3 fft 

sim: simulation started @ Thu Dec 15 11:57:39 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         12 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
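
The range syntax described above can be sketched as a small parser. This is only an illustration, not the simulator's own code: the function name parse_ptrace_range is hypothetical, program symbols such as `main' are left unresolved, and the number base handling (decimal or 0x-prefixed hex) is an assumption.

```python
def parse_ptrace_range(spec):
    """Parse '{{@|#}<start>}:{{@|#|+}<end>}' into (kind, start, end).

    Omitted ends come back as None. Symbols are returned as strings
    because this sketch has no symbol table to resolve them against.
    """
    start_tok, sep, end_tok = spec.partition(":")
    if not sep:
        raise ValueError("range must contain ':'")

    def split_prefix(tok, prefixes):
        # Peel off a leading '@', '#' or '+' if present.
        if tok and tok[0] in prefixes:
            return tok[0], tok[1:]
        return "", tok

    def value(text):
        if text == "":
            return None
        try:
            return int(text, 0)  # assumption: decimal or 0x-prefixed hex
        except ValueError:
            return text          # program symbol, left unresolved

    s_pre, s_val = split_prefix(start_tok, "@#")
    e_pre, e_val = split_prefix(end_tok, "@#+")

    # The range kind is taken from the start prefix, else from the end prefix.
    kinds = {"@": "address", "#": "cycle", "": "instruction"}
    kind = kinds[s_pre] if start_tok else kinds.get(e_pre, "instruction")

    start, end = value(s_val), value(e_val)
    if e_pre == "+":
        if isinstance(start, int) and isinstance(end, int):
            end = start + end            # e.g. 1000:+100 == 1000:1100
        else:
            end = f"{s_val}+{e_val}"     # symbolic start, keep as text
    return kind, start, end
```

For example, parse_ptrace_range("1000:+100") yields the same instruction range as "1000:1100", matching the equivalence stated in the help text.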

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      M   # entries in 2nd level (# of counters, or other FSM)
      W   width of shift register(s)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors:
      GAg     : 1, 2^W, W, 0
      GAp     : 1, M (M > 2^W), W, 0
      PAg     : N, 2^W, W, 0
      PAp     : N, M (M == 2^(N+W)), W, 0
      gshare  : 1, 2^W, W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l
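
The <name>:<nsets>:<bsize>:<assoc>:<repl> format described above can be unpacked with a few lines of Python. This is a minimal sketch, not part of the simulator: CacheConfig and parse_cache_config are hypothetical names, and the capacity is taken to be the usual sets x block size x associativity product.

```python
from typing import NamedTuple

# Replacement policy letters as listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int   # block size in bytes
    assoc: int
    repl: str

    @property
    def capacity_bytes(self) -> int:
        # Total data capacity: sets * bytes per block * ways per set.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(config: str) -> CacheConfig:
    """Split a '<name>:<nsets>:<bsize>:<assoc>:<repl>' string into fields."""
    name, nsets, bsize, assoc, repl = config.split(":")
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


# The configurations used on this run's command line.
dl1 = parse_cache_config("dl1:512:64:2:l")
ul2 = parse_cache_config("ul2:1024:64:8:l")
```

With these inputs, dl1.capacity_bytes is 512 * 64 * 2 bytes (64 KB) and ul2.capacity_bytes is 1024 * 64 * 8 bytes (512 KB).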



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               13320908 # total number of instructions committed
sim_num_refs                6722956 # total number of loads and stores committed
sim_num_loads               3799918 # total number of loads committed
sim_num_stores         2923038.0000 # total number of stores committed
sim_num_branches             387182 # total number of branches committed
sim_elapsed_time                 11 # total simulation time in seconds
sim_inst_rate          1210991.6364 # simulation speed (in insts/sec)
sim_total_insn             13374658 # total number of instructions executed
sim_total_refs              6748328 # total number of loads and stores executed
sim_total_loads             3824303 # total number of loads executed
sim_total_stores       2924025.0000 # total number of stores executed
sim_total_branches           390629 # total number of branches executed
sim_cycle                   8230187 # total simulation time in cycles
sim_IPC                      1.6185 # instructions per cycle
sim_CPI                      0.6178 # cycles per instruction
sim_exec_BW                  1.6251 # total instructions (mis-spec + committed) per cycle
sim_IPB                     34.4048 # instruction per branch
IFQ_count                  32121422 # cumulative IFQ occupancy
IFQ_fcount                  7879675 # cumulative IFQ full count
ifq_occupancy                3.9029 # avg IFQ occupancy (insn's)
ifq_rate                     1.6251 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.4017 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9574 # fraction of time (cycle's) IFQ was full
RUU_count                 129385116 # cumulative RUU occupancy
RUU_fcount                  7738813 # cumulative RUU full count
ruu_occupancy               15.7208 # avg RUU occupancy (insn's)
ruu_rate                     1.6251 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.6739 # avg RUU occupant latency (cycle's)
ruu_full                     0.9403 # fraction of time (cycle's) RUU was full
LSQ_count                  66758970 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                8.1115 # avg LSQ occupancy (insn's)
lsq_rate                     1.6251 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  4.9915 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  215860517 # total number of slip cycles
avg_sim_slip                16.2046 # the average slip between issue and retirement
bpred_bimod.lookups          390940 # total number of bpred lookups
bpred_bimod.updates          387182 # total number of updates
bpred_bimod.addr_hits        380510 # total number of address-predicted hits
bpred_bimod.dir_hits         380716 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             6466 # total number of misses
bpred_bimod.jr_hits           89850 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen           89860 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9828 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9833 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.9999 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes        90460 # total number of address pushed onto ret-addr stack
bpred_bimod.retstack_pops        89878 # total number of address popped off of ret-addr stack
bpred_bimod.used_ras.PP        89859 # total number of RAS predictions used
bpred_bimod.ras_hits.PP        89850 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9999 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               13400103 # total number of accesses
il1.hits                   13399376 # total number of hits
il1.misses                      727 # total number of misses
il1.replacements                 30 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0001 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                6169502 # total number of accesses
dl1.hits                    5881789 # total number of hits
dl1.misses                   287713 # total number of misses
dl1.replacements             286689 # total number of replacements
dl1.writebacks               143443 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0465 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0233 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                 431883 # total number of accesses
ul2.hits                     399489 # total number of hits
ul2.misses                    32394 # total number of misses
ul2.replacements              24202 # total number of replacements
ul2.writebacks                19002 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0750 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0560 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0440 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              13400103 # total number of accesses
itlb.hits                  13400084 # total number of hits
itlb.misses                      19 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6740998 # total number of accesses
dtlb.hits                   6736793 # total number of hits
dtlb.misses                    4205 # total number of misses
dtlb.replacements              4077 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0006 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0006 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  89248 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  190 # total number of pages allocated
mem.page_mem                   760k # total size of memory pages allocated
mem.ptab_misses                3262 # total first level page table misses
mem.ptab_accesses         156766232 # total page table accesses
mem.ptab_miss_rate           0.0000 # first level page table miss rate
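
The derived figures in the statistics block above can be re-computed from the raw counters. The sketch below is an illustration only: parse_stats and the SAMPLE excerpt are hypothetical helpers, and the assumed line layout "<name> <value> # <description>" is inferred from this log rather than from any documented format.

```python
import re

# Assumed layout of a statistics line: name, value, '#', description.
STAT_LINE = re.compile(r"^(\S+)\s+(.*?)\s+#\s+(.*)$")

# A few lines copied from the fft run above.
SAMPLE = """\
sim_num_insn               13320908 # total number of instructions committed
sim_cycle                   8230187 # total simulation time in cycles
sim_IPC                      1.6185 # instructions per cycle
dl1.accesses                6169502 # total number of accesses
dl1.misses                   287713 # total number of misses
dl1.miss_rate                0.0466 # miss rate (i.e., misses/ref)
"""


def parse_stats(text):
    """Return {stat name: value}; values that are not numbers stay strings."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if not match:
            continue
        name, raw = match.group(1), match.group(2)
        if name.startswith(("-", "#")):
            continue  # option dump lines, not statistics
        try:
            stats[name] = float(raw)
        except ValueError:
            stats[name] = raw  # e.g. hex addresses, '292k', error markers
    return stats


stats = parse_stats(SAMPLE)
ipc = stats["sim_num_insn"] / stats["sim_cycle"]
dl1_miss_rate = stats["dl1.misses"] / stats["dl1.accesses"]
```

Here ipc and dl1_miss_rate should agree with the reported sim_IPC and dl1.miss_rate up to the four decimal places the simulator prints.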

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 12 3 -redir:sim tempOutput3 filter 

sim: simulation started @ Thu Dec 15 11:57:50 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         12 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)




sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               21379256 # total number of instructions committed
sim_num_refs                6565514 # total number of loads and stores committed
sim_num_loads               4915554 # total number of loads committed
sim_num_stores         1649960.0000 # total number of stores committed
sim_num_branches            1647342 # total number of branches committed
sim_elapsed_time                 14 # total simulation time in seconds
sim_inst_rate          1527089.7143 # simulation speed (in insts/sec)
sim_total_insn             21450119 # total number of instructions executed
sim_total_refs              6589112 # total number of loads and stores executed
sim_total_loads             4939056 # total number of loads executed
sim_total_stores       1650056.0000 # total number of stores executed
sim_total_branches          1647448 # total number of branches executed
sim_cycle                  12441750 # total simulation time in cycles
sim_IPC                      1.7183 # instructions per cycle
sim_CPI                      0.5820 # cycles per instruction
sim_exec_BW                  1.7240 # total instructions (mis-spec + committed) per cycle
sim_IPB                     12.9780 # instruction per branch
IFQ_count                  48658851 # cumulative IFQ occupancy
IFQ_fcount                 11545540 # cumulative IFQ full count
ifq_occupancy                3.9109 # avg IFQ occupancy (insn's)
ifq_rate                     1.7240 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  2.2685 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9280 # fraction of time (cycle's) IFQ was full
RUU_count                 197769168 # cumulative RUU occupancy
RUU_fcount                 12325414 # cumulative RUU full count
ruu_occupancy               15.8956 # avg RUU occupancy (insn's)
ruu_rate                     1.7240 # avg RUU dispatch rate (insn/cycle)
ruu_latency                  9.2200 # avg RUU occupant latency (cycle's)
ruu_full                     0.9906 # fraction of time (cycle's) RUU was full
LSQ_count                  63415583 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                5.0970 # avg LSQ occupancy (insn's)
lsq_rate                     1.7240 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  2.9564 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  288967908 # total number of slip cycles
avg_sim_slip                13.5163 # the average slip between issue and retirement
bpred_bimod.lookups         1654488 # total number of bpred lookups
bpred_bimod.updates         1647342 # total number of updates
bpred_bimod.addr_hits       1638967 # total number of address-predicted hits
bpred_bimod.dir_hits        1639058 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses             8284 # total number of misses
bpred_bimod.jr_hits              45 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              52 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            0 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9949 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9950 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8654 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP <error: divide by zero> # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes           79 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           57 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           52 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           45 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.8654 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               21483059 # total number of accesses
il1.hits                   21482880 # total number of hits
il1.misses                      179 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                4943189 # total number of accesses
dl1.hits                    4940058 # total number of hits
dl1.misses                     3131 # total number of misses
dl1.replacements               2107 # total number of replacements
dl1.writebacks                  791 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.0006 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.0004 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.0002 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                   4101 # total number of accesses
ul2.hits                       1612 # total number of hits
ul2.misses                     2489 # total number of misses
ul2.replacements                  0 # total number of replacements
ul2.writebacks                    0 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.6069 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              21483059 # total number of accesses
itlb.hits                  21483053 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               6565559 # total number of accesses
dtlb.hits                   6565520 # total number of hits
dtlb.misses                      39 # total number of misses
dtlb.replacements                 0 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23088 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                   30 # total number of pages allocated
mem.page_mem                   120k # total size of memory pages allocated
mem.ptab_misses            12303390 # total first level page table misses
mem.ptab_accesses         178886492 # total page table accesses
mem.ptab_miss_rate           0.0688 # first level page table miss rate

sim: command line: sim-outorder -fetch:mplat 8 -bpred:ras 16 -bpred:bimod 16384 -lsq:size 32 -res:imult 2 -res:fpmult 2 -cache:dl1 dl1:512:64:2:l -cache:il1 il1:512:64:2:l -cache:dl2 ul2:1024:64:8:l -cache:dl1lat 3 -cache:il1lat 3 -cache:il2lat 12 -cache:dl2lat 12 -mem:minBurstLength 4 -mem:maxBurstLength 8 -mem:width 16 -mem:lat 12 3 -redir:sim tempOutput3 alphaBlend 

sim: simulation started @ Thu Dec 15 11:58:04 2005, options follow:

sim-outorder: This simulator implements a very detailed out-of-order issue
superscalar processor with a two-level memory system and speculative
execution support.  This simulator is a performance simulator, tracking the
latency of all pipeline operations.

# -config                     # load configuration from a file
# -dumpconfig                 # dump configuration to a file
# -h                    false # print help message    
# -v                    false # verbose operation     
# -i                    false # start in Dlite debugger
-seed                       1 # random number generator seed (0 for timer seed)
# -q                    false # initialize and terminate immediately
# -chkpt               <null> # restore EIO trace execution from <fname>
# -redir:sim      tempOutput3 # redirect simulator output to file (non-interactive only)
# -redir:prog          <null> # redirect simulated program output to file
-nice                       0 # simulator scheduling priority
-max:inst                   0 # maximum number of inst's to execute
-fastfwd                    0 # number of insts skipped before timing starts
# -ptrace              <null> # generate pipetrace, i.e., <fname|stdout|stderr> <range>
-fetch:ifqsize              4 # instruction fetch queue size (in insts)
-fetch:mplat                8 # extra branch mis-prediction latency
-fetch:speed                1 # speed of front-end of machine relative to execution core
-bpred                  bimod # branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}
-bpred:bimod     16384 # bimodal predictor config (<table size>)
-bpred:2lev      1 1024 8 0 # 2-level predictor config (<l1size> <l2size> <hist_size> <xor>)
-bpred:comb      1024 # combining predictor config (<meta_table_size>)
-bpred:ras                 16 # return address stack size (0 for no return stack)
-bpred:btb       512 4 # BTB config (<num_sets> <associativity>)
# -bpred:spec_update       <null> # speculative predictors update in {ID|WB} (default non-spec)
-decode:width               4 # instruction decode B/W (insts/cycle)
-issue:width                4 # instruction issue B/W (insts/cycle)
-issue:inorder          false # run pipeline with in-order issue
-issue:wrongpath         true # issue instructions down wrong execution paths
-commit:width               4 # instruction commit B/W (insts/cycle)
-ruu:size                  16 # register update unit (RUU) size
-lsq:size                  32 # load/store queue (LSQ) size
-cache:dl1       dl1:512:64:2:l # l1 data cache config, i.e., {<config>|none}
-cache:dl1lat               3 # l1 data cache hit latency (in cycles)
-cache:dl2       ul2:1024:64:8:l # l2 data cache config, i.e., {<config>|none}
-cache:dl2lat              12 # l2 data cache hit latency (in cycles)
-cache:il1       il1:512:64:2:l # l1 inst cache config, i.e., {<config>|dl1|dl2|none}
-cache:il1lat               3 # l1 instruction cache hit latency (in cycles)
-cache:il2                dl2 # l2 instruction cache config, i.e., {<config>|dl2|none}
-cache:il2lat              12 # l2 instruction cache hit latency (in cycles)
-cache:flush            false # flush caches on system calls
-cache:icompress        false # convert 64-bit inst addresses to 32-bit inst equivalents
-mem:lat         12 3 # memory access latency (<first_chunk> <inter_chunk>)
-mem:width                 16 # memory access bus width (in bytes)
-mem:maxBurstLength            8 # maximum memory burst length (0 = infinite)
-mem:minBurstLength            4 # minimum memory burst length
-tlb:itlb        itlb:16:4096:4:l # instruction TLB config, i.e., {<config>|none}
-tlb:dtlb        dtlb:32:4096:4:l # data TLB config, i.e., {<config>|none}
-tlb:lat                   30 # inst/data TLB miss latency (in cycles)
-res:ialu                   4 # total number of integer ALU's available
-res:imult                  2 # total number of integer multiplier/dividers available
-res:memport                2 # total number of memory system ports available (to CPU)
-res:fpalu                  4 # total number of floating point ALU's available
-res:fpmult                 2 # total number of floating point multiplier/dividers available
# -pcstat              <null> # profile stat(s) against text addr's (mult uses ok)
-bugcompat              false # operate in backward-compatible bugs mode (for testing only)

  Pipetrace range arguments are formatted as follows:

    {{@|#}<start>}:{{@|#|+}<end>}

  Both ends of the range are optional; if neither is specified, the entire
  execution is traced.  Ranges that start with a `@' designate an address
  range to be traced; those that start with a `#' designate a cycle count
  range.  All other range values represent an instruction count range.  The
  second argument, if specified with a `+', indicates a value relative
  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may
  be used in all contexts.

    Examples:   -ptrace FOO.trc #0:#1000
                -ptrace BAR.trc @2000:
                -ptrace BLAH.trc :1500
                -ptrace UXXE.trc :
                -ptrace FOOBAR.trc @main:+278
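
  The range syntax above can be sketched as a small parser. This is an
  illustrative Python sketch, not the simulator's own code: the function name
  `parse_ptrace_range`, the returned dictionary keys, and the choice to resolve
  a `+' end only when both values are plain decimal numbers are all assumptions
  made for this example.

```python
import re

# {{@|#}<start>}:{{@|#|+}<end>} -- both sides optional, the colon is required.
_RANGE_RE = re.compile(
    r"^(?P<skind>[@#]?)(?P<start>[^:]*):(?P<ekind>[@#+]?)(?P<end>.*)$"
)
_KINDS = {"@": "address", "#": "cycle", "": "inst"}


def parse_ptrace_range(text):
    """Split a pipetrace range string into its parts (hypothetical helper)."""
    m = _RANGE_RE.match(text)
    if m is None:
        raise ValueError(f"not a pipetrace range: {text!r}")
    start = m["start"] or None
    end = m["end"] or None
    relative = m["ekind"] == "+"
    start_kind = _KINDS[m["skind"]]
    # Assumption: a relative end is measured in the same unit as the start.
    end_kind = start_kind if relative else _KINDS[m["ekind"]]
    # Resolve "+<n>" only when both values are plain decimal numbers;
    # symbolic values such as "main" are returned unchanged.
    if relative and start and end and start.isdigit() and end.isdigit():
        end = str(int(start) + int(end))
        relative = False
    return {
        "start_kind": start_kind,
        "start": start,
        "end_kind": end_kind,
        "end": end,
        "relative": relative,
    }
```

  With this sketch, `parse_ptrace_range("1000:+100")` and
  `parse_ptrace_range("1000:1100")` return the same dictionary, mirroring the
  equivalence stated in the help text.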

  Branch predictor configuration examples for 2-level predictor:
    Configurations:   N, M, W, X
      N   # entries in first level (# of shift register(s))
      W   width of shift register(s)
      M   # entries in 2nd level (# of counters, or other FSM)
      X   (yes-1/no-0) xor history and address for 2nd level index
    Sample predictors (fields listed as N, W, M, X):
      GAg     : 1, W, 2^W, 0
      GAp     : 1, W, M (M > 2^W), 0
      PAg     : N, W, 2^W, 0
      PAp     : N, W, M (M == 2^(N+W)), 0
      gshare  : 1, W, 2^W, 1
  Predictor `comb' combines a bimodal and a 2-level predictor.

  The cache config parameter <config> has the following format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

    <name>   - name of the cache being defined
    <nsets>  - number of sets in the cache
    <bsize>  - block size of the cache
    <assoc>  - associativity of the cache
    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random

    Examples:   -cache:dl1 dl1:4096:32:1:l
                -tlb:dtlb dtlb:128:4096:32:r
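
  As a worked example of the <config> format, the following Python sketch
  splits a config string into its five fields and derives the total capacity
  as nsets * bsize * assoc. The names `CacheConfig` and `parse_cache_config`
  are hypothetical and exist only for this illustration; they are not part of
  the simulator.

```python
from typing import NamedTuple

# Replacement policy codes as listed in the help text.
REPL_POLICIES = {"l": "LRU", "f": "FIFO", "r": "random"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity_bytes(self):
        # Total storage = sets * block size * ways.
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>' (hypothetical helper)."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields: {text!r}")
    name, nsets, bsize, assoc, repl = parts
    if repl not in REPL_POLICIES:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


if __name__ == "__main__":
    for cfg in ("dl1:512:64:2:l", "ul2:1024:64:8:l"):
        c = parse_cache_config(cfg)
        print(c.name, c.capacity_bytes // 1024, "KB", REPL_POLICIES[c.repl])
```

  For the configuration used in this run, dl1:512:64:2:l works out to
  512 * 64 * 2 = 65536 bytes (64 KB) and ul2:1024:64:8:l to
  1024 * 64 * 8 = 524288 bytes (512 KB).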

  Cache levels can be unified by pointing a level of the instruction cache
  hierarchy at the data cache hierarchy using the "dl1" and "dl2" cache
  configuration arguments.  Most sensible combinations are supported, e.g.,

    A unified l2 cache (il2 is pointed at dl2):
      -cache:il1 il1:128:64:1:l -cache:il2 dl2
      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l

    Or, a fully unified cache hierarchy (il1 pointed at dl1):
      -cache:il1 dl1
      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l



sim: ** starting performance simulation **

sim: ** simulation statistics **
sim_num_insn               27859692 # total number of instructions committed
sim_num_refs                8653319 # total number of loads and stores committed
sim_num_loads               7203672 # total number of loads committed
sim_num_stores         1449647.0000 # total number of stores committed
sim_num_branches             481706 # total number of branches committed
sim_elapsed_time                 22 # total simulation time in seconds
sim_inst_rate          1266349.6364 # simulation speed (in insts/sec)
sim_total_insn             27860763 # total number of instructions executed
sim_total_refs              8653600 # total number of loads and stores executed
sim_total_loads             7203830 # total number of loads executed
sim_total_stores       1449770.0000 # total number of stores executed
sim_total_branches           481843 # total number of branches executed
sim_cycle                  34876717 # total simulation time in cycles
sim_IPC                      0.7988 # instructions per cycle
sim_CPI                      1.2519 # cycles per instruction
sim_exec_BW                  0.7988 # total instructions (mis-spec + committed) per cycle
sim_IPB                     57.8355 # instructions per branch
IFQ_count                 139481575 # cumulative IFQ occupancy
IFQ_fcount                 34870153 # cumulative IFQ full count
ifq_occupancy                3.9993 # avg IFQ occupancy (insn's)
ifq_rate                     0.7988 # avg IFQ dispatch rate (insn/cycle)
ifq_latency                  5.0064 # avg IFQ occupant latency (cycle's)
ifq_full                     0.9998 # fraction of time (cycle's) IFQ was full
RUU_count                 557926666 # cumulative RUU occupancy
RUU_fcount                 34869540 # cumulative RUU full count
ruu_occupancy               15.9971 # avg RUU occupancy (insn's)
ruu_rate                     0.7988 # avg RUU dispatch rate (insn/cycle)
ruu_latency                 20.0255 # avg RUU occupant latency (cycle's)
ruu_full                     0.9998 # fraction of time (cycle's) RUU was full
LSQ_count                 169685679 # cumulative LSQ occupancy
LSQ_fcount                        0 # cumulative LSQ full count
lsq_occupancy                4.8653 # avg LSQ occupancy (insn's)
lsq_rate                     0.7988 # avg LSQ dispatch rate (insn/cycle)
lsq_latency                  6.0905 # avg LSQ occupant latency (cycle's)
lsq_full                     0.0000 # fraction of time (cycle's) LSQ was full
sim_slip                  764121248 # total number of slip cycles
avg_sim_slip                27.4275 # the average slip between issue and retirement
bpred_bimod.lookups          481883 # total number of bpred lookups
bpred_bimod.updates          481706 # total number of updates
bpred_bimod.addr_hits        481471 # total number of address-predicted hits
bpred_bimod.dir_hits         481589 # total number of direction-predicted hits (includes addr-hits)
bpred_bimod.misses              117 # total number of misses
bpred_bimod.jr_hits              72 # total number of address-predicted hits for JR's
bpred_bimod.jr_seen              81 # total number of JR's seen
bpred_bimod.jr_non_ras_hits.PP            0 # total number of address-predicted hits for non-RAS JR's
bpred_bimod.jr_non_ras_seen.PP            1 # total number of non-RAS JR's seen
bpred_bimod.bpred_addr_rate    0.9995 # branch address-prediction rate (i.e., addr-hits/updates)
bpred_bimod.bpred_dir_rate    0.9998 # branch direction-prediction rate (i.e., all-hits/updates)
bpred_bimod.bpred_jr_rate    0.8889 # JR address-prediction rate (i.e., JR addr-hits/JRs seen)
bpred_bimod.bpred_jr_non_ras_rate.PP    0.0000 # non-RAS JR addr-pred rate (ie, non-RAS JR hits/JRs seen)
bpred_bimod.retstack_pushes          118 # total number of addresses pushed onto ret-addr stack
bpred_bimod.retstack_pops           97 # total number of addresses popped off of ret-addr stack
bpred_bimod.used_ras.PP           80 # total number of RAS predictions used
bpred_bimod.ras_hits.PP           72 # total number of RAS hits
bpred_bimod.ras_rate.PP    0.9000 # RAS prediction rate (i.e., RAS hits/used RAS)
il1.accesses               27860787 # total number of accesses
il1.hits                   27860576 # total number of hits
il1.misses                      211 # total number of misses
il1.replacements                  0 # total number of replacements
il1.writebacks                    0 # total number of writebacks
il1.invalidations                 0 # total number of invalidations
il1.miss_rate                0.0000 # miss rate (i.e., misses/ref)
il1.repl_rate                0.0000 # replacement rate (i.e., repls/ref)
il1.wb_rate                  0.0000 # writeback rate (i.e., wrbks/ref)
il1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
dl1.accesses                8653398 # total number of accesses
dl1.hits                    6084921 # total number of hits
dl1.misses                  2568477 # total number of misses
dl1.replacements            2567453 # total number of replacements
dl1.writebacks               957274 # total number of writebacks
dl1.invalidations                 0 # total number of invalidations
dl1.miss_rate                0.2968 # miss rate (i.e., misses/ref)
dl1.repl_rate                0.2967 # replacement rate (i.e., repls/ref)
dl1.wb_rate                  0.1106 # writeback rate (i.e., wrbks/ref)
dl1.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
ul2.accesses                3525962 # total number of accesses
ul2.hits                    3457641 # total number of hits
ul2.misses                    68321 # total number of misses
ul2.replacements              60129 # total number of replacements
ul2.writebacks                20012 # total number of writebacks
ul2.invalidations                 0 # total number of invalidations
ul2.miss_rate                0.0194 # miss rate (i.e., misses/ref)
ul2.repl_rate                0.0171 # replacement rate (i.e., repls/ref)
ul2.wb_rate                  0.0057 # writeback rate (i.e., wrbks/ref)
ul2.inv_rate                 0.0000 # invalidation rate (i.e., invs/ref)
itlb.accesses              27860787 # total number of accesses
itlb.hits                  27860781 # total number of hits
itlb.misses                       6 # total number of misses
itlb.replacements                 0 # total number of replacements
itlb.writebacks                   0 # total number of writebacks
itlb.invalidations                0 # total number of invalidations
itlb.miss_rate               0.0000 # miss rate (i.e., misses/ref)
itlb.repl_rate               0.0000 # replacement rate (i.e., repls/ref)
itlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
itlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
dtlb.accesses               8653412 # total number of accesses
dtlb.hits                   8652339 # total number of hits
dtlb.misses                    1073 # total number of misses
dtlb.replacements               945 # total number of replacements
dtlb.writebacks                   0 # total number of writebacks
dtlb.invalidations                0 # total number of invalidations
dtlb.miss_rate               0.0001 # miss rate (i.e., misses/ref)
dtlb.repl_rate               0.0001 # replacement rate (i.e., repls/ref)
dtlb.wb_rate                 0.0000 # writeback rate (i.e., wrbks/ref)
dtlb.inv_rate                0.0000 # invalidation rate (i.e., invs/ref)
sim_invalid_addrs                 0 # total non-speculative bogus addresses seen (debug var)
ld_text_base             0x00400000 # program text (code) segment base
ld_text_size                  23472 # program text (code) size in bytes
ld_data_base             0x10000000 # program initialized data segment base
ld_data_size                   4096 # program init'ed `.data' and uninit'ed `.bss' size in bytes
ld_stack_base            0x7fffc000 # program stack segment base (highest address in stack)
ld_stack_size                 16384 # program initial stack size
ld_prog_entry            0x00400140 # program entry point (initial PC)
ld_environ_base          0x7fff8000 # program environment base address
ld_target_big_endian              0 # target executable endian-ness, non-zero if big endian
mem.page_count                  370 # total number of pages allocated
mem.page_mem                  1480k # total size of memory pages allocated
mem.ptab_misses             2888570 # total first level page table misses
mem.ptab_accesses         152017728 # total page table accesses
mem.ptab_miss_rate           0.0190 # first level page table miss rate

